API Reference
AWS Bedrock Embeddings

AWS Bedrock Embeddings

POST /v1/aws-bedrock/embeddings

Native passthrough to AWS Bedrock embedding models. Request and response shapes match Bedrock’s InvokeModel API byte-for-byte, with a usage block layered on top for credit tracking.

When to use this endpoint vs /v1/embeddings

Use /v1/embeddings when you want a unified, OpenAI-compatible shape. Use this endpoint when you need Bedrock-specific fields (normalize, embeddingTypes, Cohere’s input_type and embedding_types, quantized output) that the OpenAI shape doesn’t expose.

Headers

Header	Required	Description
`Authorization`	Yes	`Bearer <api-key-or-jwt>`
`Content-Type`	Yes	`application/json`

X-Quantized-Provider is not used for this endpoint — the provider is implicit (Bedrock).

Request body

The body shape varies by model. The router discriminates on the model field’s prefix:

amazon.titan-* → Titan schema
cohere.* → Cohere schema

Any other prefix is rejected with 422 before the catalog gate runs.

Titan (`amazon.titan-embed-text-v2:0`)

Field	Type	Required	Default	Description
`model`	string	Yes	—	Must start with `amazon.titan-`
`inputText`	string	Yes	—	Single string input. Titan is single-input-only — pass one string per call
`dimensions`	integer	No	1024	One of 256, 512, 1024
`normalize`	boolean	No	true	Return a unit vector when true
`embeddingTypes`	array	No	`["float"]`	Subset of `"float"`, `"binary"`

Cohere (`cohere.embed-english-v3`, `cohere.embed-multilingual-v3`)

Field	Type	Required	Default	Description
`model`	string	Yes	—	Must start with `cohere.`
`texts`	array of strings	Yes	—	One or more inputs. Cohere handles batching natively
`input_type`	string	Yes	—	One of `search_document`, `search_query`, `classification`, `clustering`
`embedding_types`	array	No	`["float"]`	Subset of `"float"`, `"int8"`, `"uint8"`, `"binary"`, `"ubinary"`
`truncate`	string	No	`"NONE"`	One of `"NONE"`, `"START"`, `"END"`

Strict validation

Both schemas use extra="forbid" — any field not listed above is rejected with 422. This prevents OpenAI-style fields (user, encoding_format, dimensions on Cohere) from being silently dropped or forwarded.

Examples

cURL — Titan

cURL — Cohere (batch)

Python (httpx)

curl -X POST https://api.quantized.us/v1/aws-bedrock/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.titan-embed-text-v2:0",
    "inputText": "The quick brown fox jumps over the lazy dog.",
    "dimensions": 512,
    "normalize": true,
    "embeddingTypes": ["float"]
  }'

curl -X POST https://api.quantized.us/v1/aws-bedrock/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere.embed-english-v3",
    "texts": ["first doc", "second doc", "third doc"],
    "input_type": "classification",
    "embedding_types": ["float"]
  }'

import httpx

# Titan
resp = httpx.post(
    "https://api.quantized.us/v1/aws-bedrock/embeddings",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "amazon.titan-embed-text-v2:0",
        "inputText": "Hello world",
        "dimensions": 256,
    },
)
data = resp.json()
print(len(data["embedding"]))  # 256

# Cohere — multilingual, batch
resp = httpx.post(
    "https://api.quantized.us/v1/aws-bedrock/embeddings",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "cohere.embed-multilingual-v3",
        "texts": ["Hello", "Hola", "Bonjour"],
        "input_type": "search_document",
    },
)
data = resp.json()
print(len(data["embeddings"]["float"]))  # 3

Response

Titan response

{
  "embedding": [0.0123, -0.0456, 0.0789, ...],
  "embeddingsByType": {
    "float": [0.0123, -0.0456, 0.0789, ...]
  },
  "inputTextTokenCount": 11,
  "usage": {
    "credits_used": 3,
    "credits_remaining": 999997
  }
}

Field	Type	Description
`embedding`	array of floats or `null`	Top-level float vector. `null` when only non-float `embeddingTypes` were requested
`embeddingsByType`	object or `null`	Per-type vectors when `embeddingTypes` was specified
`inputTextTokenCount`	integer	Provider-reported token count for `inputText`
`usage.credits_used`	integer	Micro-credits consumed
`usage.credits_remaining`	integer or null	Micro-credits remaining (`null` for unlimited licenses)

Cohere response

Cohere v3 returns two distinct response shapes depending on whether the request included embedding_types. The router preserves both byte-for-byte.

With embedding_types (by-type dict):

{
  "id": "abc-123",
  "texts": ["first doc", "second doc"],
  "embeddings": {
    "float": [[0.0123, ...], [0.0456, ...]]
  },
  "response_type": "embeddings_by_type",
  "usage": {
    "credits_used": 2,
    "credits_remaining": 999998
  }
}

Without embedding_types (legacy list):

{
  "id": "abc-123",
  "texts": ["first doc"],
  "embeddings": [
    [0.0123, ...]
  ],
  "response_type": "embeddings_floats",
  "usage": {
    "credits_used": 1,
    "credits_remaining": 999999
  }
}

Field	Type	Description
`id`	string	Cohere request id
`texts`	array	Input texts echoed back, in order
`embeddings`	object or array	Object keyed by `embedding_types` (e.g. `"float"`, `"int8"`) when the request set `embedding_types`; otherwise a flat array of float vectors
`response_type`	string	`"embeddings_by_type"` when `embedding_types` was set; `"embeddings_floats"` otherwise

Pick one shape and stick with it

Always include embedding_types: ["float"] in requests if you want the predictable by-type dict shape. The legacy list shape only appears when embedding_types is omitted — useful for backwards compatibility with older client code, but the by-type shape is recommended for new integrations.

Cohere token estimation

Cohere’s response does not include a token count. Quantized estimates input tokens at ~4 characters per token (floored at 1). This is conservative and rarely under-bills for natural-language input.

Models

Model id	Vendor	Dimensions	Batching	Public list rate
`amazon.titan-embed-text-v2:0`	Amazon Titan	256, 512, 1024	Single-input only	$0.02 / 1M tokens
`cohere.embed-english-v3`	Cohere	1024	Native batch	$0.10 / 1M tokens
`cohere.embed-multilingual-v3`	Cohere	1024	Native batch	$0.10 / 1M tokens

All three are seeded with supported_features: ["bedrock_embeddings"] in GET /v1/models. Filter for Bedrock embedding models:

bedrock_embed_models = [
    m for m in models
    if "bedrock_embeddings" in m.get("supported_features", [])
]

Providers

Provider	Slug	Default?
AWS Bedrock	`bedrock`	Yes (and only)

This endpoint is Bedrock-only — X-Quantized-Provider is ignored. If you want to route OpenAI embedding models through Bedrock-style infrastructure, that’s not supported; use /v1/embeddings instead.

Errors

Status	Condition
`401`	Invalid or missing API key
`402`	Insufficient credits
`404`	Model id is Bedrock-prefixed but not seeded in the catalog
`422`	Validation error — wrong field shape for the discriminated vendor; missing `inputText` (Titan) or `texts`/`input_type` (Cohere); unknown field; bad model prefix
`400`	Upstream Bedrock `ValidationException` (e.g. unsupported `dimensions` value); sanitized message returned
`503`	Upstream throttling, transient unavailability, or auth failure on the AWS credential

Out of scope today

The following are not accepted by v1 and may be added in a future release:

Cohere v4 multimodal (cohere.embed-multimodal-v4)
Titan multimodal embedding (amazon.titan-embed-image-v1)
Cross-region routing (the AWS region is fixed via env)