AWS Bedrock Embeddings

POST /v1/aws-bedrock/embeddings

Native passthrough to AWS Bedrock embedding models. Request and response shapes match Bedrock’s InvokeModel API byte-for-byte, with a usage block layered on top for credit tracking.

When to use this endpoint vs /v1/embeddings

Use /v1/embeddings when you want a unified, OpenAI-compatible shape. Use this endpoint when you need Bedrock-specific fields (normalize, embeddingTypes, Cohere’s input_type and embedding_types, quantized output) that the OpenAI shape doesn’t expose.

Headers

Header Required Description
Authorization Yes Bearer <api-key-or-jwt>
Content-Type Yes application/json

X-Quantized-Provider is not used for this endpoint — the provider is implicit (Bedrock).

Request body

The body shape varies by model. The router discriminates on the model field’s prefix:

  • amazon.titan-* → Titan schema
  • cohere.* → Cohere schema

Any other prefix is rejected with 422 before the catalog gate runs.

Titan (amazon.titan-embed-text-v2:0)

Field Type Required Default Description
model string Yes Must start with amazon.titan-
inputText string Yes Single string input. Titan is single-input-only — pass one string per call
dimensions integer No 1024 One of 256, 512, 1024
normalize boolean No true Return a unit vector when true
embeddingTypes array No ["float"] Subset of "float", "binary"

Cohere (cohere.embed-english-v3, cohere.embed-multilingual-v3)

Field Type Required Default Description
model string Yes Must start with cohere.
texts array of strings Yes One or more inputs. Cohere handles batching natively
input_type string Yes One of search_document, search_query, classification, clustering
embedding_types array No ["float"] Subset of "float", "int8", "uint8", "binary", "ubinary"
truncate string No "NONE" One of "NONE", "START", "END"
Strict validation

Both schemas use extra="forbid" — any field not listed above is rejected with 422. This prevents OpenAI-style fields (user, encoding_format, dimensions on Cohere) from being silently dropped or forwarded.

Examples

cURL — Titan
cURL — Cohere (batch)
Python (httpx)
curl -X POST https://api.quantized.us/v1/aws-bedrock/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.titan-embed-text-v2:0",
    "inputText": "The quick brown fox jumps over the lazy dog.",
    "dimensions": 512,
    "normalize": true,
    "embeddingTypes": ["float"]
  }'
curl -X POST https://api.quantized.us/v1/aws-bedrock/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere.embed-english-v3",
    "texts": ["first doc", "second doc", "third doc"],
    "input_type": "classification",
    "embedding_types": ["float"]
  }'
import httpx

# Titan
resp = httpx.post(
    "https://api.quantized.us/v1/aws-bedrock/embeddings",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "amazon.titan-embed-text-v2:0",
        "inputText": "Hello world",
        "dimensions": 256,
    },
)
data = resp.json()
print(len(data["embedding"]))  # 256

# Cohere — multilingual, batch
resp = httpx.post(
    "https://api.quantized.us/v1/aws-bedrock/embeddings",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "cohere.embed-multilingual-v3",
        "texts": ["Hello", "Hola", "Bonjour"],
        "input_type": "search_document",
    },
)
data = resp.json()
print(len(data["embeddings"]["float"]))  # 3

Response

Titan response

{
  "embedding": [0.0123, -0.0456, 0.0789, ...],
  "embeddingsByType": {
    "float": [0.0123, -0.0456, 0.0789, ...]
  },
  "inputTextTokenCount": 11,
  "usage": {
    "credits_used": 3,
    "credits_remaining": 999997
  }
}
Field Type Description
embedding array of floats or null Top-level float vector. null when only non-float embeddingTypes were requested
embeddingsByType object or null Per-type vectors when embeddingTypes was specified
inputTextTokenCount integer Provider-reported token count for inputText
usage.credits_used integer Micro-credits consumed
usage.credits_remaining integer or null Micro-credits remaining (null for unlimited licenses)

Cohere response

Cohere v3 returns two distinct response shapes depending on whether the request included embedding_types. The router preserves both byte-for-byte.

With embedding_types (by-type dict):

{
  "id": "abc-123",
  "texts": ["first doc", "second doc"],
  "embeddings": {
    "float": [[0.0123, ...], [0.0456, ...]]
  },
  "response_type": "embeddings_by_type",
  "usage": {
    "credits_used": 2,
    "credits_remaining": 999998
  }
}

Without embedding_types (legacy list):

{
  "id": "abc-123",
  "texts": ["first doc"],
  "embeddings": [
    [0.0123, ...]
  ],
  "response_type": "embeddings_floats",
  "usage": {
    "credits_used": 1,
    "credits_remaining": 999999
  }
}
Field Type Description
id string Cohere request id
texts array Input texts echoed back, in order
embeddings object or array Object keyed by embedding_types (e.g. "float", "int8") when the request set embedding_types; otherwise a flat array of float vectors
response_type string "embeddings_by_type" when embedding_types was set; "embeddings_floats" otherwise
Pick one shape and stick with it

Always include embedding_types: ["float"] in requests if you want the predictable by-type dict shape. The legacy list shape only appears when embedding_types is omitted — useful for backwards compatibility with older client code, but the by-type shape is recommended for new integrations.

Cohere token estimation

Cohere’s response does not include a token count. Quantized estimates input tokens at ~4 characters per token (floored at 1). This is conservative and rarely under-bills for natural-language input.

Models

Model id Vendor Dimensions Batching Public list rate
amazon.titan-embed-text-v2:0 Amazon Titan 256, 512, 1024 Single-input only $0.02 / 1M tokens
cohere.embed-english-v3 Cohere 1024 Native batch $0.10 / 1M tokens
cohere.embed-multilingual-v3 Cohere 1024 Native batch $0.10 / 1M tokens

All three are seeded with supported_features: ["bedrock_embeddings"] in GET /v1/models. Filter for Bedrock embedding models:

bedrock_embed_models = [
    m for m in models
    if "bedrock_embeddings" in m.get("supported_features", [])
]

Providers

Provider Slug Default?
AWS Bedrock bedrock Yes (and only)

This endpoint is Bedrock-only — X-Quantized-Provider is ignored. If you want to route OpenAI embedding models through Bedrock-style infrastructure, that’s not supported; use /v1/embeddings instead.

Errors

Status Condition
401 Invalid or missing API key
402 Insufficient credits
404 Model id is Bedrock-prefixed but not seeded in the catalog
422 Validation error — wrong field shape for the discriminated vendor; missing inputText (Titan) or texts/input_type (Cohere); unknown field; bad model prefix
400 Upstream Bedrock ValidationException (e.g. unsupported dimensions value); sanitized message returned
503 Upstream throttling, transient unavailability, or auth failure on the AWS credential
Out of scope today

The following are not accepted by v1 and may be added in a future release:

  • Cohere v4 multimodal (cohere.embed-multimodal-v4)
  • Titan multimodal embedding (amazon.titan-embed-image-v1)
  • Cross-region routing (the AWS region is fixed via env)