Providers

Quantized routes each request to a provider based on the endpoint and your configuration. You don’t need to manage separate API keys or accounts for each provider.

Supported providers

Provider	Slug	Capabilities
OpenRouter	`openrouter`	Chat completions, Responses, Models, Embeddings
OpenAI Direct	`openai`	Embeddings, Image generation (DALL-E 2/3, gpt-image-1)
Anthropic	`anthropic`	Chat completions, Models
AWS Bedrock	`bedrock`	Chat completions, Responses, Bedrock-native embeddings, Image generation (Titan / Nova Canvas / Stability)
Google Gemini	`gemini`	Gemini-native embeddings, Image generation (Imagen 4, Gemini Flash Image)
Exa	`exa`	Web search, Content fetch
Tavily	`tavily`	Web search, Content fetch

Default routing

Each capability has a default provider:

Capability	Default Provider
Chat completions	OpenRouter
Responses	OpenRouter
Models	OpenRouter
Embeddings (`/v1/embeddings`)	OpenAI Direct
Bedrock-native embeddings (`/v1/aws-bedrock/embeddings`)	AWS Bedrock
Gemini-native embeddings (`/v1/gemini/embeddings`)	Google Gemini
Image generation (`/v1/images/generations`)	OpenAI Direct (resolved per-model)
Web search	Exa
Content fetch	Exa

Choosing a provider

Use the X-Quantized-Provider header to override the default:

# Use Anthropic directly instead of OpenRouter
curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "X-Quantized-Provider: anthropic" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Use Tavily instead of Exa for web search
curl -X POST https://api.quantized.us/v1/web-search \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "X-Quantized-Provider: tavily" \
  -H "Content-Type: application/json" \
  -d '{"query": "latest AI news"}'
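The same override can be set from code. The following is a minimal sketch using only the Python standard library; the helper name `build_request` is illustrative and not part of any Quantized SDK, and the request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.quantized.us"  # base URL taken from the curl examples above


def build_request(path, body, api_key, provider=None):
    """Build a POST request; set X-Quantized-Provider only when overriding."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider is not None:
        headers["X-Quantized-Provider"] = provider
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_request(
    "/v1/web-search",
    {"query": "latest AI news"},
    api_key="sk-quantized-YOUR-KEY",
    provider="tavily",
)
# urllib.request.urlopen(req) would send the request; it is omitted here.
```

Leaving `provider` unset sends no override header, so the default provider for the endpoint applies.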

Model naming

All models use the author/model format (e.g., openai/gpt-4.1-mini, anthropic/claude-sonnet-4). Use the Models endpoint to list available model IDs.

Capability matrix

Endpoint	OpenAI	OpenRouter	Anthropic	Bedrock	Gemini	Exa	Tavily
`POST /v1/chat/completions`	—	Yes (default)	Yes	Yes	—	—	—
`POST /v1/responses`	—	Yes (default)	—	Yes	—	—	—
`POST /v1/embeddings`	Yes (default)	Yes	—	—	—	—	—
`POST /v1/aws-bedrock/embeddings`	—	—	—	Yes (default, only)	—	—	—
`POST /v1/gemini/embeddings`	—	—	—	—	Yes (default, only)	—	—
`POST /v1/images/generations`	Yes (default)	—	—	Yes — Titan / Nova / Stability (inactive, pending ops)	Yes — Imagen 4 / Flash Image (inactive, pending paid tier)	—	—
`GET /v1/models`	—	Yes (default)	Yes	—	—	—	—
`POST /v1/web-search`	—	—	—	—	—	Yes (default)	Yes
`POST /v1/fetch`	—	—	—	—	—	Yes (default)	Yes
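For client-side validation, the matrix can be encoded as a lookup table. This is a sketch only: the table mirrors the rows above, `resolve_provider` is a hypothetical helper, and it ignores the "inactive" status of the Bedrock and Gemini image paths as well as per-model resolution. How the router itself responds to an unsupported override is not described here.

```python
# Endpoint -> (default provider, providers able to serve it), copied from the matrix.
CAPABILITIES = {
    "POST /v1/chat/completions": ("openrouter", {"openrouter", "anthropic", "bedrock"}),
    "POST /v1/responses": ("openrouter", {"openrouter", "bedrock"}),
    "POST /v1/embeddings": ("openai", {"openai", "openrouter"}),
    "POST /v1/aws-bedrock/embeddings": ("bedrock", {"bedrock"}),
    "POST /v1/gemini/embeddings": ("gemini", {"gemini"}),
    "POST /v1/images/generations": ("openai", {"openai", "bedrock", "gemini"}),
    "GET /v1/models": ("openrouter", {"openrouter", "anthropic"}),
    "POST /v1/web-search": ("exa", {"exa", "tavily"}),
    "POST /v1/fetch": ("exa", {"exa", "tavily"}),
}


def resolve_provider(endpoint, override=None):
    """Return the provider slug a request would target, or raise if the override is invalid."""
    default, allowed = CAPABILITIES[endpoint]
    if override is None:
        return default
    if override not in allowed:
        raise ValueError(f"{override!r} cannot serve {endpoint}")
    return override
```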

Chat-completions modalities

Not every provider accepts every content part on POST /v1/chat/completions. Requests are additionally gated by the target model’s declared modalities — see the Models endpoint for per-model input_modality flags.

Content part	OpenRouter	Anthropic	Bedrock
`text`	Yes	Yes	Yes
`image_url`	Yes	Yes	—
`input_audio`	Yes	—	—
`video_url`	Yes	—	—
`file` (PDF)	Yes (universal — works on all models via OpenRouter’s PDF parser)	—	—

Sending an unsupported modality returns 400 with a descriptive error message (e.g. "Model 'openai/gpt-4.1-nano' does not support audio input") before the request reaches the provider.
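A client can run a similar check before sending. The sketch below encodes the provider table above and reports which content-part types a provider would not accept; `unsupported_parts` is a hypothetical helper, and it does not model the per-model modality gate.

```python
# Content-part types each provider accepts on chat completions (from the table above).
SUPPORTED_PARTS = {
    "openrouter": {"text", "image_url", "input_audio", "video_url", "file"},
    "anthropic": {"text", "image_url"},
    "bedrock": {"text"},
}


def unsupported_parts(messages, provider):
    """Return the content-part types in `messages` that `provider` does not accept."""
    found = set()
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            found.add("text")  # plain-string content counts as a text part
        else:
            for part in content or []:
                found.add(part["type"])
    return sorted(found - SUPPORTED_PARTS[provider])
```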

Anthropic-specific behavior

When routing through X-Quantized-Provider: anthropic, the router adapts OpenAI-shaped requests to Anthropic’s native /v1/messages format:

`response_format` — JSON output normalization

Anthropic’s API does not natively support the response_format parameter. The router emulates it by injecting a system-prompt instruction telling the model to return raw JSON. Some Claude models (notably Claude Haiku 4.5) still wrap their output in ```json ... ``` markdown fences despite this instruction.

To uphold the response_format contract — “callers asking for JSON get parseable JSON” — the router strips a single wrapping markdown fence from the response content when:

the request specified response_format: {"type": "json_object" | "json_schema"}, and
the response content is wrapped entirely in ```json ... ``` or ``` ... ``` (a fence embedded inside prose is not stripped).

This stripping is only applied to the Anthropic provider path — OpenRouter responses are forwarded as-is because OpenRouter handles response_format server-side.
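The two conditions can be sketched as follows. This illustrates the rule as described, not the router's actual code; the regular expression and the name `normalize_content` are assumptions.

```python
import re

TICKS = "`" * 3  # markdown fence marker, built indirectly so it does not clash with this listing

# The whole content must be one fenced block: optional "json" tag, no inner fence.
_WRAPPED = re.compile(
    rf"\A\s*{TICKS}(?:json)?[ \t]*\n((?:(?!{TICKS}).)*?)\n?{TICKS}\s*\Z",
    re.DOTALL,
)


def normalize_content(content, response_format):
    """Strip one wrapping fence, but only when the caller asked for JSON."""
    wants_json = (response_format or {}).get("type") in {"json_object", "json_schema"}
    if not wants_json:
        return content
    match = _WRAPPED.match(content)
    return match.group(1) if match else content
```

Content that merely contains a fence inside surrounding prose does not match the pattern and is returned unchanged.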

Out-of-scope content parts

Anthropic’s chat endpoint currently receives only text and image_url content parts from the router. Requests containing input_audio, video_url, or file parts are accepted by the router’s serializer but would fail upstream if routed to Anthropic. Use OpenRouter (the default for chat completions) for these modalities.

OpenAI Direct (Embeddings)

OpenAI Direct is the default provider for POST /v1/embeddings. It calls OpenAI’s /v1/embeddings endpoint with Quantized’s pooled OPENAI_API_KEY. Clients don’t need their own OpenAI account — billing is unified through Quantized’s per-key credit balance.

Models in scope

Model id	Native dimension	Public list rate
`text-embedding-3-small`	1536	$0.02 / 1M tokens
`text-embedding-3-large`	3072	$0.13 / 1M tokens
`text-embedding-ada-002`	1536	$0.10 / 1M tokens

Unknown OpenAI model ids fall back to a conservative default rate so the router never bills at $0 on a misconfigured request.

OpenRouter passthrough

X-Quantized-Provider: openrouter routes the same request through OpenRouter, which exposes OpenAI’s embedding models with an openai/ prefix. Because OpenRouter’s embedding response does not include a per-call cost, the router falls back to the OpenAI rate table after stripping the prefix.
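The rate lookup, including the prefix stripping, can be approximated as below. The rates come from the table above; `FALLBACK_RATE` is a hypothetical value, since the conservative default is not stated here, and `embedding_cost` is an illustrative name.

```python
# USD per 1M tokens, from the "Models in scope" table.
RATES_PER_MILLION = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}
FALLBACK_RATE = 0.13  # hypothetical: the actual default rate is not documented


def embedding_cost(model, tokens):
    """Estimate the USD cost of an embeddings call for `tokens` input tokens."""
    bare = model.removeprefix("openai/")  # OpenRouter ids carry an openai/ prefix
    rate = RATES_PER_MILLION.get(bare, FALLBACK_RATE)
    return tokens * rate / 1_000_000
```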

Bedrock-specific behavior

When routing through X-Quantized-Provider: bedrock, the router calls AWS Bedrock’s Converse API on Quantized’s AWS account. Clients don’t need their own AWS credentials — billing is unified through Quantized’s per-key credit balance.

Model resolution

Only models with a bedrock row in the model catalog are routable through this header. The catalog row’s model_name field carries the full Bedrock model id (e.g., amazon.nova-micro-v1:0); clients always reference the canonical Quantized id (e.g., amazon/nova-micro).

If a model has no Bedrock row, requests with X-Quantized-Provider: bedrock fail at resolution with 400 Provider 'bedrock' not available for model '<id>' before reaching AWS. Use GET /v1/models and look for providers[].provider == "bedrock" to list eligible models.
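A sketch of that filter, applied to an already-parsed `GET /v1/models` response. The `providers[].provider` path comes from the text above; the top-level `data` list and the `id` field are assumptions based on the OpenAI-style list shape, and the sample payload is illustrative.

```python
def bedrock_eligible(models_response):
    """Return the ids of models that have a bedrock provider entry."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if any(p.get("provider") == "bedrock" for p in model.get("providers", []))
    ]


# Illustrative payload, not real catalog output.
sample = {
    "data": [
        {"id": "amazon/nova-micro", "providers": [{"provider": "bedrock"}]},
        {"id": "openai/gpt-4.1-mini", "providers": [{"provider": "openrouter"}]},
    ]
}
eligible = bedrock_eligible(sample)
```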

Model access gates

Some Bedrock model families require AWS-side approval before they can be invoked, even when the catalog has a bedrock row for them:

Family	Catalog `model_name` prefix	AWS-side approval
Amazon Nova	`amazon.nova-*`	None — invokable immediately
Meta Llama, Mistral, Cohere	`meta.*`, `mistral.*`, `cohere.*`	One-line click-through, instant
Anthropic Claude	`anthropic.claude-*`	Use-case form (5 fields, manual approval)

If the AWS account behind the router lacks access for a model, you’ll see a 404 with a message like "Model use case details have not been submitted for this account..." — that’s AWS, not the router. The fix is operator-side: enable the model in the AWS Console under Bedrock → Model access for the region. Until that’s done, route the same call to OpenRouter (the default) or pick a Nova model — Amazon’s own family has no gating.

Request translation

OpenAI field	Bedrock Converse field
`messages` (with `role: "system"`)	Split — system text becomes top-level `system: [{"text": "..."}]`; user/assistant stay in `messages`
`max_tokens` / `max_completion_tokens`	`inferenceConfig.maxTokens`
`temperature`	`inferenceConfig.temperature`
`top_p`	`inferenceConfig.topP`
`stop` (string or array)	`inferenceConfig.stopSequences` (always a list — must be non-whitespace, see below)
`tools` + `tool_choice`	`toolConfig.tools` (`toolSpec`) + `toolConfig.toolChoice`
Assistant messages with `tool_calls[]`	Assistant content with `toolUse` blocks
`role: "tool"` (with `tool_call_id`)	User content with a `toolResult` block
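Part of this mapping can be sketched in a few lines. The sketch covers only the system split and the `inferenceConfig` fields, assumes plain-string message content, and omits tools entirely; `to_converse` is a hypothetical name, and preferring `max_completion_tokens` when both limits are present is an assumption.

```python
def to_converse(request):
    """Translate a simple OpenAI-shaped chat request into a Converse-shaped body."""
    system, messages = [], []
    for msg in request["messages"]:
        if msg["role"] == "system":
            system.append({"text": msg["content"]})
        else:
            messages.append({"role": msg["role"], "content": [{"text": msg["content"]}]})

    inference = {}
    max_tokens = request.get("max_completion_tokens", request.get("max_tokens"))
    if max_tokens is not None:
        inference["maxTokens"] = max_tokens
    for src, dst in (("temperature", "temperature"), ("top_p", "topP")):
        if request.get(src) is not None:
            inference[dst] = request[src]
    stop = request.get("stop")
    if stop is not None:
        # Converse always takes a list, even for a single stop string.
        inference["stopSequences"] = [stop] if isinstance(stop, str) else list(stop)

    body = {"messages": messages}
    if system:
        body["system"] = system
    if inference:
        body["inferenceConfig"] = inference
    return body
```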

Stop-sequence quirk

Bedrock rejects whitespace-only stop sequences with 400 The stop sequence value at inferenceConfig.stopSequences.0 is blank. Other providers (OpenRouter, Anthropic native) accept them. If you need a request body that works across all providers, use printable stop sequences such as "###", "END", or "---" instead of "\n\n".
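A small pre-flight check can catch this before a request is sent. The helper name `blank_stop_sequences` is illustrative.

```python
def blank_stop_sequences(stop):
    """Return the stop sequences that Bedrock would reject as blank."""
    sequences = [stop] if isinstance(stop, str) else list(stop or [])
    return [s for s in sequences if not s.strip()]
```

An empty result means the `stop` value should be portable across the chat-completions providers.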

Out-of-scope today

The following are accepted by the router but are not forwarded to Bedrock — they will produce unexpected behavior or no-ops on this provider path. Use OpenRouter (the default for chat completions) when you need them:

Streaming (stream: true) — Bedrock provider does not implement streaming; requests will hang or error
Vision / multimodal content parts (image_url, input_audio, video_url, file) — even when the underlying model supports them
response_format — JSON-mode emulation is not implemented for Bedrock; the parameter is silently dropped
Reasoning (reasoning.effort) — Claude extended thinking via Bedrock is not yet wired through
frequency_penalty / presence_penalty / repetition_penalty / seed / logprobs / top_logprobs / logit_bias — Bedrock’s Converse API doesn’t accept them; silently dropped
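Because most of these fields are dropped without an error, it can help to lint a request on the client before choosing the Bedrock path. The sketch below checks top-level fields only (content parts are not inspected); the helper name is hypothetical, and treating reasoning as a top-level `reasoning` object is inferred from the `reasoning.effort` notation above.

```python
# Top-level request fields the Bedrock path does not forward (from the list above).
NOT_FORWARDED = {
    "response_format", "reasoning", "frequency_penalty", "presence_penalty",
    "repetition_penalty", "seed", "logprobs", "top_logprobs", "logit_bias",
}


def bedrock_unsupported_fields(request):
    """Return the fields in `request` that would be dropped or fail on Bedrock."""
    fields = {name for name in NOT_FORWARDED if name in request}
    if request.get("stream"):
        fields.add("stream")  # streaming is not implemented on this path
    return sorted(fields)
```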

Bedrock-native Embeddings

POST /v1/aws-bedrock/embeddings is a native-shape passthrough — distinct from the OpenAI-compatible /v1/embeddings. It uses bedrock-runtime.invoke_model (NOT Converse — embedding models don’t speak Converse) and preserves Bedrock’s request/response shape byte-for-byte. See the full reference at AWS Bedrock Embeddings.

Models in scope

Model id	Vendor	Native dimension	Public list rate
`amazon.titan-embed-text-v2:0`	Amazon Titan	256, 512, 1024	$0.02 / 1M tokens
`cohere.embed-english-v3`	Cohere	1024	$0.10 / 1M tokens
`cohere.embed-multilingual-v3`	Cohere	1024	$0.10 / 1M tokens

The endpoint accepts two distinct request bodies discriminated by the model prefix:

amazon.titan-* → { model, inputText, dimensions?, normalize?, embeddingTypes? }
cohere.* → { model, texts, input_type, embedding_types?, truncate? }

Mismatching the body shape and the model prefix (e.g. Cohere fields on a Titan model id) is rejected with 422 before reaching upstream.
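A client-side version of this check might look like the sketch below. The field sets are copied from the two shapes above, with fields lacking a `?` treated as required; `check_body_shape` is a hypothetical helper, and the router's own validation may differ in detail.

```python
TITAN_FIELDS = {"model", "inputText", "dimensions", "normalize", "embeddingTypes"}
COHERE_FIELDS = {"model", "texts", "input_type", "embedding_types", "truncate"}


def check_body_shape(body):
    """Raise ValueError if the body fields do not match the model prefix."""
    model = body["model"]
    if model.startswith("amazon.titan-"):
        allowed, required = TITAN_FIELDS, {"inputText"}
    elif model.startswith("cohere."):
        allowed, required = COHERE_FIELDS, {"texts", "input_type"}
    else:
        raise ValueError(f"unrecognized model prefix: {model}")
    extra = set(body) - allowed
    missing = required - set(body)
    if extra or missing:
        raise ValueError(f"extra={sorted(extra)} missing={sorted(missing)}")
```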

Token estimation for Cohere

Cohere’s response does not include a token count. The router estimates input tokens at roughly 4 characters per token (with a minimum of 1), a conservative figure that rarely under-bills natural-language input. Titan returns inputTextTokenCount directly and is billed against the upstream count.
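The heuristic amounts to a one-line calculation. In this sketch, rounding down and summing characters across all input texts are assumptions; only the 4-characters-per-token ratio and the minimum of 1 come from the description above.

```python
def estimate_tokens(texts):
    """Estimate input tokens at about 4 characters per token, never below 1."""
    total_chars = sum(len(text) for text in texts)
    return max(1, total_chars // 4)
```

For example, an 11-character input is estimated at 2 tokens, and an empty input still counts as 1.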

Google Gemini (Embeddings)

POST /v1/gemini/embeddings is a native-shape passthrough to Google’s generativelanguage.googleapis.com/v1beta endpoints. Clients don’t need their own Gemini API key — billing is unified through Quantized’s per-key credit balance. See the full reference at Gemini Embeddings.

Single vs batch routing

The router picks the upstream endpoint based on the cardinality of contents:

1 content → :embedContent ($0.15 per 1M tokens)
N > 1 contents → :batchEmbedContents ($0.075 per 1M tokens — half-priced)

The endpoint field in the response confirms which upstream URL was used.
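The routing rule is simple enough to mirror on the client, for instance to predict the `endpoint` value and the applicable rate. This is a sketch with an illustrative helper name; rejecting an empty `contents` list is an assumption.

```python
def pick_endpoint(contents):
    """Return (upstream endpoint suffix, USD per 1M tokens) for a list of contents."""
    if not contents:
        raise ValueError("contents must not be empty")
    if len(contents) == 1:
        return ":embedContent", 0.15
    return ":batchEmbedContents", 0.075
```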

The silent-concatenation trap

Sending multi-part content.parts to :embedContent (Gemini’s single-content endpoint) makes Gemini silently concatenate the parts into one string and return ONE vector for the concatenation — no error, 200 OK, wrong shape. The router always dispatches multi-content requests to :batchEmbedContents to avoid this. Treat any unexpected endpoint value as a router bug.

Models in scope

Model id	Native dimension	Truncatable to
`gemini-embedding-001`	3072	768

Token estimation

Gemini’s embedding endpoints do not return token counts. The router applies the same heuristic as for Cohere (~4 chars/token, floored at 1).

Image Generation

POST /v1/images/generations is a unified endpoint — there are no native passthroughs (no /v1/aws-bedrock/images/generations, no /v1/gemini/images/generations). All providers adapt to the same OpenAI-shape request/response.

Provider matrix

Provider	Models	Native body shape	Status
OpenAI Direct	`dall-e-2`, `dall-e-3`, `gpt-image-1`	OpenAI `/v1/images/generations`	Active
AWS Bedrock	`amazon.titan-image-generator-v2:0`, `amazon.nova-canvas-v1:0`	`{ taskType, textToImageParams, imageGenerationConfig }`	Catalog seeded, inactive (re-enable in AWS Console → Bedrock → Model access)
AWS Bedrock — Stability	`stability.stable-image-{core,ultra}-v1:0`, `stability.sd3-5-large-v1:0`	`{ prompt, aspect_ratio, output_format, seed?, negative_prompt? }`	Catalog seeded, inactive (region-restricted to `us-west-2`)
Google Gemini — Imagen	`imagen-4.0-{fast-generate,generate,ultra-generate}-001`	`:predict` with `instances` + `parameters`	Catalog seeded, inactive (paid-tier Gemini API project required)
Google Gemini — Flash Image	`gemini-2.5-flash-image`	`:generateContent` with `responseModalities: [TEXT, IMAGE]`	Catalog seeded, inactive

Output transport

Output is forced to b64_json for every provider. Read images from data[].b64_json. There is no data[].url field: DALL-E URLs expire in ~60 minutes and would need a CDN rehost subsystem.
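Reading the images back is a base64 decode per entry. A minimal sketch, assuming an already-parsed response; entries without a `b64_json` value are skipped, and `decode_images` is an illustrative name.

```python
import base64


def decode_images(response):
    """Return the raw image bytes for every data[] entry that carries b64_json."""
    return [
        base64.b64decode(item["b64_json"])
        for item in response.get("data", [])
        if item.get("b64_json")
    ]
```

The returned bytes can be written straight to a file opened in binary mode.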

Provider-specific field handling

Field	DALL-E 2	DALL-E 3	gpt-image-1	Bedrock Titan/Nova	Bedrock Stability	Imagen	Gemini Flash Image
`prompt`	Yes	Yes (auto-rewritten)	Yes	Yes	Yes	Yes	Yes (chat-style)
`n`	1–10	1 only	1 only	1–5	1 only	1–4	1 only
`size`	`256x256`, `512x512`, `1024x1024`	`1024x1024`, `1024x1792`, `1792x1024`	up to 3840px	`WxH` (varies per model)	mapped to nearest `aspect_ratio`	mapped to `aspectRatio` + `imageSize` (1K / 2K)	chat-style (no size knob)
`quality`	—	`standard`, `hd`	`low`, `medium`, `high`, `auto`	`standard`, `premium`	—	—	—
`style`	(ignored)	`vivid`, `natural`	(stripped)	(stripped)	(stripped)	(stripped)	(stripped)
`background`	(stripped)	(stripped)	`transparent`, `opaque`, `auto`	(stripped)	(stripped)	(stripped)	(stripped)
`output_format`	(stripped)	(stripped)	`png`, `jpeg`, `webp`	(stripped)	`png`, `jpeg`, `webp`	(stripped)	(stripped)
`seed`	(stripped)	(stripped)	(stripped)	Yes	Yes	(stripped)	(stripped)
`negative_prompt`	(stripped)	(stripped)	(stripped)	`negativeText`	`negative_prompt`	(stripped)	(stripped)

Pricing

Model	Pricing model	Source
`dall-e-2`	$0.016 / $0.018 / $0.020 per image (by size)	OpenAI public list
`dall-e-3`	$0.040 to $0.120 per image (by size × quality)	OpenAI public list
`gpt-image-1`	Token-priced — $5/M input + $40/M output tokens	OpenAI public list
Bedrock Titan	$0.008 to $0.014 per image (by size × quality)	AWS Bedrock public list
Bedrock Nova Canvas	$0.040 to $0.080 per image (by size × quality)	AWS Bedrock public list
Bedrock Stability	$0.030 / $0.065 / $0.080 per image (flat per model)	AWS Bedrock public list
Imagen 4 Fast / Standard / Ultra	$0.020 / $0.040 / $0.060 per image (flat per tier)	Google AI Studio public list
Gemini 2.5 Flash Image	~$0.039 per 1024² image (token-priced)	Google AI Studio public list

The router computes the per-call cost in the provider code (see Errors for how a $0 cost is recorded when upstream returns an error).

Watermarking

The unified response includes a watermark enum on each data[] entry:

c2pa — gpt-image-1 (always)
provenance — Amazon Titan, Nova Canvas (always)
synthid — Google Imagen, Gemini Flash Image (always)
none — DALL-E 2, DALL-E 3, Bedrock Stability

Consumers targeting education customers should consider rendering a disclosure when watermark != "none".

Content moderation

Bedrock Titan/Nova return 200 OK with an empty images array and an inline error string when content moderation triggers — they do not return a 4xx. The router surfaces this as a successful response with data: [{flagged: true, ...}] and usage.images: 0. Billing is $0 for moderation-blocked generations.

DALL-E 2/3, gpt-image-1, Stability, and Imagen all return standard error responses on moderation blocks (400 with the upstream message), which the router maps to its standard error hierarchy.
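Because a moderation block on Titan or Nova arrives as a 200, callers should check the `flagged` field rather than rely on the status code. A sketch, with an illustrative helper name and sample response:

```python
def split_flagged(response):
    """Separate usable data[] entries from moderation-flagged ones."""
    entries = response.get("data", [])
    usable = [e for e in entries if not e.get("flagged")]
    blocked = [e for e in entries if e.get("flagged")]
    return usable, blocked


# Shape of a moderation-blocked Titan/Nova response as described above.
blocked_response = {"data": [{"flagged": True}], "usage": {"images": 0}}
usable, blocked = split_flagged(blocked_response)
```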

Provider errors

If the upstream provider fails (timeout, rate limit, authentication error), Quantized returns a 503 with a generic message:

{
  "error": {
    "message": "Service temporarily unavailable"
  }
}

Internal provider errors are masked to avoid leaking infrastructure details. See Errors for the full error reference.
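On the client, the masked error can be handled generically. The sketch below extracts the message and marks a 503 as worth retrying; treating 503 as retryable is an assumption, not documented behavior, and `describe_error` is an illustrative name.

```python
import json


def describe_error(status, body_text):
    """Return (retryable, message) for an error response."""
    try:
        message = json.loads(body_text)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        message = body_text  # fall back to the raw body if it is not the expected shape
    return status == 503, message
```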
