Providers

Quantized routes each request to a provider based on the endpoint and your configuration. You don’t need to manage separate API keys or accounts for each provider.

Supported providers

Provider Slug Capabilities
OpenRouter openrouter Chat completions, Responses, Models, Embeddings
OpenAI Direct openai Embeddings, Image generation (DALL-E 2/3, gpt-image-1)
Anthropic anthropic Chat completions, Models
AWS Bedrock bedrock Chat completions, Responses, Bedrock-native embeddings, Image generation (Titan / Nova Canvas / Stability)
Google Gemini gemini Gemini-native embeddings, Image generation (Imagen 4, Gemini Flash Image)
Exa exa Web search, Content fetch
Tavily tavily Web search, Content fetch

Default routing

Each capability has a default provider:

Capability Default Provider
Chat completions OpenRouter
Responses OpenRouter
Models OpenRouter
Embeddings (/v1/embeddings) OpenAI Direct
Bedrock-native embeddings (/v1/aws-bedrock/embeddings) AWS Bedrock
Gemini-native embeddings (/v1/gemini/embeddings) Google Gemini
Image generation (/v1/images/generations) OpenAI Direct (resolved per-model)
Web search Exa
Content fetch Exa

Choosing a provider

Use the X-Quantized-Provider header to override the default:

# Use Anthropic directly instead of OpenRouter
curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "X-Quantized-Provider: anthropic" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Use Tavily instead of Exa for web search
curl -X POST https://api.quantized.us/v1/web-search \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "X-Quantized-Provider: tavily" \
  -H "Content-Type: application/json" \
  -d '{"query": "latest AI news"}'

Model naming

All models use the author/model format (e.g., openai/gpt-4.1-mini, anthropic/claude-sonnet-4). Use the Models endpoint to list available model IDs.

Capability matrix

| Endpoint | OpenAI | OpenRouter | Anthropic | Bedrock | Gemini | Exa | Tavily |
|---|---|---|---|---|---|---|---|
| POST /v1/chat/completions | | Yes (default) | Yes | Yes | | | |
| POST /v1/responses | | Yes (default) | | Yes | | | |
| POST /v1/embeddings | Yes (default) | Yes | | | | | |
| POST /v1/aws-bedrock/embeddings | | | | Yes (default, only) | | | |
| POST /v1/gemini/embeddings | | | | | Yes (default, only) | | |
| POST /v1/images/generations | Yes (default) | | | Yes: Titan / Nova / Stability (inactive, pending ops) | Yes: Imagen 4 / Flash Image (inactive, pending paid tier) | | |
| GET /v1/models | | Yes (default) | Yes | | | | |
| POST /v1/web-search | | | | | | Yes (default) | Yes |
| POST /v1/fetch | | | | | | Yes (default) | Yes |

Chat-completions modalities

Not every provider accepts every content part on POST /v1/chat/completions. Requests are additionally gated by the target model’s declared modalities — see the Models endpoint for per-model input_modality flags.

| Content part | OpenRouter | Anthropic | Bedrock |
|---|---|---|---|
| text | Yes | Yes | Yes |
| image_url | Yes | Yes | |
| input_audio | Yes | | |
| video_url | Yes | | |
| file (PDF) | Yes (universal: works on all models via OpenRouter’s PDF parser) | | |

Sending an unsupported modality returns 400 with a descriptive error message (e.g. "Model 'openai/gpt-4.1-nano' does not support audio input") before the request reaches the provider.
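
A client can run a rough pre-check against the provider table above before sending a request. The sketch below is illustrative only: the helper name `unsupported_parts` and the `PROVIDER_PARTS` mapping are hypothetical, the mapping is simply transcribed from the table, and the per-model modality gate still applies on top of it.

```python
# Content-part types each provider path accepts, transcribed from the table above.
PROVIDER_PARTS = {
    "openrouter": {"text", "image_url", "input_audio", "video_url", "file"},
    "anthropic": {"text", "image_url"},
    "bedrock": {"text"},
}


def unsupported_parts(provider, messages):
    """Return the sorted content-part types in `messages` that `provider` does not accept."""
    allowed = PROVIDER_PARTS[provider]
    found = set()
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # A plain string counts as a single text part.
            found.add("text")
            continue
        for part in content or []:
            found.add(part.get("type", "unknown"))
    return sorted(found - allowed)


messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this clip."},
            {"type": "input_audio", "input_audio": {"data": "...", "format": "wav"}},
        ],
    }
]
print(unsupported_parts("anthropic", messages))
```

An empty list means the provider path accepts every part type in the request; it says nothing about whether the chosen model does.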

Anthropic-specific behavior

When routing through X-Quantized-Provider: anthropic, the router adapts OpenAI-shaped requests to Anthropic’s native /v1/messages format:

response_format — JSON output normalization

Anthropic’s API does not natively support the response_format parameter. The router emulates it by injecting a system-prompt instruction telling the model to return raw JSON. Some Claude models (notably Claude Haiku 4.5) still wrap their output in ```json ... ``` markdown fences despite this instruction.

To uphold the response_format contract — “callers asking for JSON get parseable JSON” — the router strips a single wrapping markdown fence from the response content when:

  • the request specified response_format: {"type": "json_object" | "json_schema"}, and
  • the response content is wrapped entirely in ```json ... ``` or ``` ... ``` (a fence embedded inside prose is not stripped).

This stripping is only applied to the Anthropic provider path — OpenRouter responses are forwarded as-is because OpenRouter handles response_format server-side.
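
The stripping rule can be approximated as follows. This is a minimal sketch of the behavior described above, not the router's actual code: the function names are hypothetical, and the exact whitespace handling is an assumption.

```python
import json
import re

FENCE = "`" * 3  # three backticks, spelled this way so the sample stays easy to embed

# Content that is one fenced block from start to end, with an optional "json" tag.
_WRAPPED = re.compile(
    r"\A\s*" + FENCE + r"(?:json)?[ \t]*\r?\n(.*?)\r?\n?" + FENCE + r"\s*\Z",
    re.DOTALL,
)


def strip_wrapping_fence(content):
    """Remove a single fence that wraps the whole string; leave anything else alone."""
    match = _WRAPPED.match(content)
    if match is None or FENCE in match.group(1):
        # No wrapper, or more than one fence: treat as prose and do not touch it.
        return content
    return match.group(1).strip()


def normalize_content(content, response_format):
    """Apply the stripping only when the caller asked for JSON output."""
    wanted = (response_format or {}).get("type")
    if wanted in ("json_object", "json_schema"):
        return strip_wrapping_fence(content)
    return content


wrapped = FENCE + 'json\n{"ok": true}\n' + FENCE
print(json.loads(normalize_content(wrapped, {"type": "json_object"})))
```

A fence that appears after introductory prose does not match the pattern, so that content is returned unchanged.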

Out-of-scope content parts

Anthropic’s chat endpoint currently receives only text and image_url content parts from the router. Requests containing input_audio, video_url, or file parts are accepted by the router’s serializer but would fail upstream if routed to Anthropic. Use OpenRouter (the default for chat completions) for these modalities.

OpenAI Direct (Embeddings)

OpenAI Direct is the default provider for POST /v1/embeddings. It calls OpenAI’s /v1/embeddings endpoint with Quantized’s pooled OPENAI_API_KEY. Clients don’t need their own OpenAI account — billing is unified through Quantized’s per-key credit balance.

Models in scope

Model id Native dimension Public list rate
text-embedding-3-small 1536 $0.02 / 1M tokens
text-embedding-3-large 3072 $0.13 / 1M tokens
text-embedding-ada-002 1536 $0.10 / 1M tokens

Unknown OpenAI model ids fall back to a conservative default rate so the router never bills at $0 on a misconfigured request.

OpenRouter passthrough

X-Quantized-Provider: openrouter routes the same request through OpenRouter, which exposes OpenAI’s embedding models with an openai/ prefix. Because OpenRouter’s embedding response does not include a per-call cost, the router falls back to the OpenAI rate table after stripping the prefix.
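
As a rough illustration of the rate-table fallback, the sketch below strips the openai/ prefix and looks up the list rate from the table above. The function name and `FALLBACK_RATE` value are hypothetical; the document does not state what the conservative default rate is.

```python
# USD per 1M tokens, copied from the "Models in scope" table above.
OPENAI_EMBEDDING_RATES = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}

# Hypothetical stand-in for the "conservative default rate"; the real value is not documented here.
FALLBACK_RATE = 0.13

PREFIX = "openai/"


def embedding_cost_usd(model, total_tokens):
    """Estimate the cost of an embeddings call from the list-rate table."""
    if model.startswith(PREFIX):
        # OpenRouter exposes the same models under an openai/ prefix.
        model = model[len(PREFIX):]
    rate = OPENAI_EMBEDDING_RATES.get(model, FALLBACK_RATE)
    return rate * total_tokens / 1_000_000


print(embedding_cost_usd("openai/text-embedding-3-small", 250_000))
```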

Bedrock-specific behavior

When routing through X-Quantized-Provider: bedrock, the router calls AWS Bedrock’s Converse API on Quantized’s AWS account. Clients don’t need their own AWS credentials — billing is unified through Quantized’s per-key credit balance.

Model resolution

Only models with a bedrock row in the model catalog are routable through this header. The catalog row’s model_name field carries the full Bedrock model id (e.g., amazon.nova-micro-v1:0); clients always reference the canonical Quantized id (e.g., amazon/nova-micro).

If a model has no Bedrock row, requests with X-Quantized-Provider: bedrock fail at resolution with 400 Provider 'bedrock' not available for model '<id>' before reaching AWS. Use GET /v1/models and look for providers[].provider == "bedrock" to list eligible models.
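
Filtering a parsed GET /v1/models response for Bedrock-eligible models might look like the sketch below. Only the providers[].provider field comes from this page; the top-level `data` list and the `id` key are assumptions about the response shape.

```python
def bedrock_model_ids(models_response):
    """Return ids of models that list a provider entry equal to "bedrock"."""
    return [
        model["id"]
        for model in models_response.get("data", [])  # "data" and "id" are assumed keys
        if any(p.get("provider") == "bedrock" for p in model.get("providers", []))
    ]


sample = {
    "data": [
        {"id": "amazon/nova-micro", "providers": [{"provider": "bedrock"}]},
        {"id": "openai/gpt-4.1-mini", "providers": [{"provider": "openrouter"}]},
    ]
}
print(bedrock_model_ids(sample))
```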

Model access gates

Some Bedrock model families require AWS-side approval before they can be invoked, even when the catalog has a bedrock row for them:

Family Catalog model_name prefix AWS-side approval
Amazon Nova amazon.nova-* None — invokable immediately
Meta Llama, Mistral, Cohere meta.*, mistral.*, cohere.* One-line click-through, instant
Anthropic Claude anthropic.claude-* Use-case form (5 fields, manual approval)

If the AWS account behind the router lacks access for a model, you’ll see a 404 with a message like "Model use case details have not been submitted for this account..." — that’s AWS, not the router. The fix is operator-side: enable the model in the AWS Console under Bedrock → Model access for the region. Until that’s done, route the same call to OpenRouter (the default) or pick a Nova model — Amazon’s own family has no gating.

Request translation

OpenAI field Bedrock Converse field
messages (with role: "system") Split — system text becomes top-level system: [{"text": "..."}]; user/assistant stay in messages
max_tokens / max_completion_tokens inferenceConfig.maxTokens
temperature inferenceConfig.temperature
top_p inferenceConfig.topP
stop (string or array) inferenceConfig.stopSequences (always a list — must be non-whitespace, see below)
tools + tool_choice toolConfig.tools (toolSpec) + toolConfig.toolChoice
Assistant messages with tool_calls[] Assistant content with toolUse blocks
role: "tool" (with tool_call_id) User content with a toolResult block
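
To make the mapping concrete, here is a simplified sketch covering the text-only rows of the table (system split, sampling parameters, stop sequences). It is not the router's implementation: `to_converse` is a hypothetical name, tool handling is omitted, string-only message content is assumed, and preferring max_completion_tokens over max_tokens when both are present is a guess.

```python
def to_converse(body):
    """Translate a text-only OpenAI-shaped chat body into a Converse-shaped dict."""
    system, messages = [], []
    for msg in body.get("messages", []):
        if msg["role"] == "system":
            # System text moves to the top-level "system" list.
            system.append({"text": msg["content"]})
        else:
            messages.append({"role": msg["role"], "content": [{"text": msg["content"]}]})

    inference = {}
    # Assumption: max_completion_tokens wins when both limits are supplied.
    max_tokens = body.get("max_completion_tokens", body.get("max_tokens"))
    if max_tokens is not None:
        inference["maxTokens"] = max_tokens
    for source, target in (("temperature", "temperature"), ("top_p", "topP")):
        if body.get(source) is not None:
            inference[target] = body[source]
    stop = body.get("stop")
    if stop is not None:
        # Converse always takes a list, even for a single stop string.
        inference["stopSequences"] = [stop] if isinstance(stop, str) else list(stop)

    converse = {"messages": messages}
    if system:
        converse["system"] = system
    if inference:
        converse["inferenceConfig"] = inference
    return converse


print(to_converse({
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 64,
    "stop": "END",
}))
```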

Stop-sequence quirk

Bedrock rejects whitespace-only stop sequences with 400 The stop sequence value at inferenceConfig.stopSequences.0 is blank. Other providers (OpenRouter, Anthropic native) accept them. If you need a request body that works across all providers, use printable stop sequences such as "###", "END", or "---" instead of "\n\n".
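
A client that wants one request body for every provider could flag whitespace-only stop sequences before sending. The helper below is a hypothetical sketch of such a check.

```python
def blank_stop_sequences(stop):
    """Return the stop sequences that contain only whitespace (or nothing at all)."""
    sequences = [stop] if isinstance(stop, str) else list(stop or [])
    return [s for s in sequences if not s.strip()]


print(blank_stop_sequences(["###", "\n\n", "END"]))
```

Any entry returned by this check would be rejected on the Bedrock path, so replace it with a printable sequence.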

Out-of-scope today

The following are accepted by the router but are not forwarded to Bedrock — they will produce unexpected behavior or no-ops on this provider path. Use OpenRouter (the default for chat completions) when you need them:

  • Streaming (stream: true) — Bedrock provider does not implement streaming; requests will hang or error
  • Vision / multimodal content parts (image_url, input_audio, video_url, file) — even when the underlying model supports them
  • response_format — JSON-mode emulation is not implemented for Bedrock; the parameter is silently dropped
  • Reasoning (reasoning.effort) — Claude extended thinking via Bedrock is not yet wired through
  • frequency_penalty / presence_penalty / repetition_penalty / seed / logprobs / top_logprobs / logit_bias — Bedrock’s Converse API doesn’t accept them; silently dropped

Bedrock-native Embeddings

POST /v1/aws-bedrock/embeddings is a native-shape passthrough — distinct from the OpenAI-compatible /v1/embeddings. It uses bedrock-runtime.invoke_model (NOT Converse — embedding models don’t speak Converse) and preserves Bedrock’s request/response shape byte-for-byte. See the full reference at AWS Bedrock Embeddings.

Models in scope

Model id Vendor Native dimension Public list rate
amazon.titan-embed-text-v2:0 Amazon Titan 256, 512, 1024 $0.02 / 1M tokens
cohere.embed-english-v3 Cohere 1024 $0.10 / 1M tokens
cohere.embed-multilingual-v3 Cohere 1024 $0.10 / 1M tokens

The endpoint accepts two distinct request bodies discriminated by the model prefix:

  • amazon.titan-* → { model, inputText, dimensions?, normalize?, embeddingTypes? }
  • cohere.* → { model, texts, input_type, embedding_types?, truncate? }

Mismatching the body shape and the model prefix (e.g. Cohere fields on a Titan model id) is rejected with 422 before reaching upstream.
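
The prefix-based discrimination can be sketched as a client-side check. This assumes that fields without a trailing ? in the shapes above are required; the names `SHAPES` and `body_problems` are hypothetical, and the router's own validation may be stricter or looser.

```python
# Field sets taken from the two body shapes listed above.
SHAPES = {
    "amazon.titan-": {
        "required": {"model", "inputText"},
        "optional": {"dimensions", "normalize", "embeddingTypes"},
    },
    "cohere.": {
        "required": {"model", "texts", "input_type"},
        "optional": {"embedding_types", "truncate"},
    },
}


def body_problems(body):
    """List mismatches between the body's fields and the shape implied by its model prefix."""
    model = body.get("model", "")
    for prefix, shape in SHAPES.items():
        if model.startswith(prefix):
            break
    else:
        return ["unrecognized model prefix"]
    keys = set(body)
    problems = [f"missing field: {name}" for name in sorted(shape["required"] - keys)]
    extra = keys - shape["required"] - shape["optional"]
    problems += [f"unexpected field: {name}" for name in sorted(extra)]
    return problems


# Cohere fields on a Titan model id: the kind of request the endpoint rejects with 422.
print(body_problems({
    "model": "amazon.titan-embed-text-v2:0",
    "texts": ["hello"],
    "input_type": "search_document",
}))
```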

Token estimation for Cohere

Cohere’s response does not include a token count. The router estimates input tokens at ~4 chars per token (floored at 1) — conservative and rarely under-bills natural-language input. Titan returns inputTextTokenCount directly and is billed against the upstream count.
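
The heuristic amounts to roughly the following. The sketch assumes the character count is summed across all input texts and divided with integer division; neither detail is stated above, and `estimate_tokens` is a hypothetical name.

```python
def estimate_tokens(texts):
    """Estimate tokens at about 4 characters per token, never returning less than 1."""
    total_chars = sum(len(text) for text in texts)
    # Integer division is an assumption; the floor of 1 comes from the description above.
    return max(1, total_chars // 4)


print(estimate_tokens(["The quick brown fox jumps over the lazy dog."]))
```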

Google Gemini (Embeddings)

POST /v1/gemini/embeddings is a native-shape passthrough to Google’s generativelanguage.googleapis.com/v1beta endpoints. Clients don’t need their own Gemini API key — billing is unified through Quantized’s per-key credit balance. See the full reference at Gemini Embeddings.

Single vs batch routing

The router picks the upstream endpoint based on the cardinality of contents:

  • 1 content → :embedContent ($0.15 per 1M tokens)
  • N > 1 contents → :batchEmbedContents ($0.075 per 1M tokens, half price)

The endpoint field in the response confirms which upstream URL was used.
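
The cardinality rule and its effect on price can be sketched as below. The function names are hypothetical, and rejecting an empty `contents` list is an assumption about how the router treats that case.

```python
# USD per 1M tokens for each upstream endpoint, from the list above.
RATES = {"embedContent": 0.15, "batchEmbedContents": 0.075}


def pick_upstream(contents):
    """Choose the upstream Gemini method from the number of contents."""
    if not contents:
        raise ValueError("contents must hold at least one item")
    return "embedContent" if len(contents) == 1 else "batchEmbedContents"


def estimated_cost_usd(contents, tokens):
    """Price a call at the rate of whichever endpoint the cardinality selects."""
    return RATES[pick_upstream(contents)] * tokens / 1_000_000


print(pick_upstream(["one document"]))
print(pick_upstream(["first document", "second document"]))
```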

The silent-concatenation trap

Sending multi-part content.parts to :embedContent (Gemini’s single-content endpoint) makes Gemini silently concatenate the parts into one string and return ONE vector for the concatenation — no error, 200 OK, wrong shape. The router always dispatches multi-content requests to :batchEmbedContents to avoid this. Treat any unexpected endpoint value as a router bug.

Models in scope

Model id Native dimension Truncatable to
gemini-embedding-001 3072 768

Token estimation

Gemini’s embedding endpoints do not return token counts, so the router applies the same heuristic as for Cohere (~4 chars/token, floored at 1).

Image Generation

POST /v1/images/generations is a unified endpoint — there are no native passthroughs (no /v1/aws-bedrock/images/generations, no /v1/gemini/images/generations). All providers adapt to the same OpenAI-shape request/response.

Provider matrix

Provider Models Native body shape Status
OpenAI Direct dall-e-2, dall-e-3, gpt-image-1 OpenAI /v1/images/generations Active
AWS Bedrock amazon.titan-image-generator-v2:0, amazon.nova-canvas-v1:0 { taskType, textToImageParams, imageGenerationConfig } Catalog seeded, inactive (re-enable in AWS Console → Bedrock → Model access)
AWS Bedrock — Stability stability.stable-image-{core,ultra}-v1:0, stability.sd3-5-large-v1:0 { prompt, aspect_ratio, output_format, seed?, negative_prompt? } Catalog seeded, inactive (region-restricted to us-west-2)
Google Gemini — Imagen imagen-4.0-{fast-generate,generate,ultra-generate}-001 :predict with instances + parameters Catalog seeded, inactive (paid-tier Gemini API project required)
Google Gemini — Flash Image gemini-2.5-flash-image :generateContent with responseModalities: [TEXT, IMAGE] Catalog seeded, inactive

Output transport

The response format is forced to b64_json for every provider, so read images from data[].b64_json. There is no data[].url field: DALL-E URLs expire in ~60 minutes, and serving them reliably would require a CDN rehost subsystem.
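
Reading the images on the client side is then a matter of base64-decoding each entry. A minimal sketch, assuming a parsed JSON response; `decode_images` is a hypothetical helper, and entries without a b64_json value are skipped.

```python
import base64


def decode_images(response):
    """Return the raw image bytes for every data[] entry that carries b64_json."""
    return [
        base64.b64decode(item["b64_json"])
        for item in response.get("data", [])
        if item.get("b64_json")
    ]


# Stand-in payload; a real response would carry encoded PNG, JPEG, or WebP bytes.
fake = {"data": [{"b64_json": base64.b64encode(b"not-a-real-image").decode("ascii")}]}
for index, image_bytes in enumerate(decode_images(fake)):
    print(index, len(image_bytes))
```

To persist an image, write each bytes object to a file opened in binary mode.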

Provider-specific field handling

Field DALL-E 2 DALL-E 3 gpt-image-1 Bedrock Titan/Nova Bedrock Stability Imagen Gemini Flash Image
prompt Yes Yes (auto-rewritten) Yes Yes Yes Yes Yes (chat-style)
n 1–10 1 only 1 only 1–5 1 only 1–4 1 only
size 256x256, 512x512, 1024x1024 1024x1024, 1024x1792, 1792x1024 up to 3840px WxH (varies per model) mapped to nearest aspect_ratio mapped to aspectRatio + imageSize (1K / 2K) chat-style (no size knob)
quality standard, hd low, medium, high, auto standard, premium
style (ignored) vivid, natural (stripped) (stripped) (stripped) (stripped) (stripped)
background (stripped) (stripped) transparent, opaque, auto (stripped) (stripped) (stripped) (stripped)
output_format (stripped) (stripped) png, jpeg, webp (stripped) png, jpeg, webp (stripped) (stripped)
seed (stripped) (stripped) (stripped) Yes Yes (stripped) (stripped)
negative_prompt (stripped) (stripped) (stripped) negativeText negative_prompt (stripped) (stripped)

Pricing

Model Pricing model Source
dall-e-2 $0.016 / $0.018 / $0.020 per image (by size) OpenAI public list
dall-e-3 $0.040 to $0.120 per image (by size × quality) OpenAI public list
gpt-image-1 Token-priced — $5/M input + $40/M output tokens OpenAI public list
Bedrock Titan $0.008 to $0.014 per image (by size × quality) AWS Bedrock public list
Bedrock Nova Canvas $0.040 to $0.080 per image (by size × quality) AWS Bedrock public list
Bedrock Stability $0.030 / $0.065 / $0.080 per image (flat per model) AWS Bedrock public list
Imagen 4 Fast / Standard / Ultra $0.020 / $0.040 / $0.060 per image (flat per tier) Google AI Studio public list
Gemini 2.5 Flash Image ~$0.039 per 1024² image (token-priced) Google AI Studio public list

The router computes the per-call cost in the provider code (see Errors for how a $0 cost is recorded when upstream returns an error).

Watermarking

The unified response includes a watermark enum on each data[] entry:

  • c2pa — gpt-image-1 (always)
  • provenance — Amazon Titan, Nova Canvas (always)
  • synthid — Google Imagen, Gemini Flash Image (always)
  • none — DALL-E 2, DALL-E 3, Bedrock Stability

Consumers targeting education customers should consider rendering a disclosure when watermark != "none".

Content moderation

Bedrock Titan/Nova return 200 OK with an empty images array and an inline error string when content moderation triggers — they do not return a 4xx. The router surfaces this as a successful response with data: [{flagged: true, ...}] and usage.images: 0. Billing is $0 for moderation-blocked generations.

DALL-E 2/3, gpt-image-1, Stability, and Imagen all return standard error responses on moderation blocks (400 with the upstream message), which the router maps to its standard error hierarchy.
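
Because a moderation block on Titan/Nova arrives as a successful response, clients may want to inspect the flagged field rather than rely on the HTTP status. The sketch below uses only the flagged and usage.images fields named above; the function name and the summary keys are hypothetical.

```python
def summarize_generation(response):
    """Count delivered and moderation-flagged entries in an image generation response."""
    data = response.get("data", [])
    flagged = [item for item in data if item.get("flagged")]
    return {
        "delivered": len(data) - len(flagged),
        "flagged": len(flagged),
        "billed_images": response.get("usage", {}).get("images", 0),
    }


blocked = {"data": [{"flagged": True}], "usage": {"images": 0}}
print(summarize_generation(blocked))
```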

Provider errors

If the upstream provider fails (timeout, rate limit, authentication error), Quantized returns a 503 with a generic message:

{
  "error": {
    "message": "Service temporarily unavailable"
  }
}

Internal provider errors are masked to avoid leaking infrastructure details. See Errors for the full error reference.