Quantized API
Quantized provides a single, unified API to access multiple AI providers and tools. One API key, one credit balance, one consistent interface — regardless of which provider powers the request.
What you can do
- Chat Completions (`POST /v1/chat/completions`) — Generate text with any LLM. Compatible with the OpenAI Chat Completions format; supports text, vision, audio, tool calling, streaming, and more.
- Responses (`POST /v1/responses`) — Stateful text generation using the Responses API format; supports instructions, tools, reasoning, and streaming.
- Web search (`POST /v1/web-search`) — Search the web and get structured results.
- Fetch (`POST /v1/fetch`) — Extract clean text content from any URL.
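All of these endpoints share one base URL and the same bearer-token authentication. A minimal sketch of composing such a request with Python's standard library; the `"query"` field name for web search is an assumption for illustration, so check the API Reference for the exact schema:

```python
import json
import urllib.request

BASE_URL = "https://api.quantized.us/v1"
API_KEY = "sk-quantized-YOUR-KEY"  # placeholder key

def build_request(path: str, body: dict) -> urllib.request.Request:
    """Compose an authenticated POST for any Quantized endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The "query" field is hypothetical; see the API Reference for the real schema.
req = build_request("/web-search", {"query": "example search"})
print(req.full_url)  # https://api.quantized.us/v1/web-search
# To send: urllib.request.urlopen(req) returns the JSON response.
```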
Supported providers
| Provider | Capabilities |
|---|---|
| OpenRouter | 300+ LLMs (GPT-4, Claude, Llama, Gemini, …), Responses API |
| Anthropic | Claude models directly |
| Exa | Web search, content extraction |
| Tavily | Web search, content extraction |
You don’t need to manage separate API keys or accounts for each provider. Quantized handles provider routing, authentication, and billing automatically.
Get started
Get your API key
Your institution provides you with a Quantized API key (format: `sk-quantized-...`) or a JWT token.
Make your first request
```shell
curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
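Since the API is compatible with the OpenAI Chat Completions format, adding `"stream": true` to the request body above switches the response to Server-Sent Events. A sketch of pulling tokens out of such a stream; the sample `data:` lines are illustrative, not captured output:

```python
import json

# Illustrative SSE lines, in the OpenAI-compatible streaming chunk shape.
sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

tokens = []
for line in sample_stream:
    payload = line[len("data: "):]
    if payload == "[DONE]":  # sentinel that ends the stream
        break
    chunk = json.loads(payload)
    tokens.append(chunk["choices"][0]["delta"].get("content", ""))

print("".join(tokens))  # Hello!
```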
Explore the docs
Read the Quickstart guide or jump to the API Reference.
Key features
- OpenAI-compatible — Use the OpenAI SDK with `base_url="https://api.quantized.us/v1"`
- Unified billing — One credit balance across all providers and tools
- Provider routing — Force a specific provider with the `X-Quantized-Provider` header, or let Quantized pick the default
- Streaming — Server-Sent Events for real-time token delivery
- Transparent pricing — Every response includes `credits_used` and `credits_remaining`
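Routing and pricing combine per request: pin a provider with the header, then read the credit fields off the response. A sketch with an illustrative (not real) response body; only the header name and the two credit fields come from the feature list above:

```python
import json

# Request headers: the X-Quantized-Provider value pins a provider;
# omit the header to let Quantized pick the default.
headers = {
    "Authorization": "Bearer sk-quantized-YOUR-KEY",
    "Content-Type": "application/json",
    "X-Quantized-Provider": "anthropic",
}

# Illustrative response body showing the pricing fields.
raw = '{"choices": [], "credits_used": 12, "credits_remaining": 988}'
body = json.loads(raw)
print(body["credits_used"], body["credits_remaining"])  # 12 988
```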