Quantized API

Quantized provides a single, unified API to access multiple AI providers and tools. One API key, one credit balance, one consistent interface — regardless of which provider powers the request.

What you can do

Chat Completions

Generate text with any LLM. Compatible with the OpenAI Chat Completions format.
Supports text, vision, audio, tool calling, streaming, and more.

POST /v1/chat/completions
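
Because this endpoint follows the OpenAI Chat Completions format and supports streaming, a streamed reply would normally arrive as Server-Sent Events. The sketch below shows one way to reassemble the text from such a stream. It assumes the OpenAI-style chunk shape (one `data: <json>` line per event, text in `choices[0].delta.content`, and a final `data: [DONE]` line); this page does not spell out Quantized's exact chunk format, so treat those details as assumptions. The function name and the sample lines are illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines.

    Assumption: each event is a single `data: <json>` line and the stream
    ends with `data: [DONE]`, as in the OpenAI Chat Completions format.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # A delta may carry only a role, or no content at all.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines standing in for a real streamed response.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
streamed_text = "".join(iter_stream_text(sample))
```

In a real client, the lines would come from the HTTP response body of a request sent with streaming enabled, decoded to text one line at a time.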

Responses

Stateful text generation using the Responses API format.
Supports instructions, tools, reasoning, and streaming.

POST /v1/responses
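
A request to this endpoint might be built as in the sketch below. The base URL and model name come from the curl example on this page. The body fields `input` and `instructions` follow the OpenAI Responses API format and are assumed to carry over unchanged. The helper name is hypothetical.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.quantized.us/v1"  # base URL taken from the curl example


def build_responses_request(api_key: str, user_input: str,
                            instructions: Optional[str] = None,
                            model: str = "openai/gpt-4.1-mini") -> urllib.request.Request:
    """Build a POST /v1/responses request without sending it.

    Assumption: the body uses the OpenAI Responses fields `model`,
    `input`, and (optionally) `instructions`.
    """
    body = {"model": model, "input": user_input}
    if instructions is not None:
        body["instructions"] = instructions
    return urllib.request.Request(
        f"{API_BASE}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_responses_request(
    "sk-quantized-YOUR-KEY", "Hello!", instructions="Answer in one sentence."
)
# Sending performs a real network call, so it is left commented out:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```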

Web Search

Search the web and get structured results.

POST /v1/web-search

Fetch

Extract clean text content from any URL.

POST /v1/fetch
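
The two tool endpoints above (web search and fetch) can be called with the same kind of authenticated POST request. This page does not list their request bodies, so the field names `query` and `url` in the sketch below are hypothetical placeholders; check the API Reference for the real schema. The helper name is also illustrative.

```python
import json
import urllib.request

API_BASE = "https://api.quantized.us/v1"  # base URL taken from the curl example


def build_tool_request(api_key: str, path: str, body: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST request for a tool endpoint."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "query" is a hypothetical field name for the search terms.
search_request = build_tool_request(
    "sk-quantized-YOUR-KEY", "/web-search", {"query": "server-sent events"}
)

# "url" is a hypothetical field name for the page to extract.
fetch_request = build_tool_request(
    "sk-quantized-YOUR-KEY", "/fetch", {"url": "https://example.com"}
)
```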

Supported providers

Provider      Capabilities
OpenRouter    300+ LLMs (GPT-4, Claude, Llama, Gemini, …), Responses API
Anthropic     Claude models directly
Exa           Web search, content extraction
Tavily        Web search, content extraction

You don’t need to manage separate API keys or accounts for each provider. Quantized handles provider routing, authentication, and billing automatically.

Get started

Get your API key

Your institution provides you with a Quantized API key (format: sk-quantized-...) or a JWT.


Make your first request

curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
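
The same request can be made from Python with only the standard library. The sketch below mirrors the curl command above. Reading the reply from `choices[0].message.content` assumes the OpenAI-style response shape, which follows from the stated Chat Completions compatibility but is not shown on this page. The function names are illustrative.

```python
import json
import urllib.request

API_URL = "https://api.quantized.us/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str,
                       model: str = "openai/gpt-4.1-mini") -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(api_key: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text.

    Assumption: the response uses the OpenAI-style shape,
    with the text at choices[0].message.content.
    """
    request = build_chat_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=30) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


# Example (performs a real network call, so it is left commented out):
# print(send_chat("sk-quantized-YOUR-KEY", "Hello!"))
```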

Explore the docs

Read the Quickstart guide or jump to the API Reference.

Key features

  • OpenAI-compatible — Use the OpenAI SDK with base_url="https://api.quantized.us/v1"
  • Unified billing — One credit balance across all providers and tools
  • Provider routing — Force a specific provider with the X-Quantized-Provider header, or let Quantized pick the default
  • Streaming — Server-Sent Events for real-time token delivery
  • Transparent pricing — Every response includes credits_used and credits_remaining
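
Two of the features above, provider routing and credit reporting, can be sketched in a few lines. The header name X-Quantized-Provider and the field names credits_used and credits_remaining come from the list above. The provider value "anthropic" and the placement of the credit fields at the top level of the response body are assumptions, and the sample numbers are made up.

```python
from typing import Optional


def routing_headers(api_key: str, provider: Optional[str] = None) -> dict:
    """Return request headers, optionally forcing a specific provider.

    Assumption: the header value is a lowercase provider name such as
    "anthropic"; the accepted values are not listed on this page.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider is not None:
        headers["X-Quantized-Provider"] = provider
    return headers


def credit_summary(response_body: dict) -> str:
    """Summarize credit usage from a parsed JSON response body.

    Assumption: credits_used and credits_remaining are top-level fields.
    """
    used = response_body.get("credits_used", "unknown")
    remaining = response_body.get("credits_remaining", "unknown")
    return f"credits used: {used}, remaining: {remaining}"


# Omitting the provider leaves the choice to Quantized.
default_headers = routing_headers("sk-quantized-YOUR-KEY")
forced_headers = routing_headers("sk-quantized-YOUR-KEY", provider="anthropic")

# Made-up response body for illustration; the numbers are not real pricing.
sample_body = {"credits_used": 2, "credits_remaining": 998}
summary = credit_summary(sample_body)
```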