Chat Completions
POST /v1/chat/completions
Generate a model response for a conversation. Compatible with the OpenAI Chat Completions API.
Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer <api-key-or-jwt>` |
| `Content-Type` | Yes | `application/json` |
| `X-Quantized-Provider` | No | Force a specific provider (`openrouter`, `anthropic`) |
Request body
Required fields
| Field | Type | Description |
|---|---|---|
| `model` | string | Model identifier (e.g., `openai/gpt-4.1-mini`) |
| `messages` | array | List of conversation messages |
Generation parameters
| Field | Type | Default | Description |
|---|---|---|---|
| `max_tokens` | integer | null | Maximum tokens in the completion |
| `max_completion_tokens` | integer | null | Alternative to `max_tokens` |
| `temperature` | float | null | Sampling temperature (0–2). Lower is more deterministic |
| `top_p` | float | null | Nucleus sampling threshold |
| `top_k` | float | null | Top-k sampling (provider-dependent) |
| `frequency_penalty` | float | null | Penalize tokens by frequency (−2.0 to 2.0) |
| `presence_penalty` | float | null | Penalize tokens by presence (−2.0 to 2.0) |
| `repetition_penalty` | float | null | Repetition penalty factor |
| `stop` | string or array | null | Stop sequence(s) |
| `seed` | integer | null | Seed for deterministic generation |
| `logprobs` | boolean | null | Return log probabilities |
| `top_logprobs` | integer | null | Number of top log probabilities to return |
| `logit_bias` | object | null | Token ID to bias mapping |
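For instance, a request body combining several of these parameters might look like this (values are illustrative):

```json
{
  "model": "openai/gpt-4.1-mini",
  "messages": [{"role": "user", "content": "List three prime numbers."}],
  "max_tokens": 64,
  "temperature": 0.2,
  "seed": 42,
  "stop": ["\n\n"]
}
```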
Output control
| Field | Type | Default | Description |
|---|---|---|---|
| `response_format` | object | null | Output format: `{"type": "json_object"}` or `{"type": "json_schema", "json_schema": {...}}` |
| `modalities` | array | null | Output modalities (e.g., `["text", "audio"]`) |
| `audio` | object | null | Audio output configuration (voice, format) |
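As an illustration, a structured-output request using `json_schema` could look like the following sketch (the schema name and fields here are made-up examples, not part of the API):

```json
{
  "model": "openai/gpt-4.1-mini",
  "messages": [{"role": "user", "content": "Extract the city from: I live in Paris."}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "city_extraction",
      "schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }
}
```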
Tool calling
| Field | Type | Default | Description |
|---|---|---|---|
| `tools` | array | null | Tool/function definitions |
| `tool_choice` | any | null | `"auto"`, `"none"`, `"required"`, or `{"type": "function", "function": {"name": "..."}}` |
| `parallel_tool_calls` | boolean | null | Allow parallel tool execution |
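For example, a request defining the `get_weather` tool used in the message examples further down might look like this sketch (the tool's description and parameter schema are illustrative):

```json
{
  "model": "openai/gpt-4.1-mini",
  "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```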
Advanced
| Field | Type | Default | Description |
|---|---|---|---|
| `reasoning` | object | null | Reasoning configuration for reasoning models |
| `web_search_options` | object | null | Web search plugin options (provider-dependent) |
| `metadata` | object | null | Key-value metadata passed to the provider |
| `user` | string | null | User identifier for abuse tracking |
| `stream` | boolean | false | Enable SSE streaming |
| `stream_options` | object | null | Streaming options (e.g., `{"include_usage": true}`) |
Messages
Each message is an object with a role and content:
```json
{"role": "user", "content": "What is 2+2?"}
```
Roles
| Role | Description |
|---|---|
| `system` | Sets the model’s behavior and context |
| `developer` | Developer-level instructions (similar to `system`) |
| `user` | The user’s input |
| `assistant` | The model’s previous response (for multi-turn) |
| `tool` | Response from a tool call |
Text messages
```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello!"}
]
```
Multi-turn conversations
```json
[
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "The capital of France is Paris."},
  {"role": "user", "content": "What about Germany?"}
]
```
Vision (image input)
Pass images as content parts:
```json
[
  {
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }
]
```
Tool calls
```json
[
  {"role": "user", "content": "What's the weather in Paris?"},
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}
      }
    ]
  },
  {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp\": 18, \"unit\": \"C\"}"}
]
```
Examples
cURL

```shell
curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 128,
    "temperature": 0.7
  }'
```

Python

```python
import httpx

response = httpx.post(
    "https://api.quantized.us/v1/chat/completions",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    },
)
data = response.json()
print(data["choices"][0]["message"]["content"])
```

OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)
response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
Response
```json
{
  "id": "gen-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-4.1-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33,
    "credits_used": 2400,
    "credits_remaining": 997600,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0
    }
  },
  "created": 1719000000,
  "system_fingerprint": "fp_abc123"
}
```
Response fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion ID |
| `object` | string | Always `"chat.completion"` |
| `model` | string | Model that generated the response |
| `choices` | array | List of completion choices |
| `choices[].index` | integer | Choice index |
| `choices[].message.role` | string | Always `"assistant"` |
| `choices[].message.content` | string | The generated text |
| `choices[].message.tool_calls` | array | Tool calls made by the model (if any) |
| `choices[].message.reasoning` | string | Chain-of-thought reasoning (reasoning models) |
| `choices[].finish_reason` | string | `"stop"`, `"length"`, or `"tool_calls"` |
| `choices[].logprobs` | object | Log probabilities (if requested) |
| `usage.prompt_tokens` | integer | Input tokens |
| `usage.completion_tokens` | integer | Output tokens |
| `usage.total_tokens` | integer | Total tokens |
| `usage.credits_used` | integer | Micro-credits consumed |
| `usage.credits_remaining` | integer | Micro-credits remaining (null if unlimited) |
| `created` | integer | Unix timestamp |
| `system_fingerprint` | string | Model configuration fingerprint |
Streaming
Set "stream": true to receive Server-Sent Events. See the Streaming guide for details and code examples.
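As a quick orientation, each SSE event carries a JSON chunk whose `choices[].delta.content` holds the next text fragment. The sketch below assumes the OpenAI-style chunk shape; consult the Streaming guide for the exact format this API emits.

```python
import json


def delta_from_sse_line(line: str):
    """Return the text delta carried by one SSE `data:` line, or None.

    Lines look like:  data: {"choices":[{"delta":{"content":"Hi"}}]}
    The stream ends with:  data: [DONE]
    """
    if not line.startswith("data: "):
        return None  # ignore comments, blank lines, keep-alives
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0].get("delta", {}).get("content")


# Feed this function each line from e.g. an httpx streaming
# response's iter_lines() and concatenate the non-None results.
```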
Errors
| Status | Condition |
|---|---|
| 400 | Invalid request (missing model, bad field types, unknown fields) |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Model not found |
| 503 | Provider unavailable |
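Of the statuses above, only 503 is transient; the 4xx codes indicate a problem with the request or account that a retry will not fix. A minimal client-side sketch of that distinction:

```python
def should_retry(status: int) -> bool:
    """Decide whether a failed chat completion request is worth retrying.

    400/401/402/404 are caller errors (bad request, bad key, no credits,
    unknown model) and will fail again unchanged; 503 means the upstream
    provider is temporarily unavailable, so a backoff-and-retry can help.
    """
    return status == 503


def describe_error(status: int) -> str:
    """Map a status code from the table above to its documented condition."""
    return {
        400: "Invalid request (missing model, bad field types, unknown fields)",
        401: "Invalid or missing API key",
        402: "Insufficient credits",
        404: "Model not found",
        503: "Provider unavailable",
    }.get(status, "Unexpected status")
```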