Chat Completions

POST /v1/chat/completions

Generate a model response for a conversation. Compatible with the OpenAI Chat Completions API.

Headers

Header Required Description
Authorization Yes Bearer <api-key-or-jwt>
Content-Type Yes application/json
X-Quantized-Provider No Force a specific provider (openrouter, anthropic)

Request body

Required fields

Field Type Description
model string Model identifier (e.g., openai/gpt-4.1-mini)
messages array List of conversation messages

Generation parameters

Field Type Default Description
max_tokens integer null Maximum tokens in the completion
max_completion_tokens integer null Alternative to max_tokens
temperature float null Sampling temperature (0–2). Lower is more deterministic
top_p float null Nucleus sampling threshold
top_k integer null Top-k sampling (provider-dependent)
frequency_penalty float null Penalize tokens by frequency (−2.0 to 2.0)
presence_penalty float null Penalize tokens by presence (−2.0 to 2.0)
repetition_penalty float null Repetition penalty factor
stop string or array null Stop sequence(s)
seed integer null Seed for deterministic generation
logprobs boolean null Return log probabilities for output tokens
top_logprobs integer null Number of top log probabilities per token (requires "logprobs": true)
logit_bias object null Token ID to bias mapping
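
As a sketch, the sampling parameters above are passed as top-level request-body fields alongside model and messages (the model name and values here are illustrative):

```python
import json

# Build a request body combining the required fields with sampling parameters.
payload = {
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Write a haiku about rain."}],
    "max_tokens": 64,
    "temperature": 0.2,   # lower -> more deterministic
    "top_p": 0.9,         # nucleus sampling threshold
    "seed": 42,           # fixed seed for best-effort reproducibility
    "stop": ["\n\n"],     # stop at the first blank line
}

body = json.dumps(payload)  # serialized JSON for the POST body
print(body)
```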

Output control

Field Type Default Description
response_format object null Output format: {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}}
modalities array null Output modalities (e.g., ["text", "audio"])
audio object null Audio output configuration (voice, format)
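
For example, a structured-output request might set response_format to a JSON Schema. This is a sketch: the schema name and fields are illustrative, and the name/schema layout inside json_schema is assumed to follow the usual OpenAI-style convention.

```python
import json

# Request body asking for output constrained to a JSON Schema.
payload = {
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Extract the city and temperature."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "weather_report",  # illustrative schema name
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "temp_c": {"type": "number"},
                },
                "required": ["city", "temp_c"],
            },
        },
    },
}
print(json.dumps(payload, indent=2))
```

For plain JSON output without a schema, {"type": "json_object"} is the lighter-weight option.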

Tool calling

Field Type Default Description
tools array null Tool/function definitions
tool_choice string or object null "auto", "none", "required", or {"type": "function", "function": {"name": "..."}}
parallel_tool_calls boolean null Allow the model to emit multiple tool calls in a single response
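
A request using these fields might look like the following sketch, where get_weather is an illustrative function definition and tool_choice forces the model to call it:

```python
import json

# One function tool, defined with a JSON Schema for its arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    # Force a call to this specific function; use "auto" to let the model decide.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
    "parallel_tool_calls": False,
}
print(json.dumps(payload))
```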

Advanced

Field Type Default Description
reasoning object null Reasoning configuration for reasoning models
web_search_options object null Web search plugin options (provider-dependent)
metadata object null Key-value metadata passed to the provider
user string null User identifier for abuse tracking
stream boolean false Enable SSE streaming
stream_options object null Streaming options (e.g., {"include_usage": true})

Messages

Each message is an object with a role and content:

{"role": "user", "content": "What is 2+2?"}

Roles

Role Description
system Sets the model’s behavior and context
developer Developer-level instructions (similar to system)
user The user’s input
assistant The model’s previous response (for multi-turn)
tool Response from a tool call

Text messages

[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello!"}
]

Multi-turn conversations

[
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "The capital of France is Paris."},
  {"role": "user", "content": "What about Germany?"}
]

Vision (image input)

Pass images as content parts:

[
  {
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }
]
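
Besides HTTP(S) URLs, many OpenAI-compatible providers also accept images inline as base64 data URLs (support is provider-dependent); a sketch of building one:

```python
import base64

# Encode raw image bytes as a data URL for inline image input.
# In practice these bytes would come from reading an image file.
image_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a real image
b64 = base64.b64encode(image_bytes).decode("ascii")
data_url = f"data:image/png;base64,{b64}"

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": data_url}},
    ],
}
print(data_url[:30])
```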

Tool calls

[
  {"role": "user", "content": "What's the weather in Paris?"},
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}
      }
    ]
  },
  {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp\": 18, \"unit\": \"C\"}"}
]
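
One way to close the loop after the model requests a tool (a sketch; get_weather here is a stand-in for your own implementation, and in real code you would dispatch on the function name):

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in tool implementation; a real one would call a weather API.
    return {"temp": 18, "unit": "C"}

# An assistant message as it might come back from the API.
assistant_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
        }
    ],
}

# Execute each requested tool and build the matching "tool" role messages,
# echoing each call's id back as tool_call_id.
tool_messages = []
for call in assistant_msg["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    result = get_weather(**args)
    tool_messages.append(
        {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
    )
print(tool_messages)
```

The resulting tool messages are appended to the conversation and sent back in a follow-up request so the model can produce its final answer.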

Examples

cURL

curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 128,
    "temperature": 0.7
  }'

Python

import httpx

response = httpx.post(
    "https://api.quantized.us/v1/chat/completions",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    },
)
data = response.json()
print(data["choices"][0]["message"]["content"])

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(response.choices[0].message.content)

Response

{
  "id": "gen-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-4.1-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33,
    "credits_used": 2400,
    "credits_remaining": 997600,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0
    }
  },
  "created": 1719000000,
  "system_fingerprint": "fp_abc123"
}

Response fields

Field Type Description
id string Unique completion ID
object string Always "chat.completion"
model string Model that generated the response
choices array List of completion choices
choices[].index integer Choice index
choices[].message.role string Always "assistant"
choices[].message.content string The generated text
choices[].message.tool_calls array Tool calls made by the model (if any)
choices[].message.reasoning string Chain-of-thought reasoning (reasoning models)
choices[].finish_reason string "stop", "length", "tool_calls"
choices[].logprobs object Log probabilities (if requested)
usage.prompt_tokens integer Input tokens
usage.completion_tokens integer Output tokens
usage.total_tokens integer Total tokens
usage.credits_used integer Micro-credits consumed
usage.credits_remaining integer Micro-credits remaining (null if unlimited)
created integer Unix timestamp
system_fingerprint string Model configuration fingerprint
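
Pulling the generated text and usage counters out of a parsed response might look like this sketch (the dict mirrors the sample response above):

```python
# Parsed JSON response, abbreviated from the sample above.
data = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 25,
        "completion_tokens": 8,
        "total_tokens": 33,
        "credits_used": 2400,
        "credits_remaining": 997600,
    },
}

text = data["choices"][0]["message"]["content"]
usage = data["usage"]
print(f"{text!r} ({usage['total_tokens']} tokens, {usage['credits_used']} micro-credits)")
```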

Streaming

Set "stream": true to receive Server-Sent Events. See the Streaming guide for details and code examples.
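
As a rough sketch of what consuming the stream involves: each SSE event is a data: line carrying a JSON chunk, and the stream ends with a data: [DONE] sentinel (the chunk shape here assumes the usual OpenAI chat.completion.chunk format).

```python
import json

def parse_sse_line(line: str):
    """Return the delta text from one SSE data line, or None for non-content lines."""
    if not line.startswith("data: "):
        return None                  # blank keep-alive or comment lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None                  # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Example lines as they might arrive over the wire.
lines = [
    'data: {"choices": [{"delta": {"content": "Par"}}]}',
    'data: {"choices": [{"delta": {"content": "is"}}]}',
    "data: [DONE]",
]
text = "".join(t for t in (parse_sse_line(l) for l in lines) if t)
print(text)  # -> Paris
```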

Errors

Status Condition
400 Invalid request (missing model, bad field types, unknown fields)
401 Invalid or missing API key
402 Insufficient credits
404 Model not found
503 Provider unavailable
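
A client might treat 503 as transient and retryable with backoff, and the 4xx statuses above as terminal; a minimal sketch of that decision logic:

```python
# Map the error table above to a simple retry decision.
RETRYABLE = {503}                # provider unavailable: worth retrying
TERMINAL = {400, 401, 402, 404}  # fix the request, key, or balance instead

def should_retry(status: int) -> bool:
    """Whether a failed request is worth retrying."""
    return status in RETRYABLE

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff delay in seconds for the given retry attempt."""
    return min(base * (2 ** attempt), cap)

for status in (402, 503):
    print(status, "retry" if should_retry(status) else "give up")
```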