Chat Completions
POST /v1/chat/completions
Generate a model response for a conversation. Compatible with the OpenAI Chat Completions API.
Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer <api-key-or-jwt>` |
| `Content-Type` | Yes | `application/json` |
| `X-Quantized-Provider` | No | Force a specific provider (`openrouter`, `anthropic`) |
Request body
Required fields
| Field | Type | Description |
|---|---|---|
| `model` | string | Model identifier (e.g., `openai/gpt-4.1-mini`) |
| `messages` | array | List of conversation messages |
Generation parameters
| Field | Type | Default | Description |
|---|---|---|---|
| `max_tokens` | integer | null | Maximum tokens in the completion |
| `max_completion_tokens` | integer | null | Alternative to `max_tokens` |
| `temperature` | float | null | Sampling temperature (0–2). Lower is more deterministic |
| `top_p` | float | null | Nucleus sampling threshold |
| `top_k` | float | null | Top-k sampling (provider-dependent) |
| `frequency_penalty` | float | null | Penalize tokens by frequency (−2.0 to 2.0) |
| `presence_penalty` | float | null | Penalize tokens by presence (−2.0 to 2.0) |
| `repetition_penalty` | float | null | Repetition penalty factor |
| `stop` | string or array | null | Stop sequence(s) |
| `seed` | integer | null | Seed for deterministic generation |
| `logprobs` | boolean | null | Return log probabilities |
| `top_logprobs` | integer | null | Number of top log probabilities to return |
| `logit_bias` | object | null | Token ID to bias mapping |
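For instance, a request body combining several of these parameters might look like this (values are illustrative):

```json
{
  "model": "openai/gpt-4.1-mini",
  "messages": [{"role": "user", "content": "List three prime numbers."}],
  "max_tokens": 64,
  "temperature": 0.2,
  "seed": 42,
  "stop": ["\n\n"]
}
```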
Output control
| Field | Type | Default | Description |
|---|---|---|---|
| `response_format` | object | null | Output format: `{"type": "json_object"}` or `{"type": "json_schema", "json_schema": {...}}` |
| `modalities` | array | null | Output modalities (e.g., `["text", "audio"]`) |
| `audio` | object | null | Audio output configuration (voice, format) |
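As an illustration, a structured-output request using `json_schema` could look like the following sketch (the schema name and fields here are made-up examples, not part of the API):

```json
{
  "model": "openai/gpt-4.1-mini",
  "messages": [{"role": "user", "content": "Extract the city from: I live in Paris."}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "city_extraction",
      "schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }
}
```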
Tool calling
| Field | Type | Default | Description |
|---|---|---|---|
| `tools` | array | null | Tool/function definitions |
| `tool_choice` | any | null | `"auto"`, `"none"`, `"required"`, or `{"type": "function", "function": {"name": "..."}}` |
| `parallel_tool_calls` | boolean | null | Allow parallel tool execution |
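For example, a request defining the `get_weather` tool used in the message examples further down might look like this sketch (the tool's description and parameter schema are illustrative):

```json
{
  "model": "openai/gpt-4.1-mini",
  "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```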
Advanced
| Field | Type | Default | Description |
|---|---|---|---|
| `reasoning` | object | null | Reasoning configuration for reasoning models |
| `web_search_options` | object | null | Web search plugin options (provider-dependent) |
| `metadata` | object | null | Key-value metadata passed to the provider |
| `user` | string | null | User identifier for abuse tracking |
| `stream` | boolean | false | Enable SSE streaming |
| `stream_options` | object | null | Streaming options (e.g., `{"include_usage": true}`) |
Messages
Each message is an object with a role and content:
```json
{"role": "user", "content": "What is 2+2?"}
```
Roles
| Role | Description |
|---|---|
| `system` | Sets the model’s behavior and context |
| `developer` | Developer-level instructions (similar to `system`) |
| `user` | The user’s input |
| `assistant` | The model’s previous response (for multi-turn) |
| `tool` | Response from a tool call |
Text messages
```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello!"}
]
```
Multi-turn conversations
```json
[
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "The capital of France is Paris."},
  {"role": "user", "content": "What about Germany?"}
]
```
Vision (image input)
Pass images as content parts:
```json
[
  {
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }
]
```
Tool calls
```json
[
  {"role": "user", "content": "What's the weather in Paris?"},
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}
      }
    ]
  },
  {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp\": 18, \"unit\": \"C\"}"}
]
```
Examples
cURL

```shell
curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 128,
    "temperature": 0.7
  }'
```

Python

```python
import httpx

response = httpx.post(
    "https://api.quantized.us/v1/chat/completions",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    },
)
data = response.json()
print(data["choices"][0]["message"]["content"])
```

OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)
response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
Response
```json
{
  "id": "gen-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-4.1-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33,
    "credits_used": 2400,
    "credits_remaining": 997600,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0
    }
  },
  "created": 1719000000,
  "system_fingerprint": "fp_abc123"
}
```
Response fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion ID |
| `object` | string | Always `"chat.completion"` |
| `model` | string | Model that generated the response |
| `choices` | array | List of completion choices |
| `choices[].index` | integer | Choice index |
| `choices[].message.role` | string | Always `"assistant"` |
| `choices[].message.content` | string | The generated text |
| `choices[].message.tool_calls` | array | Tool calls made by the model (if any) |
| `choices[].message.reasoning` | string | Chain-of-thought reasoning (reasoning models) |
| `choices[].finish_reason` | string | `"stop"`, `"length"`, or `"tool_calls"` |
| `choices[].logprobs` | object | Log probabilities (if requested) |
| `usage.prompt_tokens` | integer | Input tokens |
| `usage.completion_tokens` | integer | Output tokens |
| `usage.total_tokens` | integer | Total tokens |
| `usage.credits_used` | integer | Micro-credits consumed |
| `usage.credits_remaining` | integer | Micro-credits remaining (null if unlimited) |
| `created` | integer | Unix timestamp |
| `system_fingerprint` | string | Model configuration fingerprint |
Streaming
Set "stream": true to receive Server-Sent Events. See the Streaming guide for details and code examples.
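As a quick orientation, each SSE event carries a JSON chunk whose `choices[].delta.content` holds the next text fragment. The sketch below assumes the OpenAI-style chunk shape; consult the Streaming guide for the exact format this API emits.

```python
import json


def delta_from_sse_line(line: str):
    """Return the text delta carried by one SSE `data:` line, or None.

    Lines look like:  data: {"choices":[{"delta":{"content":"Hi"}}]}
    The stream ends with:  data: [DONE]
    """
    if not line.startswith("data: "):
        return None  # ignore comments, blank lines, keep-alives
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0].get("delta", {}).get("content")


# Feed this function each line from e.g. an httpx streaming
# response's iter_lines() and concatenate the non-None results.
```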
Errors
| Status | Condition |
|---|---|
| 400 | Invalid request (missing model, bad field types, unknown fields) |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Model not found |
| 503 | Provider unavailable |
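Of the statuses above, only 503 is transient; the 4xx codes indicate a problem with the request or account that a retry will not fix. A minimal client-side sketch of that distinction:

```python
def should_retry(status: int) -> bool:
    """Decide whether a failed chat completion request is worth retrying.

    400/401/402/404 are caller errors (bad request, bad key, no credits,
    unknown model) and will fail again unchanged; 503 means the upstream
    provider is temporarily unavailable, so a backoff-and-retry can help.
    """
    return status == 503


def describe_error(status: int) -> str:
    """Map a status code from the table above to its documented condition."""
    return {
        400: "Invalid request (missing model, bad field types, unknown fields)",
        401: "Invalid or missing API key",
        402: "Insufficient credits",
        404: "Model not found",
        503: "Provider unavailable",
    }.get(status, "Unexpected status")
```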