Responses
POST /v1/responses
Generates a model response in the Responses API format. Supports instructions, multi-turn conversations via previous_response_id, tool calling, and streaming.
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <api-key-or-jwt> |
| Content-Type | Yes | application/json |
| X-Quantized-Provider | No | Force a specific provider: openrouter (default) or bedrock |
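As a minimal sketch, the headers above could be assembled in Python as follows. The helper name `build_headers` is hypothetical; the header names and the provider values `openrouter` and `bedrock` come from the table.

```python
from typing import Optional

# Provider names taken from the headers table above.
KNOWN_PROVIDERS = {"openrouter", "bedrock"}


def build_headers(api_key: str, provider: Optional[str] = None) -> dict:
    """Build request headers (hypothetical helper, not part of any SDK)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider is not None:
        if provider not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown provider: {provider!r}")
        # Optional header: leave it out to use the default provider.
        headers["X-Quantized-Provider"] = provider
    return headers
```

Calling `build_headers("sk-quantized-YOUR-KEY", provider="bedrock")` would add the optional header, while omitting `provider` sends only the two required ones.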
Request body
Required fields
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier (e.g., openai/gpt-4.1-mini) |
| input | string or array | The input text or a list of input items |
Generation parameters
| Field | Type | Default | Description |
|---|---|---|---|
| instructions | string | null | System-level instructions for the model |
| max_output_tokens | integer | null | Maximum tokens in the response (minimum: 1) |
| temperature | float | null | Sampling temperature (0–2) |
| top_p | float | null | Nucleus sampling threshold (0–1) |
| frequency_penalty | float | null | Frequency penalty (−2 to 2) |
| presence_penalty | float | null | Presence penalty (−2 to 2) |
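The documented ranges can be checked on the client before a request is sent. The following is a sketch only: `check_generation_params` is a hypothetical helper, and treating the range endpoints as inclusive is an assumption, since the table does not say.

```python
# Ranges copied from the generation parameters table.
# Assumption: both endpoints are inclusive.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_generation_params(params: dict) -> list:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        # None (or a missing key) means the parameter is left at its default.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    max_tokens = params.get("max_output_tokens")
    if max_tokens is not None and max_tokens < 1:
        problems.append(f"max_output_tokens={max_tokens} is below the minimum of 1")
    return problems
```

A body with `temperature` set to 3 would yield one problem string, while a body that leaves every parameter unset yields an empty list.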
Tool calling
| Field | Type | Default | Description |
|---|---|---|---|
| tools | array | null | Tool definitions. Supports built-in tools (web_search_preview) and custom functions |
| tool_choice | string or object | null | "auto", "none", "required", or {"type": "function", "function": {"name": "..."}} |
| parallel_tool_calls | boolean | null | Allow parallel tool execution |
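A request body using these fields might look like the sketch below. The shape of the built-in tool entry (`{"type": "web_search_preview"}`) is an assumption, since this page does not spell out the tool definition schema; `force_function` is a hypothetical helper that builds the object form of tool_choice shown in the table.

```python
import json


def force_function(name: str) -> dict:
    """Build the object form of tool_choice that selects one named function."""
    return {"type": "function", "function": {"name": name}}


payload = {
    "model": "openai/gpt-4.1-mini",
    "input": "What is the capital of France?",
    # Assumption: a built-in tool is declared by its type name alone.
    "tools": [{"type": "web_search_preview"}],
    # Let the model decide whether to call a tool.
    "tool_choice": "auto",
    "parallel_tool_calls": False,
}

print(json.dumps(payload, indent=2))
```

To require a specific custom function instead, `tool_choice` would be set to the result of `force_function` with that function's name.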
Reasoning & state
| Field | Type | Default | Description |
|---|---|---|---|
| reasoning | object | null | Reasoning config for thinking models. effort: "none", "low", "medium", or "high". Optional exclude (boolean) controls whether reasoning content is included in the response. Example: {"effort": "low", "exclude": false} |
| previous_response_id | string | null | ID of a previous response for multi-turn continuation |
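Multi-turn continuation can be sketched as building the next request body from the previous response object. `follow_up` is a hypothetical helper, and reusing the previous response's model for the next turn is a design choice here, not a documented requirement.

```python
from typing import Optional


def follow_up(previous: dict, text: str, effort: Optional[str] = None) -> dict:
    """Build a request body that continues from a previous response object."""
    body = {
        "model": previous["model"],
        "input": text,
        # Links this turn to the earlier one.
        "previous_response_id": previous["id"],
    }
    if effort is not None:
        # Shape taken from the example in the reasoning row above.
        body["reasoning"] = {"effort": effort, "exclude": False}
    return body


# Trimmed-down response object holding only the fields used here.
first = {"id": "resp-abc123", "model": "openai/gpt-4.1-mini"}
second_request = follow_up(first, "And what is its population?", effort="low")
```

The resulting `second_request` would then be sent to POST /v1/responses like any other body.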
Streaming
| Field | Type | Default | Description |
|---|---|---|---|
| stream | boolean | false | Enable SSE streaming |
Strict validation
The API uses strict parameter validation. Any field not listed above will be rejected with a 422 error. Parameters like store, text, truncation, include, service_tier, background, top_k, metadata, and user are not currently supported.
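Because unknown fields are rejected, it can help to catch them before the request leaves the client. A minimal sketch, using the field names from the tables above; `unsupported_fields` is a hypothetical helper:

```python
# Every request field documented on this page.
SUPPORTED_FIELDS = {
    "model", "input",
    "instructions", "max_output_tokens", "temperature", "top_p",
    "frequency_penalty", "presence_penalty",
    "tools", "tool_choice", "parallel_tool_calls",
    "reasoning", "previous_response_id",
    "stream",
}


def unsupported_fields(body: dict) -> list:
    """Return the top-level keys that the API would reject, sorted by name."""
    return sorted(set(body) - SUPPORTED_FIELDS)


body = {"model": "openai/gpt-4.1-mini", "input": "Hi", "store": True, "top_k": 40}
print(unsupported_fields(body))  # ['store', 'top_k']
```

This only checks top-level field names; it does not validate nested structure, which the API may also reject with a 422.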
Input format
The input field accepts either a plain string or an array of input items:
// Simple string
{"input": "What is the capital of France?"}

// Array of items
{"input": [
  {"role": "user", "content": "What is the capital of France?"}
]}
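Client code that wants to handle both forms uniformly could normalize to the array form first. This sketch assumes a plain string is equivalent to a single item with the user role, which the paired examples above suggest but do not state; `to_input_items` is a hypothetical helper.

```python
from typing import Union


def to_input_items(value: Union[str, list]) -> list:
    """Return the array form of `input`."""
    if isinstance(value, str):
        # Assumption: a bare string behaves like one user item.
        return [{"role": "user", "content": value}]
    # Copy so that callers can append items without mutating the original list.
    return list(value)


items = to_input_items("What is the capital of France?")
items.append({"role": "user", "content": "Answer in one word."})
```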
Examples
cURL
curl -X POST https://api.quantized.us/v1/responses \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "input": "What is the capital of France?",
    "instructions": "You are a geography expert. Be concise.",
    "max_output_tokens": 128
  }'

Python
import httpx

response = httpx.post(
    "https://api.quantized.us/v1/responses",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "input": "What is the capital of France?",
        "instructions": "You are a geography expert. Be concise.",
        "max_output_tokens": 128,
    },
)
# Raise on 4xx/5xx responses instead of failing later with a KeyError.
response.raise_for_status()
data = response.json()
# Text of the first content part of the first output item.
print(data["output"][0]["content"][0]["text"])

OpenAI SDK
from openai import OpenAI

# Point the OpenAI client at the Quantized base URL.
client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)
response = client.responses.create(
    model="openai/gpt-4.1-mini",
    input="What is the capital of France?",
    instructions="You are a geography expert. Be concise.",
    max_output_tokens=128,
)
print(response.output[0].content[0].text)
Response
{
  "id": "resp-abc123",
  "object": "response",
  "status": "completed",
  "model": "openai/gpt-4.1-mini",
  "output": [
    {
      "id": "msg-001",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 20,
    "output_tokens": 8,
    "total_tokens": 28,
    "credits_used": 2000,
    "credits_remaining": 998000,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "created_at": 1719000000
}
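Indexing `output[0].content[0]` works for the response above, but an output list may also contain items that are not messages. A more defensive sketch, where `output_text` is a hypothetical helper operating on the parsed JSON:

```python
def output_text(response: dict) -> str:
    """Concatenate the text of every output_text part in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip items that are not messages, e.g. function calls
        for part in item.get("content") or []:
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


# Trimmed copy of the example response above.
example = {
    "status": "completed",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris."}
            ],
        }
    ],
}
print(output_text(example))  # The capital of France is Paris.
```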
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique response ID |
| object | string | Always "response" |
| status | string | "completed", "in_progress", "failed" |
| model | string | Model that generated the response |
| output | array | List of output items |
| output[].type | string | "message" or "function_call" |
| output[].role | string | "assistant" for message items |
| output[].content | array | Content parts |
| output[].content[].type | string | "output_text" |
| output[].content[].text | string | The generated text |
| usage.input_tokens | integer | Input tokens |
| usage.output_tokens | integer | Output tokens |
| usage.total_tokens | integer | Total tokens |
| usage.credits_used | integer | Micro-credits consumed |
| usage.credits_remaining | integer | Micro-credits remaining |
| usage.input_tokens_details | object or null | Token breakdown: cached_tokens |
| usage.output_tokens_details | object or null | Token breakdown: reasoning_tokens |
| created_at | integer or string | Creation timestamp |
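The usage object can be flattened into a small summary, as in the sketch below. It handles the detail objects being null, as the table allows. The conversion factor of 1,000,000 micro-credits per credit is an assumption inferred from the "micro" prefix, and `summarize_usage` is a hypothetical helper.

```python
# Assumption: one credit equals 1,000,000 micro-credits.
MICRO_PER_CREDIT = 1_000_000


def summarize_usage(usage: dict) -> dict:
    """Flatten a usage object, tolerating null detail objects."""
    input_details = usage.get("input_tokens_details") or {}
    output_details = usage.get("output_tokens_details") or {}
    return {
        "total_tokens": usage["total_tokens"],
        "cached_tokens": input_details.get("cached_tokens", 0),
        "reasoning_tokens": output_details.get("reasoning_tokens", 0),
        "credits_used": usage["credits_used"] / MICRO_PER_CREDIT,
        "credits_remaining": usage["credits_remaining"] / MICRO_PER_CREDIT,
    }


summary = summarize_usage({
    "input_tokens": 20,
    "output_tokens": 8,
    "total_tokens": 28,
    "credits_used": 2000,
    "credits_remaining": 998000,
    "input_tokens_details": {"cached_tokens": 0},
    "output_tokens_details": None,
})
```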
Streaming
Set "stream": true to receive Server-Sent Events. The Responses endpoint uses typed events (response.created, response.output_text.delta, response.completed, etc.). See the Streaming guide for the full event format and code examples.
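As a rough sketch of consuming such a stream, the function below accumulates text from already-decoded SSE lines. It rests on several assumptions not confirmed on this page: that each `data:` line carries a JSON object with a `type` field matching the event name, that delta events carry their text in a `delta` field, and that a `[DONE]` sentinel may appear. The Streaming guide is the authority on the actual format.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas from SSE lines until the completed event."""
    chunks = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        data = line[len("data:"):].strip()
        # Assumption: tolerate an optional "[DONE]" sentinel if one is sent.
        if not data or data == "[DONE]":
            continue
        event = json.loads(data)
        kind = event.get("type")
        if kind == "response.output_text.delta":
            # Assumption: the text fragment lives in a "delta" field.
            chunks.append(event.get("delta", ""))
        elif kind == "response.completed":
            break
    return "".join(chunks)


sample = [
    "event: response.created",
    'data: {"type": "response.created"}',
    "",
    'data: {"type": "response.output_text.delta", "delta": "Par"}',
    'data: {"type": "response.output_text.delta", "delta": "is"}',
    'data: {"type": "response.completed"}',
]
print(collect_text(sample))  # Paris
```

In practice the lines would come from a streaming HTTP response rather than a list; the function accepts any iterable of strings for that reason.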
Errors
| Status | Condition |
|---|---|
| 400 | Invalid request (missing model, bad field types) |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Model not found |
| 422 | Unsupported parameter or invalid field structure |
| 503 | Provider unavailable |
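One way to act on these status codes in client code is sketched below. The descriptions are copied from the table; treating only 503 as retryable is a design choice made for this example, not documented API behavior, and both helper names are hypothetical.

```python
# Conditions copied from the errors table above.
ERROR_CONDITIONS = {
    400: "Invalid request (missing model, bad field types)",
    401: "Invalid or missing API key",
    402: "Insufficient credits",
    404: "Model not found",
    422: "Unsupported parameter or invalid field structure",
    503: "Provider unavailable",
}

# Assumption: only a provider outage is worth retrying without changes.
RETRYABLE = {503}


def describe_error(status: int) -> str:
    """Return the documented condition for a status code, if any."""
    return ERROR_CONDITIONS.get(status, f"Undocumented status {status}")


def is_retryable(status: int) -> bool:
    """Return True when resending the same request might succeed."""
    return status in RETRYABLE
```

A caller could, for example, retry with backoff when `is_retryable` returns True and otherwise surface `describe_error(status)` to the user.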