Responses
POST /v1/responses
Generate a model response using the Responses API format. Supports instructions, multi-turn via previous_response_id, tools, and streaming.
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <api-key-or-jwt> |
| Content-Type | Yes | application/json |
| X-Quantized-Provider | No | Force a specific provider (default: openrouter) |
Request body
Required fields
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier (e.g., openai/gpt-4.1-mini) |
| input | string or array | The input text or a list of input items |
Optional fields
| Field | Type | Default | Description |
|---|---|---|---|
| instructions | string | null | System-level instructions for the model |
| max_output_tokens | integer | null | Maximum tokens in the response |
| temperature | float | null | Sampling temperature (0–2) |
| top_p | float | null | Nucleus sampling threshold |
| top_k | float | null | Top-k sampling |
| frequency_penalty | float | null | Frequency penalty (−2.0 to 2.0) |
| presence_penalty | float | null | Presence penalty (−2.0 to 2.0) |
| previous_response_id | string | null | ID of a previous response for multi-turn |
| tools | array | null | Tool/function definitions |
| tool_choice | any | null | Tool selection strategy |
| parallel_tool_calls | boolean | null | Allow parallel tool execution |
| store | boolean | null | Store the response for later retrieval |
| stream | boolean | false | Enable SSE streaming |
| metadata | object | null | Key-value metadata |
| user | string | null | User identifier |
| reasoning | object | null | Reasoning configuration |
| text | object | null | Text output configuration |
| truncation | string | null | Truncation strategy |
| include | array | null | Additional data to include in the response |
| service_tier | string | null | Service tier preference |
| background | boolean | null | Run in background |
Input format
The input field accepts either a plain string or an array of input items:
A plain string:

```json
{"input": "What is the capital of France?"}
```

An array of items:

```json
{"input": [
  {"role": "user", "content": "What is the capital of France?"}
]}
```
Examples
cURL

```bash
curl -X POST https://api.quantized.us/v1/responses \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "input": "What is the capital of France?",
    "instructions": "You are a geography expert. Be concise.",
    "max_output_tokens": 128
  }'
```

Python

```python
import httpx

response = httpx.post(
    "https://api.quantized.us/v1/responses",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "input": "What is the capital of France?",
        "instructions": "You are a geography expert. Be concise.",
        "max_output_tokens": 128,
    },
)
data = response.json()
print(data["output"][0]["content"][0]["text"])
```

OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

response = client.responses.create(
    model="openai/gpt-4.1-mini",
    input="What is the capital of France?",
    instructions="You are a geography expert. Be concise.",
    max_output_tokens=128,
)
print(response.output[0].content[0].text)
```
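Multi-turn conversations chain requests through previous_response_id; the server resolves the prior turn from the stored response, so only the new input needs to be sent. A minimal sketch (the follow_up helper is ours, and the resp-abc123 id is just the sample value from below):

```python
def follow_up(previous_id: str, text: str,
              model: str = "openai/gpt-4.1-mini") -> dict:
    """Build a request body that continues from an earlier response."""
    return {
        "model": model,
        "input": text,
        "previous_response_id": previous_id,
    }

# POST this body to /v1/responses to continue the conversation.
body = follow_up("resp-abc123", "And what about Germany?")
```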
Response
```json
{
  "id": "resp-abc123",
  "object": "response",
  "status": "completed",
  "model": "openai/gpt-4.1-mini",
  "output": [
    {
      "id": "msg-001",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 20,
    "output_tokens": 8,
    "total_tokens": 28,
    "credits_used": 2000,
    "credits_remaining": 998000,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "created_at": 1719000000
}
```
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique response ID |
| object | string | Always "response" |
| status | string | "completed", "in_progress", or "failed" |
| model | string | Model that generated the response |
| output | array | List of output items |
| output[].type | string | "message" or "function_call" |
| output[].role | string | "assistant" for message items |
| output[].content | array | Content parts |
| output[].content[].type | string | "output_text" |
| output[].content[].text | string | The generated text |
| usage.input_tokens | integer | Input tokens |
| usage.output_tokens | integer | Output tokens |
| usage.total_tokens | integer | Total tokens |
| usage.credits_used | integer | Micro-credits consumed |
| usage.credits_remaining | integer | Micro-credits remaining |
| created_at | integer or string | Creation timestamp |
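Since a response can contain several output items, and message items carry a list of content parts, pulling out the generated text means walking both levels. A small helper (ours, not part of any SDK) that concatenates every output_text part and skips function_call items:

```python
def extract_text(resp: dict) -> str:
    """Concatenate every output_text part across message output items."""
    parts = []
    for item in resp.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call and other non-message items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)

# Trimmed version of the sample response body above:
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text",
                 "text": "The capital of France is Paris."}
            ],
        }
    ]
}
```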
Streaming
Set "stream": true to receive Server-Sent Events. The Responses endpoint uses typed events (response.created, response.output_text.delta, response.completed, etc.). See the Streaming guide for the full event format and code examples.
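As a rough sketch of consuming those typed events, the parser below splits a raw SSE body into (event, payload) pairs. It assumes standard event:/data: framing; see the Streaming guide for the authoritative wire format:

```python
import json

def parse_sse(raw: str) -> list:
    """Split a raw SSE body into (event_type, payload) pairs."""
    events = []
    for block in raw.strip().split("\n\n"):
        event_type, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        if event_type and data:
            events.append((event_type, json.loads(data)))
    return events

# Hypothetical fragment of a stream; payload shapes are illustrative.
sample_stream = (
    "event: response.output_text.delta\n"
    'data: {"delta": "Paris"}\n\n'
    "event: response.completed\n"
    'data: {"response": {"id": "resp-abc123"}}\n'
)
events = parse_sse(sample_stream)
```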
Errors
| Status | Condition |
|---|---|
| 400 | Invalid request |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Model not found |
| 503 | Provider unavailable |
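When calling the endpoint directly, the statuses above are worth branching on. A hedged sketch of one way to do that; the action names and the choice to treat only 503 as retryable are our convention, not mandated by the API:

```python
RETRYABLE = {503}  # provider unavailable; a later attempt may succeed

def classify_error(status: int) -> str:
    """Map a Responses API error status to a coarse recovery action."""
    if status == 401:
        return "fix-credentials"   # invalid or missing API key
    if status == 402:
        return "top-up-credits"    # insufficient credits
    if status in (400, 404):
        return "fix-request"       # bad payload or unknown model
    if status in RETRYABLE:
        return "retry-later"
    return "unknown"
```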