Responses

POST /v1/responses

Generate a model response using the Responses API format. Supports instructions, multi-turn via previous_response_id, tools, and streaming.

Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer <api-key-or-jwt> |
| Content-Type | Yes | application/json |
| X-Quantized-Provider | No | Force a specific provider: openrouter (default) or bedrock |
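
As a minimal sketch, the headers above could be assembled as follows. The helper name `build_headers` and the `ALLOWED_PROVIDERS` constant are illustrative and not part of the API; the provider values are the two listed in the table.

```python
from typing import Optional

# Provider values taken from the headers table; the constant name is hypothetical.
ALLOWED_PROVIDERS = {"openrouter", "bedrock"}


def build_headers(api_key: str, provider: Optional[str] = None) -> dict:
    """Return request headers; X-Quantized-Provider is added only when requested."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider is not None:
        if provider not in ALLOWED_PROVIDERS:
            raise ValueError(f"unknown provider: {provider!r}")
        headers["X-Quantized-Provider"] = provider
    return headers
```

Leaving `provider` unset omits the header, so the request falls back to the documented default (openrouter).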

Request body

Required fields

| Field | Type | Description |
| --- | --- | --- |
| model | string | Model identifier (e.g., openai/gpt-4.1-mini) |
| input | string or array | The input text or a list of input items |

Generation parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| instructions | string | null | System-level instructions for the model |
| max_output_tokens | integer | null | Maximum tokens in the response (minimum: 1) |
| temperature | float | null | Sampling temperature (0–2) |
| top_p | float | null | Nucleus sampling threshold (0–1) |
| frequency_penalty | float | null | Frequency penalty (−2 to 2) |
| presence_penalty | float | null | Presence penalty (−2 to 2) |
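
The ranges above can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not the server's own validation; it assumes the documented bounds are inclusive, and `check_generation_params` is a hypothetical helper name.

```python
# Bounds copied from the table; treating them as inclusive is an assumption.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_generation_params(params: dict) -> list:
    """Return a list of problems; an empty list means all given values are in range."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        # None means "not set", which matches the documented default of null.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    max_tokens = params.get("max_output_tokens")
    if max_tokens is not None and max_tokens < 1:
        problems.append(f"max_output_tokens={max_tokens} is below the minimum of 1")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every out-of-range value at once.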

Tool calling

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| tools | array | null | Tool definitions. Supports built-in tools (web_search_preview) and custom functions |
| tool_choice | string or object | null | "auto", "none", "required", or {"type": "function", "function": {"name": "..."}} |
| parallel_tool_calls | boolean | null | Allow parallel tool execution |
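
To illustrate how these fields fit together, here is a hedged sketch of a request body that enables the built-in web search tool. The `tool_choice` object shape comes from the table above; the shape of the `tools` entry (a built-in tool selected by `type` alone) is an assumption, and the schema for custom function definitions is not shown because this page does not specify it. `force_function` is a hypothetical helper.

```python
def force_function(name: str) -> dict:
    """tool_choice object that forces a call to the named function (shape from the table)."""
    return {"type": "function", "function": {"name": name}}


payload = {
    "model": "openai/gpt-4.1-mini",
    "input": "What is the capital of France?",
    # Assumption: a built-in tool is selected by its type alone.
    "tools": [{"type": "web_search_preview"}],
    # Let the model decide whether to call a tool.
    "tool_choice": "auto",
    # Ask for at most one tool call at a time.
    "parallel_tool_calls": False,
}
```

Replacing `"auto"` with `force_function("get_weather")` would require the model to call a custom function named `get_weather`, assuming such a function had been defined in `tools`.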

Reasoning & state

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| reasoning | object | null | Reasoning config for thinking models. effort: "none", "low", "medium", or "high". Optional exclude (boolean): set to true to omit reasoning content from the response. Example: {"effort": "low", "exclude": false} |
| previous_response_id | string | null | ID of a previous response for multi-turn continuation |
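
A follow-up turn might be built from an earlier response like this. The sketch assumes the follow-up reuses the model of the earlier response (model is still a required field); `follow_up` is a hypothetical helper name.

```python
from typing import Optional

# Effort levels listed in the table above.
EFFORT_LEVELS = {"none", "low", "medium", "high"}


def follow_up(previous: dict, text: str, effort: Optional[str] = None) -> dict:
    """Build a request body that continues the conversation from `previous`."""
    body = {
        "model": previous["model"],  # assumption: keep the same model across turns
        "input": text,
        "previous_response_id": previous["id"],
    }
    if effort is not None:
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown reasoning effort: {effort!r}")
        body["reasoning"] = {"effort": effort}
    return body
```

Here `previous` is the parsed JSON of an earlier response, so only its `id` and `model` fields are read.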

Streaming

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| stream | boolean | false | Enable SSE streaming |

Strict validation

The API uses strict parameter validation. Any field not listed above will be rejected with a 422 error. Parameters like store, text, truncation, include, service_tier, background, top_k, metadata, and user are not currently supported.
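
Because unknown fields cause a 422, it can help to check a request body locally first. The sketch below lists the fields documented on this page and reports anything else; `SUPPORTED_FIELDS` and `unsupported_fields` are illustrative names, and the set should be kept in sync with the tables above.

```python
# Every request field documented on this page.
SUPPORTED_FIELDS = frozenset({
    "model", "input",
    "instructions", "max_output_tokens", "temperature", "top_p",
    "frequency_penalty", "presence_penalty",
    "tools", "tool_choice", "parallel_tool_calls",
    "reasoning", "previous_response_id",
    "stream",
})


def unsupported_fields(body: dict) -> list:
    """Return the top-level keys of `body` that the API would reject, sorted by name."""
    return sorted(set(body) - SUPPORTED_FIELDS)
```

This is mainly useful when migrating code written against another Responses-compatible API, where fields such as store or metadata may be set by default.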

Input format

The input field accepts either a plain string or an array of input items:

// Simple string
{"input": "What is the capital of France?"}

// Array of items
{"input": [
  {"role": "user", "content": "What is the capital of France?"}
]}
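
When code needs to handle both forms uniformly, a string can be wrapped into the array form. This sketch assumes that a plain string is equivalent to a single item with role "user", which this page implies but does not state; `as_input_items` is a hypothetical helper.

```python
from typing import Union


def as_input_items(value: Union[str, list]) -> list:
    """Normalize `input` to the array form."""
    if isinstance(value, str):
        # Assumption: a bare string behaves like one user message.
        return [{"role": "user", "content": value}]
    return list(value)  # copy so the caller's list is not mutated later
```

With the array form in hand, further items can be appended before the request is sent.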

Examples

cURL

curl -X POST https://api.quantized.us/v1/responses \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "input": "What is the capital of France?",
    "instructions": "You are a geography expert. Be concise.",
    "max_output_tokens": 128
  }'

Python

import httpx

response = httpx.post(
    "https://api.quantized.us/v1/responses",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "input": "What is the capital of France?",
        "instructions": "You are a geography expert. Be concise.",
        "max_output_tokens": 128,
    },
)
data = response.json()
print(data["output"][0]["content"][0]["text"])

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

response = client.responses.create(
    model="openai/gpt-4.1-mini",
    input="What is the capital of France?",
    instructions="You are a geography expert. Be concise.",
    max_output_tokens=128,
)
print(response.output[0].content[0].text)

Response

{
  "id": "resp-abc123",
  "object": "response",
  "status": "completed",
  "model": "openai/gpt-4.1-mini",
  "output": [
    {
      "id": "msg-001",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 20,
    "output_tokens": 8,
    "total_tokens": 28,
    "credits_used": 2000,
    "credits_remaining": 998000,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "created_at": 1719000000
}

Response fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique response ID |
| object | string | Always "response" |
| status | string | "completed", "in_progress", "failed" |
| model | string | Model that generated the response |
| output | array | List of output items |
| output[].type | string | "message" or "function_call" |
| output[].role | string | "assistant" for message items |
| output[].content | array | Content parts |
| output[].content[].type | string | "output_text" |
| output[].content[].text | string | The generated text |
| usage.input_tokens | integer | Input tokens |
| usage.output_tokens | integer | Output tokens |
| usage.total_tokens | integer | Total tokens |
| usage.credits_used | integer | Micro-credits consumed |
| usage.credits_remaining | integer | Micro-credits remaining |
| usage.input_tokens_details | object or null | Token breakdown: cached_tokens |
| usage.output_tokens_details | object or null | Token breakdown: reasoning_tokens |
| created_at | integer or string | Creation timestamp |
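
Since output may contain "function_call" items as well as messages, and the token detail objects may be null, indexing `output[0]` directly can fail. The following sketch reads the fields above defensively from the parsed JSON; the helper names are illustrative.

```python
def output_text(response: dict) -> str:
    """Concatenate the text of all output_text parts in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as function_call
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


def reasoning_tokens(response: dict) -> int:
    """Return usage.output_tokens_details.reasoning_tokens, or 0 when it is absent or null."""
    details = response.get("usage", {}).get("output_tokens_details") or {}
    return details.get("reasoning_tokens", 0)
```

For the sample response shown earlier, `output_text` would return the single sentence in the message item.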

Streaming

Set "stream": true to receive Server-Sent Events. The Responses endpoint uses typed events (response.created, response.output_text.delta, response.completed, etc.). See the Streaming guide for the full event format and code examples.
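
As a rough illustration only, the sketch below parses Server-Sent Events lines and joins the text deltas. It assumes each event arrives as an `event:` line plus a JSON `data:` line terminated by a blank line, and that response.output_text.delta events carry the text in a `delta` field; neither detail is specified on this page, so the Streaming guide remains the authority on the actual format.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_type, data) pairs from SSE lines; a blank line ends an event."""
    event_type, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                payload = "\n".join(data_lines)
                try:
                    data = json.loads(payload)
                except json.JSONDecodeError:
                    data = {"raw": payload}  # keep non-JSON payloads instead of failing
                yield event_type or "message", data
            event_type, data_lines = None, []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())


def collect_text(lines: Iterable[str]) -> str:
    """Join the deltas of all response.output_text.delta events (assumed field: delta)."""
    return "".join(
        data.get("delta", "")
        for event_type, data in iter_sse_events(lines)
        if event_type == "response.output_text.delta"
    )
```

Any iterable of text lines works as input, for example the line iterator of an HTTP client's streaming response.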

Errors

| Status | Condition |
| --- | --- |
| 400 | Invalid request (missing model, bad field types) |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Model not found |
| 422 | Unsupported parameter or invalid field structure |
| 503 | Provider unavailable |
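
One way a client might act on these codes is sketched below. The status meanings are copied from the table; treating only 503 as worth retrying, and the limit of three attempts, are assumptions of this sketch and not documented behavior.

```python
# Status meanings copied from the errors table.
STATUS_MEANINGS = {
    400: "Invalid request (missing model, bad field types)",
    401: "Invalid or missing API key",
    402: "Insufficient credits",
    404: "Model not found",
    422: "Unsupported parameter or invalid field structure",
    503: "Provider unavailable",
}


def describe_error(status: int) -> str:
    """Return the documented meaning of a status code, or a generic fallback."""
    return STATUS_MEANINGS.get(status, f"Unexpected status {status}")


def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Assumption: only 503 is transient; `attempt` counts from 1."""
    return status == 503 and attempt < max_attempts
```

The other codes point to a problem with the request or the account, so repeating the same request unchanged is unlikely to help.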