Responses
POST /v1/responses
Generate a model response using the Responses API format. Supports instructions, multi-turn via previous_response_id, tools, and streaming.
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <api-key-or-jwt> |
| Content-Type | Yes | application/json |
| X-Quantized-Provider | No | Force a specific provider (default: openrouter) |
Request body
Required fields
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier (e.g., openai/gpt-4.1-mini) |
| input | string or array | The input text or a list of input items |
Optional fields
| Field | Type | Default | Description |
|---|---|---|---|
| instructions | string | null | System-level instructions for the model |
| max_output_tokens | integer | null | Maximum tokens in the response |
| temperature | float | null | Sampling temperature (0–2) |
| top_p | float | null | Nucleus sampling threshold |
| top_k | float | null | Top-k sampling |
| frequency_penalty | float | null | Frequency penalty (−2.0 to 2.0) |
| presence_penalty | float | null | Presence penalty (−2.0 to 2.0) |
| previous_response_id | string | null | ID of a previous response for multi-turn |
| tools | array | null | Tool/function definitions |
| tool_choice | any | null | Tool selection strategy |
| parallel_tool_calls | boolean | null | Allow parallel tool execution |
| store | boolean | null | Store the response for later retrieval |
| stream | boolean | false | Enable SSE streaming |
| metadata | object | null | Key-value metadata |
| user | string | null | User identifier |
| reasoning | object | null | Reasoning configuration |
| text | object | null | Text output configuration |
| truncation | string | null | Truncation strategy |
| include | array | null | Additional data to include in the response |
| service_tier | string | null | Service tier preference |
| background | boolean | null | Run in background |
Input format
The input field accepts either a plain string or an array of input items:
A plain string:

```json
{"input": "What is the capital of France?"}
```

An array of items:

```json
{"input": [
  {"role": "user", "content": "What is the capital of France?"}
]}
```
Examples
cURL

```bash
curl -X POST https://api.quantized.us/v1/responses \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "input": "What is the capital of France?",
    "instructions": "You are a geography expert. Be concise.",
    "max_output_tokens": 128
  }'
```

Python

```python
import httpx

response = httpx.post(
    "https://api.quantized.us/v1/responses",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "input": "What is the capital of France?",
        "instructions": "You are a geography expert. Be concise.",
        "max_output_tokens": 128,
    },
)
data = response.json()
print(data["output"][0]["content"][0]["text"])
```

OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

response = client.responses.create(
    model="openai/gpt-4.1-mini",
    input="What is the capital of France?",
    instructions="You are a geography expert. Be concise.",
    max_output_tokens=128,
)
print(response.output[0].content[0].text)
```
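Multi-turn conversations chain requests through previous_response_id; the server resolves the prior turn from the stored response, so only the new input needs to be sent. A minimal sketch (the follow_up helper is ours, and the resp-abc123 id is just the sample value from below):

```python
def follow_up(previous_id: str, text: str,
              model: str = "openai/gpt-4.1-mini") -> dict:
    """Build a request body that continues from an earlier response."""
    return {
        "model": model,
        "input": text,
        "previous_response_id": previous_id,
    }

# POST this body to /v1/responses to continue the conversation.
body = follow_up("resp-abc123", "And what about Germany?")
```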
Response
```json
{
  "id": "resp-abc123",
  "object": "response",
  "status": "completed",
  "model": "openai/gpt-4.1-mini",
  "output": [
    {
      "id": "msg-001",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 20,
    "output_tokens": 8,
    "total_tokens": 28,
    "credits_used": 2000,
    "credits_remaining": 998000,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "created_at": 1719000000
}
```
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique response ID |
| object | string | Always "response" |
| status | string | "completed", "in_progress", or "failed" |
| model | string | Model that generated the response |
| output | array | List of output items |
| output[].type | string | "message" or "function_call" |
| output[].role | string | "assistant" for message items |
| output[].content | array | Content parts |
| output[].content[].type | string | "output_text" |
| output[].content[].text | string | The generated text |
| usage.input_tokens | integer | Input tokens |
| usage.output_tokens | integer | Output tokens |
| usage.total_tokens | integer | Total tokens |
| usage.credits_used | integer | Micro-credits consumed |
| usage.credits_remaining | integer | Micro-credits remaining |
| created_at | integer or string | Creation timestamp |
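Since a response can contain several output items, and message items carry a list of content parts, pulling out the generated text means walking both levels. A small helper (ours, not part of any SDK) that concatenates every output_text part and skips function_call items:

```python
def extract_text(resp: dict) -> str:
    """Concatenate every output_text part across message output items."""
    parts = []
    for item in resp.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call and other non-message items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)

# Trimmed version of the sample response body above:
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text",
                 "text": "The capital of France is Paris."}
            ],
        }
    ]
}
```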
Streaming
Set "stream": true to receive Server-Sent Events. The Responses endpoint uses typed events (response.created, response.output_text.delta, response.completed, etc.). See the Streaming guide for the full event format and code examples.
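As a rough sketch of consuming those typed events, the parser below splits a raw SSE body into (event, payload) pairs. It assumes standard event:/data: framing; see the Streaming guide for the authoritative wire format:

```python
import json

def parse_sse(raw: str) -> list:
    """Split a raw SSE body into (event_type, payload) pairs."""
    events = []
    for block in raw.strip().split("\n\n"):
        event_type, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        if event_type and data:
            events.append((event_type, json.loads(data)))
    return events

# Hypothetical fragment of a stream; payload shapes are illustrative.
sample_stream = (
    "event: response.output_text.delta\n"
    'data: {"delta": "Paris"}\n\n'
    "event: response.completed\n"
    'data: {"response": {"id": "resp-abc123"}}\n'
)
events = parse_sse(sample_stream)
```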
Errors
| Status | Condition |
|---|---|
| 400 | Invalid request |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Model not found |
| 503 | Provider unavailable |
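When calling the endpoint directly, the statuses above are worth branching on. A hedged sketch of one way to do that; the action names and the choice to treat only 503 as retryable are our convention, not mandated by the API:

```python
RETRYABLE = {503}  # provider unavailable; a later attempt may succeed

def classify_error(status: int) -> str:
    """Map a Responses API error status to a coarse recovery action."""
    if status == 401:
        return "fix-credentials"   # invalid or missing API key
    if status == 402:
        return "top-up-credits"    # insufficient credits
    if status in (400, 404):
        return "fix-request"       # bad payload or unknown model
    if status in RETRYABLE:
        return "retry-later"
    return "unknown"
```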