Streaming

Both the Chat Completions and Responses endpoints support streaming via Server-Sent Events (SSE). Streaming delivers tokens incrementally as the model generates them, instead of waiting for the full response.

Enabling streaming

Set stream: true in the request body:

curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Write a haiku about code."}],
    "stream": true
  }'

A streaming response is sent with the following headers:

Content-Type: text/event-stream; charset=utf-8
Cache-Control: no-cache
Connection: keep-alive
X-Accel-Buffering: no

Chat Completions stream format

Each event is a data: line containing a JSON object. The stream ends with data: [DONE].

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Lines"},"finish_reason":null}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" of"},"finish_reason":null}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" code"},"finish_reason":null}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","usage":{"prompt_tokens":14,"completion_tokens":12,"total_tokens":26,"credits_used":200,"credits_remaining":999800}}

data: [DONE]

Some providers also send a usage-only chunk, in which choices is an empty array ([]), before [DONE]. Client code must therefore never assume choices[0] exists on every SSE event.

The final chunk before [DONE] often carries the usage object with credit information, sometimes together with empty choices.
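The defensive parsing this implies can be sketched as a small helper (illustrative only, not part of any SDK); the sample lines mirror the stream above, with ids trimmed for brevity:

```python
import json

def iter_content_deltas(lines):
    """Yield content deltas from Chat Completions SSE `data:` lines.

    Deliberately tolerant of usage-only chunks where `choices` is `[]`.
    """
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # `choices` may be missing or empty on usage-only chunks
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content

# Reassemble the example stream shown above:
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Lines"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" of"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" code"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: {"usage":{"total_tokens":26}}',
    'data: [DONE]',
]
print("".join(iter_content_deltas(sample)))  # Lines of code
```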

Responses stream format

The Responses endpoint uses typed events with event: and data: lines:

event: response.created
data: {"id":"resp-abc","object":"response","status":"in_progress","model":"openai/gpt-4.1-mini","output":[]}

event: response.in_progress
data: {"id":"resp-abc","object":"response","status":"in_progress"}

event: response.output_item.added
data: {"type":"message","id":"msg-001","role":"assistant","content":[]}

event: response.content_part.added
data: {"type":"output_text","text":""}

event: response.output_text.delta
data: {"type":"output_text","delta":"Lines"}

event: response.output_text.delta
data: {"type":"output_text","delta":" of code"}

event: response.output_text.done
data: {"type":"output_text","text":"Lines of code flow..."}

event: response.content_part.done
data: {"type":"output_text","text":"Lines of code flow..."}

event: response.output_item.done
data: {"type":"message","id":"msg-001","role":"assistant","content":[...]}

event: response.completed
data: {"id":"resp-abc","object":"response","status":"completed","output":[...],"usage":{"input_tokens":14,"output_tokens":12,"credits_used":200,"credits_remaining":999800}}

data: [DONE]

Each event also carries a sequence_number field (omitted from the examples above for brevity) that increases monotonically across the stream.
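Because Responses events are typed, a consumer must track the event: line that precedes each data: line. A minimal sketch (illustrative only) that extracts text deltas:

```python
import json

def iter_output_text(lines):
    """Yield text deltas from Responses-style typed SSE lines.

    Remembers the most recent `event:` name and only acts on
    `response.output_text.delta` payloads.
    """
    event = None
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            if event == "response.output_text.delta":
                yield json.loads(payload)["delta"]
            event = None  # each event ends at its data line

# Abridged version of the stream shown above:
sample = [
    "event: response.output_text.delta",
    'data: {"type":"output_text","delta":"Lines"}',
    "event: response.output_text.delta",
    'data: {"type":"output_text","delta":" of code"}',
    "event: response.completed",
    'data: {"status":"completed"}',
    "data: [DONE]",
]
print("".join(iter_output_text(sample)))  # Lines of code
```

A real consumer would also buffer partial lines across network reads; SSE events can be split arbitrarily at chunk boundaries.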

Consuming streams

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

stream = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[{"role": "user", "content": "Write a haiku about code."}],
    stream=True,
)

for chunk in stream:
    # Usage-only chunks often have `choices: []`; never index `[0]` blindly.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta and delta.content:
        print(delta.content, end="", flush=True)
print()

Python (httpx, raw SSE)

import json

import httpx

with httpx.stream(
    "POST",
    "https://api.quantized.us/v1/chat/completions",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "messages": [{"role": "user", "content": "Write a haiku about code."}],
        "stream": True,
    },
    timeout=None,  # streams can idle longer than httpx's default read timeout
) as response:
    for line in response.iter_lines():
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        # Usage-only chunks may have empty or missing `choices`
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        content = delta.get("content") or ""
        if content:
            print(content, end="", flush=True)
print()

JavaScript

const response = await fetch('https://api.quantized.us/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-quantized-YOUR-KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4.1-mini',
    messages: [{ role: 'user', content: 'Write a haiku about code.' }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // `stream: true` keeps multi-byte characters split across reads intact
  buffer += decoder.decode(value, { stream: true });
  // SSE lines can also be split across reads; keep the partial tail buffered
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
    const chunk = JSON.parse(line.slice(6));
    // Optional chaining guards against usage-only chunks with empty choices
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) console.log(content);
  }
}