Changelog
Embeddings endpoint (May 2026)
Added
POST /v1/embeddings— OpenAI-compatible embeddings endpoint.- Default provider: OpenAI Direct (new provider, slug
openai). - Alternative: OpenRouter via
X-Quantized-Provider: openrouter. OpenRouter passthrough uses the OpenAI rate table for billing because its response does not include cost data. - Supported models:
text-embedding-3-small,text-embedding-3-large,text-embedding-ada-002. - Accepted fields:
model,input(string or list of strings),dimensions,encoding_format(only"float"),user. - Strict validation (
extra="forbid") — unknown fields return422.
- Default provider: OpenAI Direct (new provider, slug
Out of scope (deferred)
- Native Bedrock embeddings (
/v1/aws-bedrock/embeddings) — Titan + Cohere — landing in a follow-up PR. - Token-array input (
list[int]/list[list[int]]). - Multimodal
ContentPart[]input. encoding_format: "base64".output_dtypequantized embeddings.input_type(relevant once Cohere/Gemini providers are wired in).
Multimodal chat completions (April 2026)
Added
- Chat Completions — new content part types:
input_audio— base64 audio withformat(wav/mp3/aiff/aac/ogg/flac/m4a/pcm16/pcm24)video_url— HTTPS URL ordata:video/...;base64,...data URIfile— PDF viafile_data(HTTPS ordata:application/pdf;base64,...) with optionalfilename; alternativelyfile_id
- Model modality validation: chat-completion requests are now validated against the target model’s
input_modalitybefore dispatch. A mismatch (e.g. audio to a text-only model) returns400with a clear message instead of an opaque upstream error.fileparts are exempt — they are handled by OpenRouter’s universal PDF parser across all models.
Provider support
OpenRouter: all four new modalities. Anthropic: text + image only (existing behavior unchanged).
v1.1 — API Lockdown (April 2026)
Strict parameter validation
The API now enforces strict validation on all request parameters. Unknown or unsupported fields are rejected with 422 Unprocessable Entity.
Chat Completions — removed parameters:
top_k, modalities, audio, web_search_options, metadata
Chat Completions — promoted parameters (now supported):
logprobs, top_logprobs, logit_bias, repetition_penalty, user, stream_options
Chat Completions — promoted message fields (now supported):
refusal, reasoning, reasoning_details (assistant role only)
Chat Completions — accepted message fields (not forwarded to providers):
annotations, audio, function_call (accepted from OpenAI SDK responses to avoid 422 in multi-turn conversations)
Chat Completions — removed message fields:
images (in request messages)
Responses API — removed parameters:
store, text, truncation, include, service_tier, background, top_k, metadata, user
Chat Completions — removed response fields:
system_fingerprint, choices[].message.annotations, choices[].message.audio, choices[].message.images
Chat Completions — promoted response fields (now supported):
choices[].logprobs, choices[].message.reasoning, choices[].message.reasoning_details
Strict content part validation
Message content arrays now only accept text and image_url content parts. Other types like input_audio, video_url, or file are rejected.
Strict tool definition validation
Chat completions tools field now validates tool structure: each tool must follow the OpenAI format (type: "function", function: {name, description, parameters}) or Anthropic format (name, input_schema).
Parameter range validation
All generation parameters are validated at the API boundary with clear error messages:
max_tokens/max_completion_tokens/max_output_tokens: minimum 1temperature: 0–2top_p: 0–1frequency_penalty/presence_penalty: −2 to 2
Out-of-range values return 422 with a message like: "Input should be greater than or equal to 16".
Type improvements
response_formatvalidates structure:{"type": "json_object"}or{"type": "json_schema", ...}reasoningvalidates effort:{"effort": "none" | "low" | "medium" | "high"}with optionalexcludebooleantool_choicevalidates values:"auto","required","none", or{"type": "function", "function": {"name": "..."}}urlsin/v1/fetchnow requireslist[string]
Error message sanitization
Provider error messages are now sanitized before reaching the user. URLs, email addresses, and provider names are stripped from all client-facing error messages. Internal error details are preserved in the database for debugging.
v1 — Initial Release
Endpoints
POST /v1/chat/completions— OpenAI-compatible chat completionsPOST /v1/responses— Stateful Responses APIPOST /v1/web-search— Web search with structured resultsPOST /v1/fetch— Extract text content from URLsGET /v1/models— List available models and pricingGET /v1/license— Check license info and credit balance
Providers
- OpenRouter — LLMs (default for chat completions, responses, models)
- Anthropic — Claude models directly
- Exa — Web search and content fetch (default)
- Tavily — Web search and content fetch (alternative)
Features
- OpenAI SDK compatibility
- SSE streaming for chat completions and responses
- Unified credit billing across all providers
- JWT authentication with auto-provisioning
- Per-institution configuration
- Provider routing via
X-Quantized-Providerheader