Gemini Embeddings
POST /v1/gemini/embeddings
Native passthrough to Google Gemini’s embedding API. Request shape mirrors models/{model}:embedContent / :batchEmbedContents; response is normalized to a single list-of-embeddings format with a usage block on top.
Use /v1/embeddings for the unified OpenAI-compatible shape. Use this endpoint when you need Gemini-specific fields (task_type, output_dimensionality truncation, optional document title) that the OpenAI shape doesn’t expose.
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <api-key-or-jwt> |
| Content-Type | Yes | application/json |
Request body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model id (e.g. gemini-embedding-001) |
| contents | array | Yes | - | One or more content objects. Each item produces one embedding |
| contents[].parts | array | Yes | - | At least one {text: string} part per content |
| task_type | string | No | null | One of RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION, CODE_RETRIEVAL_QUERY |
| output_dimensionality | integer | No | 3072 | Vector dimension. Must be >= 1. Common values: 768, 3072 |
| title | string | No | null | Document title; only meaningful when task_type is RETRIEVAL_DOCUMENT |
The serializer uses extra="forbid". Fields outside the list above are rejected with 422. Notable rejects: OpenAI’s input, dimensions, encoding_format, user; Cohere’s texts, input_type, embedding_types.
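Because unknown fields fail the whole request, it can help to check a payload on the client before sending it. The following is a minimal sketch, not part of the API: the helper name `unexpected_fields` is hypothetical, and the allowed set is simply copied from the request-body table above. It only inspects top-level keys.

```python
# Top-level field names copied from the request-body table above.
ALLOWED_FIELDS = {"model", "contents", "task_type", "output_dimensionality", "title"}


def unexpected_fields(payload: dict) -> list[str]:
    """Return the top-level keys the endpoint would reject, sorted for stable output."""
    return sorted(set(payload) - ALLOWED_FIELDS)


payload = {
    "model": "gemini-embedding-001",
    "input": ["hello"],   # OpenAI-style field, not accepted by this endpoint
    "dimensions": 768,    # OpenAI-style field, not accepted by this endpoint
}
print(unexpected_fields(payload))  # ['dimensions', 'input']
```

A payload that triggers a non-empty result here would be expected to come back as a 422 from the server.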
Single vs batch routing
The router picks the upstream endpoint based on the cardinality of contents:
- 1 content → :embedContent ($0.15 per 1M tokens)
- N > 1 contents → :batchEmbedContents ($0.075 per 1M tokens, 50% cheaper)
Gemini’s :embedContent endpoint silently concatenates multi-part content into a single string and returns ONE vector for everything — no error, 200 OK, wrong shape. The router defends against this by always dispatching multi-content requests to :batchEmbedContents. The endpoint field in the response confirms which upstream URL was used; treat any unexpected value (e.g. embedContent for a multi-content request) as a router bug.
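The routing rule is simple enough to mirror on the client as a sanity check. This is a sketch of the rule as described above, not the router's actual code; `expected_endpoint` and `routing_matches` are hypothetical helper names.

```python
def expected_endpoint(contents: list) -> str:
    """Mirror the documented rule: one content is single, more than one is batch."""
    if not contents:
        raise ValueError("contents must not be empty")
    return "embedContent" if len(contents) == 1 else "batchEmbedContents"


def routing_matches(request_body: dict, response_body: dict) -> bool:
    """True when the response's endpoint field agrees with the request's cardinality."""
    return response_body.get("endpoint") == expected_endpoint(request_body["contents"])


request_body = {
    "model": "gemini-embedding-001",
    "contents": [{"parts": [{"text": "first"}]}, {"parts": [{"text": "second"}]}],
}
print(expected_endpoint(request_body["contents"]))  # batchEmbedContents
```

Calling `routing_matches` after each request gives a cheap way to detect the mismatch that the paragraph above describes as a router bug.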
Examples
curl -X POST https://api.quantized.us/v1/gemini/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-embedding-001",
    "contents": [
      {"parts": [{"text": "The quick brown fox jumps over the lazy dog."}]}
    ],
    "task_type": "RETRIEVAL_DOCUMENT"
  }'
curl -X POST https://api.quantized.us/v1/gemini/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-embedding-001",
    "contents": [
      {"parts": [{"text": "first"}]},
      {"parts": [{"text": "second"}]},
      {"parts": [{"text": "third"}]}
    ],
    "task_type": "RETRIEVAL_DOCUMENT"
  }'
import httpx

resp = httpx.post(
    "https://api.quantized.us/v1/gemini/embeddings",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "gemini-embedding-001",
        "contents": [
            {"parts": [{"text": "What is a fox?"}]}
        ],
        "task_type": "RETRIEVAL_QUERY",
        "output_dimensionality": 768,
    },
)
resp.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
data = resp.json()
print(data["endpoint"])  # "embedContent"
print(len(data["embeddings"][0]["values"]))  # 768
Response
{
  "model": "gemini-embedding-001",
  "embeddings": [
    {"values": [0.0123, -0.0456, 0.0789, ...]}
  ],
  "endpoint": "embedContent",
  "usage": {
    "credits_used": 12,
    "credits_remaining": 999988
  }
}
| Field | Type | Description |
|---|---|---|
| model | string | Echoed from the request |
| embeddings | array | One entry per input content, in order. Single-input requests return a 1-element list |
| embeddings[].values | array of floats | The vector |
| endpoint | string | Either "embedContent" or "batchEmbedContents": the upstream method the router called |
| usage.credits_used | integer | Micro-credits consumed (billed at the rate corresponding to endpoint) |
| usage.credits_remaining | integer or null | Micro-credits remaining (null for unlimited licenses) |
Gemini’s :embedContent returns {embedding: {values: [...]}} (singular), while :batchEmbedContents returns {embeddings: [{values: [...]}, ...]} (plural). The router normalizes both to the plural list shape so client code doesn’t branch on cardinality, and surfaces the upstream choice via the endpoint field.
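The normalization can be illustrated with a small function over the two upstream shapes quoted above. This is a sketch under the assumption that the upstream bodies contain exactly those keys; `normalize_embeddings` is a hypothetical name, not the router's implementation.

```python
def normalize_embeddings(upstream: dict) -> list[dict]:
    """Map either upstream shape to a list of {"values": [...]} entries."""
    if "embeddings" in upstream:   # :batchEmbedContents, already plural
        return [{"values": e["values"]} for e in upstream["embeddings"]]
    if "embedding" in upstream:    # :embedContent, singular: wrap in a list
        return [{"values": upstream["embedding"]["values"]}]
    raise ValueError("upstream body has neither 'embedding' nor 'embeddings'")


single = {"embedding": {"values": [0.1, 0.2]}}
batch = {"embeddings": [{"values": [0.1]}, {"values": [0.2]}]}
print(len(normalize_embeddings(single)))  # 1
print(len(normalize_embeddings(batch)))   # 2
```

Client code can therefore always index `embeddings[i]["values"]`, whether one content or many were sent.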
Gemini’s embedding endpoints do not return token counts. Quantized estimates input tokens at ~4 characters per token (floored at 1). This is conservative and rarely under-bills for natural-language input.
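For rough budgeting, the estimate can be reproduced on the client. The sketch below rests on several assumptions that the text does not confirm: that the estimate rounds down (integer division), that it is applied per content rather than to the concatenated input, and that the public list rates from this page apply. The function names are hypothetical.

```python
# Public list rates in USD per 1M tokens, as quoted on this page.
RATE_PER_MILLION_USD = {"embedContent": 0.15, "batchEmbedContents": 0.075}


def estimate_tokens(text: str) -> int:
    """~4 characters per token, never less than 1 (rounding down is an assumption)."""
    return max(1, len(text) // 4)


def estimate_cost_usd(texts: list[str]) -> float:
    """Estimate list-price cost for one request containing the given texts."""
    if not texts:
        raise ValueError("at least one text is required")
    endpoint = "embedContent" if len(texts) == 1 else "batchEmbedContents"
    tokens = sum(estimate_tokens(t) for t in texts)  # per-content sum is an assumption
    return tokens * RATE_PER_MILLION_USD[endpoint] / 1_000_000


print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

The authoritative figure for any request remains usage.credits_used in the response; this estimate is only for planning.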
Models
| Model id | Native dimension | Truncatable to | Public list rate (single / batch) |
|---|---|---|---|
| gemini-embedding-001 | 3072 | 768 | $0.15 / $0.075 per 1M tokens |
Filter for Gemini embedding models via GET /v1/models:
gemini_embed_models = [
m for m in models
if "gemini_embeddings" in m.get("supported_features", [])
]
Providers
| Provider | Slug | Default? |
|---|---|---|
| Google Gemini | gemini | Yes (and only) |
This endpoint is Gemini-only — X-Quantized-Provider is ignored.
Errors
| Status | Condition |
|---|---|
| 400 | Upstream Gemini error (e.g. invalid task type combination); sanitized message returned |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Unknown model id (catalog gate) |
| 422 | Validation: missing contents, empty contents, unsupported field, invalid task_type value, output_dimensionality < 1 |
| 503 | Upstream throttling, transient unavailability, or auth failure on the Gemini API key |
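Of the statuses above, only 503 describes conditions that may clear on their own, so it is the natural candidate for a retry. The following is a minimal sketch of one possible client policy (exponential backoff with a small attempt cap), not a documented recommendation. `post_with_retry` and its parameters are hypothetical, and `send` stands for any zero-argument callable that performs the request and returns `(status_code, body)`.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-503 status or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Everything except 503 is returned at once: 4xx needs a changed request.
        if status != 503 or attempt == max_attempts:
            return status, body
        sleep(base_delay * 2 ** (attempt - 1))  # waits 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be >= 1")
```

The cap matters because 503 also covers an auth failure on the upstream Gemini key, which retrying will not fix.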
The following are not accepted by v1 and may be added in a future release:
- Multimodal Gemini embeddings (image, audio, video parts)
- Vertex AI endpoint variant (this endpoint talks to generativelanguage.googleapis.com, not the Vertex API)
- Concurrent fan-out to :countTokens for precise token counts