Gemini Embeddings

POST /v1/gemini/embeddings

Native passthrough to Google Gemini’s embedding API. Request shape mirrors models/{model}:embedContent / :batchEmbedContents; response is normalized to a single list-of-embeddings format with a usage block on top.

When to use this endpoint vs /v1/embeddings

Use /v1/embeddings for the unified OpenAI-compatible shape. Use this endpoint when you need Gemini-specific fields — task_type, output_dimensionality truncation, optional document title — that the OpenAI shape doesn’t expose.

Headers

Header	Required	Description
`Authorization`	Yes	`Bearer <api-key-or-jwt>`
`Content-Type`	Yes	`application/json`

Request body

Field	Type	Required	Default	Description
`model`	string	Yes	—	Model id (e.g. `gemini-embedding-001`)
`contents`	array	Yes	—	One or more content objects. Each item produces one embedding
`contents[].parts`	array	Yes	—	At least one `{text: string}` part per content
`task_type`	string	No	null	One of `RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`, `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING`, `QUESTION_ANSWERING`, `FACT_VERIFICATION`, `CODE_RETRIEVAL_QUERY`
`output_dimensionality`	integer	No	3072	Vector dimension. Must be `>= 1`. Common values: 768, 3072
`title`	string	No	null	Document title — only meaningful when `task_type` is `RETRIEVAL_DOCUMENT`
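
As an illustration of the table above, the following sketch turns a list of plain strings into a request body. The helper name `build_embedding_request` and its defaults are hypothetical and not part of any Quantized SDK. Leaving optional fields out of the body instead of sending null is a choice made for this example.

```python
from typing import Any, Optional


def build_embedding_request(
    texts: list[str],
    model: str = "gemini-embedding-001",
    task_type: Optional[str] = None,
    output_dimensionality: Optional[int] = None,
    title: Optional[str] = None,
) -> dict[str, Any]:
    """Build a request body with one content object per input text (hypothetical helper)."""
    if not texts:
        raise ValueError("contents must contain at least one item")
    body: dict[str, Any] = {
        "model": model,
        # Each content object produces one embedding.
        "contents": [{"parts": [{"text": t}]} for t in texts],
    }
    # Optional fields are only added when the caller sets them.
    if task_type is not None:
        body["task_type"] = task_type
    if output_dimensionality is not None:
        if output_dimensionality < 1:
            raise ValueError("output_dimensionality must be >= 1")
        body["output_dimensionality"] = output_dimensionality
    if title is not None:
        body["title"] = title
    return body
```

The returned dict can be passed directly as the `json=` argument of an HTTP client call.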

Strict validation

The serializer uses extra="forbid", so any field outside the list above is rejected with 422. Commonly rejected fields: OpenAI’s input, dimensions, encoding_format, user; Cohere’s texts, input_type, embedding_types.
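
A client can catch these rejections before sending. The sketch below checks top-level keys against the request-body table. `ALLOWED_FIELDS` and `unexpected_fields` are names invented for this example, and the check does not look inside `contents`.

```python
# Top-level fields taken from the request-body table above.
ALLOWED_FIELDS = {"model", "contents", "task_type", "output_dimensionality", "title"}


def unexpected_fields(body: dict) -> list[str]:
    """Return the top-level keys that are not in the documented field list."""
    return sorted(set(body) - ALLOWED_FIELDS)


# An OpenAI-shaped body: `input` and `dimensions` are not accepted here.
openai_style = {"model": "gemini-embedding-001", "input": ["hello"], "dimensions": 768}
print(unexpected_fields(openai_style))  # ['dimensions', 'input']
```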

Single vs batch routing

The router picks the upstream endpoint based on the cardinality of contents:

1 content → :embedContent ($0.15 per 1M tokens)
N > 1 contents → :batchEmbedContents ($0.075 per 1M tokens, 50% cheaper)

Why this matters

Gemini’s :embedContent endpoint accepts a single content object. When several inputs are sent to it as parts, it silently concatenates them into one string and returns ONE vector for everything: no error, 200 OK, wrong shape. The router defends against this by always dispatching multi-content requests to :batchEmbedContents. The endpoint field in the response reports which upstream method was called; treat any unexpected value (e.g. embedContent for a multi-content request) as a router bug.
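
A client-side sanity check along these lines might look as follows. This is a sketch: the function name `check_routing` is hypothetical, and it assumes the request and response bodies have already been parsed into dicts with the fields documented on this page.

```python
def check_routing(request_body: dict, response_body: dict) -> None:
    """Raise if the response does not match the documented routing rule."""
    n = len(request_body["contents"])
    # 1 content -> embedContent, N > 1 contents -> batchEmbedContents.
    expected = "embedContent" if n == 1 else "batchEmbedContents"
    got = response_body.get("endpoint")
    if got != expected:
        raise RuntimeError(f"expected endpoint {expected!r} for {n} contents, got {got!r}")
    # One embedding per input content, in order.
    if len(response_body["embeddings"]) != n:
        raise RuntimeError(f"expected {n} embeddings, got {len(response_body['embeddings'])}")
```

Calling it right after parsing each response turns a silent shape mismatch into an immediate error.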

Examples

cURL (single input)

curl -X POST https://api.quantized.us/v1/gemini/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-embedding-001",
    "contents": [
      {"parts": [{"text": "The quick brown fox jumps over the lazy dog."}]}
    ],
    "task_type": "RETRIEVAL_DOCUMENT"
  }'

cURL (batch, 3 inputs)

curl -X POST https://api.quantized.us/v1/gemini/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-embedding-001",
    "contents": [
      {"parts": [{"text": "first"}]},
      {"parts": [{"text": "second"}]},
      {"parts": [{"text": "third"}]}
    ],
    "task_type": "RETRIEVAL_DOCUMENT"
  }'

Python (httpx)

import httpx

resp = httpx.post(
    "https://api.quantized.us/v1/gemini/embeddings",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "gemini-embedding-001",
        "contents": [
            {"parts": [{"text": "What is a fox?"}]}
        ],
        "task_type": "RETRIEVAL_QUERY",
        "output_dimensionality": 768,
    },
)
resp.raise_for_status()              # raise on 4xx/5xx before parsing the body
data = resp.json()
print(data["endpoint"])              # "embedContent"
print(len(data["embeddings"][0]["values"]))  # 768

Response

{
  "model": "gemini-embedding-001",
  "embeddings": [
    {"values": [0.0123, -0.0456, 0.0789, ...]}
  ],
  "endpoint": "embedContent",
  "usage": {
    "credits_used": 12,
    "credits_remaining": 999988
  }
}

Field	Type	Description
`model`	string	Echoed from the request
`embeddings`	array	One entry per input content, in order. Single-input requests return a 1-element list
`embeddings[].values`	array of floats	The vector
`endpoint`	string	Either `"embedContent"` or `"batchEmbedContents"` — the upstream URL the router called
`usage.credits_used`	integer	Micro-credits consumed (billed at the rate corresponding to `endpoint`)
`usage.credits_remaining`	integer or null	Micro-credits remaining (`null` for unlimited licenses)

Why is the response shape different from native Gemini?

Gemini’s :embedContent returns {embedding: {values: [...]}} (singular), while :batchEmbedContents returns {embeddings: [{values: [...]}, ...]} (plural). The router normalizes both to the plural list shape so client code doesn’t branch on cardinality, and surfaces the upstream choice via the endpoint field.
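
The normalization can be pictured with the sketch below. It only illustrates the mapping described above and is not the router's actual code; `normalize_embeddings` is a hypothetical name.

```python
def normalize_embeddings(upstream: dict) -> list[dict]:
    """Map either native Gemini response shape to a list of {"values": [...]} items."""
    if "embeddings" in upstream:
        # :batchEmbedContents already returns the plural list shape.
        return list(upstream["embeddings"])
    if "embedding" in upstream:
        # :embedContent returns a single object; wrap it in a 1-element list.
        return [upstream["embedding"]]
    raise ValueError("unrecognized upstream response shape")
```

With this mapping, client code can always iterate over a list, whether one input or many were sent.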

Gemini token estimation

Gemini’s embedding endpoints do not return token counts. Quantized estimates input tokens at ~4 characters per token, with a minimum of 1. The estimate is conservative: for natural-language input it rarely falls below the true token count.
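
For budgeting, the heuristic can be approximated client-side. The sketch below assumes integer division by 4 and a minimum of 1 for each text. The exact rounding, and whether the minimum applies per content or per request, are not specified here, so results may differ slightly from what is billed.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count: about 4 characters per token, never below 1."""
    # Integer division is an assumption; the service may round differently.
    return max(1, len(text) // 4)


print(estimate_tokens(""))        # 1
print(estimate_tokens("a" * 40))  # 10
```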

Models

Model id	Native dimension	Truncatable to	Public list rate (single / batch)
`gemini-embedding-001`	3072	768	$0.15 / $0.075 per 1M tokens

Filter for Gemini embedding models via GET /v1/models:

# `models` is assumed to be the list of model objects parsed from the
# GET /v1/models response.
gemini_embed_models = [
    m for m in models
    if "gemini_embeddings" in m.get("supported_features", [])
]

Providers

Provider	Slug	Default?
Google Gemini	`gemini`	Yes (and only)

This endpoint is Gemini-only — X-Quantized-Provider is ignored.

Errors

Status	Condition
`401`	Invalid or missing API key
`402`	Insufficient credits
`404`	Unknown model id (catalog gate)
`422`	Validation — missing `contents`, empty `contents`, unsupported field, invalid `task_type` value, `output_dimensionality < 1`
`400`	Upstream Gemini error (e.g. invalid task type combination); sanitized message returned
`503`	Upstream throttling, transient unavailability, or auth failure on the Gemini API key
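
One way a client might act on this table is to retry only `503` and return every other status to the caller unchanged. The sketch below is an assumption about reasonable client behavior, not a documented retry policy. `call_with_retry`, its parameters, and the backoff values are hypothetical. Since `503` also covers auth failures on the upstream key, retries will not always help.

```python
import time
from typing import Callable


def call_with_retry(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call `send` until it returns a non-503 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 503 and attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
            continue
        # 401, 402, 404, 422 and 400 are returned as-is: retrying will not fix them.
        return status, body
    raise AssertionError("unreachable")
```

Here `send` is any zero-argument callable that performs the HTTP request and returns `(status_code, parsed_json)`.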

Out of scope today

The following are not accepted by v1 and may be added in a future release:

Multimodal Gemini embeddings (image, audio, video parts)
Vertex AI endpoint variant (this endpoint talks to generativelanguage.googleapis.com, not the Vertex API)
Concurrent fan-out to :countTokens for precise token counts