--- name: venice-embeddings description: Call POST /embeddings on Venice. Covers request shape (input, model, encoding_format, dimensions, user), OpenAI compatibility, response compression (gzip/br), and practical usage for retrieval, clustering, and RAG. --- # Venice Embeddings `POST /api/v1/embeddings` returns vector embeddings for strings. It's OpenAI-compatible: the request and response match `https://api.openai.com/v1/embeddings` closely enough that the OpenAI SDK works out of the box with `baseURL: "https://api.venice.ai/api/v1"`. ## Use when - You're building retrieval / RAG / similarity search. - You need text clustering, classification, deduplication, or reranking. - You want Venice's "no-training, no-retention" stance on inference inputs — embeddings are generated and returned; the API does not publish E2EE semantics on `/embeddings` the way it does on selected chat models. Text-only. For image/multimodal signals, either run images through a vision chat model and embed the description, or pick a multimodal-capable embedding model from `GET /models?type=embedding` (the catalog changes; inspect `model_spec` on each row). ## Minimal request ```bash curl https://api.venice.ai/api/v1/embeddings \ -H "Authorization: Bearer $VENICE_API_KEY" \ -H "Content-Type: application/json" \ -H "Accept-Encoding: gzip, br" \ -d '{ "model": "text-embedding-bge-m3", "input": "Why is the sky blue?" }' ``` ```json { "object": "list", "model": "text-embedding-bge-m3", "data": [ { "object": "embedding", "index": 0, "embedding": [0.0023, -0.0093, 0.0158, ...] } ], "usage": { "prompt_tokens": 8, "total_tokens": 8 } } ``` ## Request schema | Field | Type | Notes | |---|---|---| | `model` | string | **Required.** Model ID from `GET /models?type=embedding`. | | `input` | string \| string[] \| number[] \| number[][] | **Required.** Single string, array of strings (≤ 2048 entries), or pre-tokenized arrays. | | `encoding_format` | `"float"` \| `"base64"` | Default `"float"`. Use `"base64"` for ~4× payload shrinkage; decode client-side. | | `dimensions` | integer | Optional. Truncate output dimensions. Only meaningful when the model's `model_spec.supportsCustomDimensions === true` — behavior on non-supporting models is model-dependent; test a small call before relying on it. | | `user` | string | Accepted for OpenAI compat. Discarded by Venice. | `input` max tokens per string is capped at the model's `model_spec.maxInputTokens` (typically 8192). Batch arrays are capped at **2048 items**. Venice returns one embedding per element, in order, with matching `index`. ## Response headers & compression Request `Accept-Encoding: gzip, br`. The response will include `Content-Encoding` accordingly. For long batches this matters — vectors are large. For x402 auth, `X-Balance-Remaining` reports your remaining USDC credits. ## Using the OpenAI SDK ```ts import OpenAI from 'openai' const client = new OpenAI({ apiKey: process.env.VENICE_API_KEY, baseURL: 'https://api.venice.ai/api/v1', }) const res = await client.embeddings.create({ model: 'text-embedding-bge-m3', input: ['first doc', 'second doc'], }) const vec0 = res.data[0].embedding ``` ## Batch-embedding pattern ```ts async function embedBatch(texts: string[], batchSize = 64) { const out: number[][] = [] for (let i = 0; i < texts.length; i += batchSize) { const slice = texts.slice(i, i + batchSize) const res = await client.embeddings.create({ model: 'text-embedding-bge-m3', input: slice, encoding_format: 'float', }) for (const row of res.data) out[i + row.index] = row.embedding } return out } ``` - Keep batches ≤ model context limit total tokens. - On `429`, back off exponentially and halve the batch — see [`venice-errors`](../venice-errors/SKILL.md). ## Choosing a model Query `GET /models?type=embedding` for the current catalog. Each entry exposes: - `model_spec.embeddingDimensions` — native output dimension (e.g. 1024 for BGE-M3). - `model_spec.maxInputTokens` — max tokens per input string. - `model_spec.supportsCustomDimensions` — whether `dimensions` can truncate the output. - `model_spec.pricing.input.usd` / `.diem` — cost per **million** input tokens. Built-in options include `text-embedding-bge-m3`, `text-embedding-bge-en-icl`, `text-embedding-qwen3-8b`, `text-embedding-qwen3-0-6b`, `text-embedding-multilingual-e5-large-instruct`, `text-embedding-3-small`, `text-embedding-3-large`, `gemini-embedding-2-preview`, `text-embedding-nemotron-embed-vl-1b-v2`. Always pin the model ID — cosine distances are **not** comparable across different embedding models. ## Error handling | Code | Meaning | |---|---| | `400` | Validation error. Check `details` in the response for the exact field. | | `401` | Auth / Pro-only model. | | `402` | Insufficient balance. Bearer → `INSUFFICIENT_BALANCE`. x402 → structured `PAYMENT_REQUIRED`. | | `415` | Wrong `Content-Type` — must be `application/json`. | | `429` | Rate limited. | | `500` | Inference failed; retry with jitter. | | `503` | Model at capacity; retry later. | ## Gotchas - `dimensions` is only meaningful when `model_spec.supportsCustomDimensions === true`. Behavior on other models is model-dependent — test with a small request before relying on it. - `input` must not be empty; Venice rejects empty strings with `400`. - Whether the returned vectors are L2-normalized depends on the model — verify with `Math.hypot(...v) ≈ 1` before assuming. - For RAG, store `model` alongside the vector so you can re-embed on upgrade.