venice-embeddings

/home/avalon/.hermes/skills/venice/venice-embeddings/SKILL.md · raw

Venice Embeddings

POST /api/v1/embeddings returns vector embeddings for strings. It's OpenAI-compatible: the request and response match https://api.openai.com/v1/embeddings closely enough that the OpenAI SDK works out of the box with baseURL: "https://api.venice.ai/api/v1".

Use when

Text-only. For image/multimodal signals, either run images through a vision chat model and embed the description, or pick a multimodal-capable embedding model from GET /models?type=embedding (the catalog changes; inspect model_spec on each row).

Minimal request

curl https://api.venice.ai/api/v1/embeddings \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept-Encoding: gzip, br" \
  -d '{
    "model": "text-embedding-bge-m3",
    "input": "Why is the sky blue?"
  }'
{
  "object": "list",
  "model": "text-embedding-bge-m3",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0023, -0.0093, 0.0158, ...] }
  ],
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}

Request schema

Field Type Notes
model string Required. Model ID from GET /models?type=embedding.
input string | string[] | number[] | number[][] Required. Single string, array of strings (≤ 2048 entries), or pre-tokenized arrays.
encoding_format "float" | "base64" Default "float". Use "base64" for ~4× payload shrinkage; decode client-side.
dimensions integer Optional. Truncate output dimensions. Only meaningful when the model's model_spec.supportsCustomDimensions === true — behavior on non-supporting models is model-dependent; test a small call before relying on it.
user string Accepted for OpenAI compat. Discarded by Venice.

input max tokens per string is capped at the model's model_spec.maxInputTokens (typically 8192). Batch arrays are capped at 2048 items. Venice returns one embedding per element, in order, with matching index.

Response headers & compression

Request Accept-Encoding: gzip, br. The response will include Content-Encoding accordingly. For long batches this matters — vectors are large.

For x402 auth, X-Balance-Remaining reports your remaining USDC credits.

Using the OpenAI SDK

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1',
})

const res = await client.embeddings.create({
  model: 'text-embedding-bge-m3',
  input: ['first doc', 'second doc'],
})

const vec0 = res.data[0].embedding

Batch-embedding pattern

async function embedBatch(texts: string[], batchSize = 64) {
  const out: number[][] = []
  for (let i = 0; i < texts.length; i += batchSize) {
    const slice = texts.slice(i, i + batchSize)
    const res = await client.embeddings.create({
      model: 'text-embedding-bge-m3',
      input: slice,
      encoding_format: 'float',
    })
    for (const row of res.data) out[i + row.index] = row.embedding
  }
  return out
}

Choosing a model

Query GET /models?type=embedding for the current catalog. Each entry exposes:

Built-in options include text-embedding-bge-m3, text-embedding-bge-en-icl, text-embedding-qwen3-8b, text-embedding-qwen3-0-6b, text-embedding-multilingual-e5-large-instruct, text-embedding-3-small, text-embedding-3-large, gemini-embedding-2-preview, text-embedding-nemotron-embed-vl-1b-v2.

Always pin the model ID — cosine distances are not comparable across different embedding models.

Error handling

Code Meaning
400 Validation error. Check details in the response for the exact field.
401 Auth / Pro-only model.
402 Insufficient balance. Bearer → INSUFFICIENT_BALANCE. x402 → structured PAYMENT_REQUIRED.
415 Wrong Content-Type — must be application/json.
429 Rate limited.
500 Inference failed; retry with jitter.
503 Model at capacity; retry later.

Gotchas