venice-models

/home/avalon/.hermes/skills/venice/venice-models/SKILL.md · raw

Venice Models

Three read-only endpoints for model discovery — all GET:

Endpoint Returns
/models Full model catalog with model_spec (capabilities, constraints, pricing).
/models/traits Trait → model ID mapping (e.g. "default", "fastest", "default_reasoning", "highest_quality").
/models/compatibility_mapping Legacy / OpenAI / third-party model ID → Venice model ID aliases.

All three take an optional ?type= filter: text, image, video, music, tts, asr, embedding, upscale, inpaint, all, code.

All three are authenticated (Bearer API key or x402 SIWE) like every other /api/v1 route.

Use when

GET /models

curl "https://api.venice.ai/api/v1/models?type=text"
{
  "object": "list",
  "type": "text",
  "data": [
    {
      "id": "zai-org-glm-5-1",
      "created": 1699000000,
      "model_spec": {
        "name": "GLM 5.1",
        "description": "Balanced blend of speed and capability...",
        "availableContextTokens": 200000,
        "maxCompletionTokens": 24000,
        "privacy": "private",
        "beta": false,
        "betaModel": false,
        "modelSource": "https://huggingface.co/zai-org/GLM-5.1",
        "offline": false,
        "capabilities": { ... },
        "constraints": { ... },
        "pricing": { ... },
        "regionRestrictions": ["US"],
        "deprecation": {"date": "2025-03-01T00:00:00.000Z"}
      }
    }
  ]
}

model_spec.capabilities — text models

Flag Meaning
optimizedForCode Tuned for coding tasks.
quantization fp4 / fp8 / fp16 / bf16 / int8 / int4 / not-available.
supportsFunctionCalling Tools are allowed.
supportsReasoning Emits <thinking>...</thinking> blocks (and/or provider-specific reasoning_content).
supportsReasoningEffort Honors reasoning.effort / reasoning_effort.
supportsResponseSchema Honors response_format: json_schema.
supportsMultipleImages + maxImages Multi-image vision support.
supportsVision Accepts image_url parts.
supportsVideoInput Accepts video_url parts.
supportsWebSearch Honors venice_parameters.enable_web_search.
supportsLogProbs Honors logprobs / top_logprobs.
supportsTeeAttestation Runs inside a TEE with hardware attestation.
supportsE2EE End-to-end encrypted inference available (requires TEE).
supportsXSearch xAI native web + X/Twitter search.
supportsAudioInput Accepts audio-content message parts (set by the runtime capability builder — not part of the OpenAPI strict schema but appears on /models responses).

model_spec.constraints — by model family

model_spec.pricing — by model family

Crypto RPC pricing is not in /models — it's tier × chain multipliers on /crypto/rpc/{network} (see venice-crypto-rpc).

Other top-level model_spec fields

Field Use
privacy (private / anonymized) Zero data retention if private.
beta / betaModel Gated to beta users (beta: true ⇒ need access).
offline Currently unavailable; skip.
regionRestrictions[] Country codes. 403 outside them.
deprecation.date Retirement date. Migrate before.

GET /models/traits

curl "https://api.venice.ai/api/v1/models/traits?type=text"

Returns { object: "list", type: "text", data: { "default": "zai-org-glm-5-1", "fastest": "grok-41-fast", "most_uncensored": "venice-uncensored", "default_reasoning": "...", "default_code": "...", "default_vision": "...", "function_calling_default": "...", "most_intelligent": "..." } } for type=text. For type=image, expect keys like default, fastest, highest_quality, eliza-default. Trait keys come from the internal ApiModelTraits / LLMApiModelTraits / ImageApiModelTraits enums.

Use this to avoid hard-coding model IDs — resolve a trait at boot and cache for the session.

GET /models/compatibility_mapping

curl "https://api.venice.ai/api/v1/models/compatibility_mapping?type=text"

Returns { object: "list", type: "text", data: { "openai-gpt-54-pro": "zai-org-glm-5-1", "claude-opus-4-7": "claude-opus-4-7", "gpt-5-4-pro": "openai-gpt-54-pro", ... } }. Both OpenAI-style IDs (openai-gpt-54-pro) and vendor-style aliases (gpt-5-4-pro) may appear as keys.

Lets an OpenAI-style client call Venice with its native model IDs — Venice substitutes behind the scenes. Useful when porting existing code.

Common patterns

Pick a vision+reasoning model at runtime

const list = await fetch(`${base}/models?type=text`).then(r => r.json())
const match = list.data.find((m: any) =>
  m.model_spec.capabilities.supportsVision &&
  m.model_spec.capabilities.supportsReasoning &&
  !m.model_spec.offline &&
  !m.model_spec.beta
)

Validate an image request before submit

const spec = (await fetch(`${base}/models?type=image`).then(r => r.json()))
  .data.find((m: any) => m.id === myModel)!.model_spec

const { widthHeightDivisor, promptCharacterLimit, aspectRatios } = spec.constraints
if (prompt.length > promptCharacterLimit) throw new Error('prompt too long')
if (width % widthHeightDivisor !== 0) throw new Error('width not divisible')
if (aspectRatios && !aspectRatios.includes(myAspect)) throw new Error('bad aspect')

Estimate LLM cost

const p = spec.pricing
const cost =
  (inputTokens / 1_000_000) * p.input.usd +
  (outputTokens / 1_000_000) * p.output.usd +
  (cachedTokens / 1_000_000) * (p.cache_input?.usd ?? 0)

For extended-context runs, check if inputTokens > p.extended?.context_token_threshold and switch to p.extended.* rates.

?type=code

type=code is a convenience filter returning text models with capabilities.optimizedForCode === true. Same response shape as type=text.

Gotchas