---
name: venice-chat
description: Call POST /chat/completions on Venice. Covers the OpenAI-compatible request shape, Venice-only venice_parameters (web search, E2EE, characters, thinking control, X search), multimodal inputs (images/audio/video), tool calls, reasoning controls, streaming, prompt caching, structured output, and model feature suffixes.
---
# Venice Chat Completions
`POST /api/v1/chat/completions` is Venice's main text endpoint. It's OpenAI-compatible, plus a `venice_parameters` object for Venice-only features.
## Use when
- You need LLM text generation, with or without tools, with or without streaming.
- You want multimodal inputs (images, audio, video) to a vision/audio-capable model.
- You want Venice-specific features: web search, E2EE, characters, xAI X/Twitter search, strip-thinking, web scraping.
- You need prompt caching for large system prompts or long documents.
- You need structured (`json_schema`) output.
For the newer Alpha **Responses API**, see [`venice-responses`](../venice-responses/SKILL.md).
## Minimal request
```bash
curl https://api.venice.ai/api/v1/chat/completions \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "zai-org-glm-5-1",
"messages": [{"role": "user", "content": "Why is the sky blue?"}]
}'
```
Response shape is the standard OpenAI `chat.completion` object (`id`, `object: "chat.completion"`, `choices[].message`, `usage`). With `stream: true`, responses come as SSE `data:` lines in `chat.completion.chunk` format.
## The request body
### Core fields (OpenAI-compatible)
| Field | Notes |
|---|---|
| `model` | string — model ID, trait name, or compatibility mapping. Suffixes allowed (see below). Required. |
| `messages` | array of `system` / `developer` / `user` / `assistant` / `tool` messages. Required, min 1. |
| `temperature`, `top_p`, `top_k`, `min_p`, `min_temp`, `max_temp` | sampling controls |
| `repetition_penalty`, `frequency_penalty`, `presence_penalty` | repetition controls |
| `max_tokens` *(deprecated)* / `max_completion_tokens` | upper bound on output tokens |
| `n` | number of choices (keep `1` to minimize cost) |
| `seed` | integer for reproducibility |
| `stop` / `stop_token_ids` | up to 4 strings, or raw token IDs |
| `stream`, `stream_options.include_usage` | SSE streaming + include usage in the final chunk |
| `response_format` | `{type:"json_schema", json_schema:{...}}` (preferred), `{type:"json_object"}`, or `{type:"text"}` |
| `tools`, `tool_choice`, `parallel_tool_calls` | function calling / built-in tools |
| `logprobs`, `top_logprobs` | return token log-probabilities |
| `reasoning.effort` / `reasoning_effort` | `none` \| `minimal` \| `low` \| `medium` \| `high` \| `xhigh` \| `max` |
| `reasoning.summary` | `auto` \| `concise` \| `detailed` |
| `prompt_cache_key`, `prompt_cache_retention` (`default`/`extended`/`24h`) | prompt caching hints |
| `text.verbosity` | `low`/`medium`/`high`/`auto` |
| `metadata` | key/value strings for tracking |
| `user`, `store` | accepted but ignored (OpenAI compat) |
### `venice_parameters` (Venice-only)
All optional. Combined with model feature suffixes, these are how you enable Venice features.
| Field | Type | Default | Effect |
|---|---|---|---|
| `character_slug` | string | — | Apply a published Venice character. Slug is the "Public ID" on the character page. See [`venice-characters`](../venice-characters/SKILL.md). |
| `strip_thinking_response` | bool | `false` | Strip `...` from the assistant output on reasoning models. |
| `disable_thinking` | bool | `false` | Disable thinking entirely on supported reasoning models and strip tags. |
| `enable_e2ee` | bool | `true` | End-to-end encryption on E2EE-capable models when E2EE headers are present. Set to `false` to force TEE-only. |
| `enable_web_search` | `"off"`/`"auto"`/`"on"` | `"off"` | Venice server-side web search. Citations arrive in the first streamed chunk or the response. |
| `enable_web_scraping` | bool | `false` | Scrape any URLs found in the last user message (Firecrawl). |
| `enable_web_citations` | bool | `false` | Ask the LLM to cite sources with `^1^` / `^1,3^` superscripts. |
| `include_search_results_in_stream` | bool | `false` | Experimental — emit search results as the first stream chunk. |
| `return_search_results_as_documents` | bool | — | Also surface search results as a synthetic tool call `venice_web_search_documents` (LangChain-friendly). |
| `include_venice_system_prompt` | bool | `true` | Prepend Venice's curated system prompt. Turn off for full control. |
| `enable_x_search` | bool | `false` | xAI native web + X/Twitter search (Grok models with `supportsXSearch`). Adds ~$0.01/search. |
### Model feature suffixes
Some `venice_parameters` can also be expressed as **model feature suffixes** on the `model` string — useful when the caller/library (OpenAI SDK, LangChain) can't set `venice_parameters`. Syntax:
```
:=[&=…]
```
Values are URL-decoded. Supported keys (exact match):
| Key | Type | Maps to |
|---|---|---|
| `enable_web_search` | `on` / `off` / `auto` | `venice_parameters.enable_web_search` |
| `enable_web_citations` | `"true"` / `"false"` | `venice_parameters.enable_web_citations` |
| `enable_web_scraping` | `"true"` / `"false"` | `venice_parameters.enable_web_scraping` |
| `include_venice_system_prompt` | `"true"` / `"false"` | `venice_parameters.include_venice_system_prompt` |
| `include_search_results_in_stream` | `"true"` / `"false"` | `venice_parameters.include_search_results_in_stream` |
| `return_search_results_as_documents` | `"true"` / `"false"` | `venice_parameters.return_search_results_as_documents` |
| `character_slug` | string | `venice_parameters.character_slug` |
| `strip_thinking_response` | `"true"` / `"false"` | `venice_parameters.strip_thinking_response` |
| `disable_thinking` | `"true"` / `"false"` | `venice_parameters.disable_thinking` |
Unknown keys are silently ignored. Examples:
```
zai-org-glm-5-1:enable_web_search=on
kimi-k2-6:strip_thinking_response=true&enable_web_search=auto
zai-org-glm-5-1:character_slug=alan-watts
```
Note: `enable_e2ee` and `enable_x_search` can **only** be set via `venice_parameters`, not as suffixes.
## Messages and modalities
`messages[].content` is either a string or an array of typed parts. Roles: `user`, `assistant`, `tool`, `system`, `developer` (reasoning models like o-series / codex).
### Text + image (`image_url`)
```json
{
"model": "zai-org-glm-5-1",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
]
}]
}
```
- `url` accepts a public URL **or** `data:image/png;base64,...`.
- Models with `model_spec.capabilities.supportsMultipleImages: true` preserve images across the whole conversation; single-image vision models only keep images from the **last** user message. Check `model_spec.capabilities.maxImages` for the per-request cap.
### Audio input (`input_audio`)
```json
{
"role": "user",
"content": [
{"type": "text", "text": "Transcribe this clip."},
{"type": "input_audio", "input_audio": {"data": "", "format": "wav"}}
]
}
```
Formats: `wav`, `mp3`, `aiff`, `aac`, `ogg`, `flac`, `m4a`, `pcm16`, `pcm24`. Audio URLs are **not** supported — always inline base64.
### Video input (`video_url`)
```json
{
"role": "user",
"content": [
{"type": "text", "text": "Summarize this."},
{"type": "video_url", "video_url": {"url": "https://www.youtube.com/watch?v=..."}}
]
}
```
Accepts public URLs (including YouTube for some providers) or `data:video/mp4;base64,...`. Supported formats: `mp4`, `mpeg`, `mov`, `webm`.
### Prompt caching (`cache_control`)
Any text / image_url / input_audio / video_url part can carry:
```json
{"cache_control": {"type": "ephemeral", "ttl": "1h"}}
```
Combine with `prompt_cache_key` and `prompt_cache_retention: "24h"` on the root request for predictable cache routing. Cache read / write pricing is model-specific — check `model_spec.pricing` on `/models`.
## Tools & function calling
### Function tools
```json
{
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
},
"strict": true
}
}],
"tool_choice": "auto"
}
```
- `tool_choice` can also be `"required"`, `"none"`, or `{"type":"function","function":{"name":"get_weather"}}`.
- `parallel_tool_calls: true` (default) lets the model emit multiple calls at once.
- Respond by appending `{"role":"tool","tool_call_id":"...","content":"..."}` before the next call.
### Built-in tools
```json
"tools": [{"type": "web_search"}, {"type": "x_search"}]
```
Equivalent to toggling `venice_parameters.enable_web_search` / `enable_x_search`. `x_search` requires a model with `supportsXSearch`.
## Reasoning models
On thinking models (GLM 5.1, Kimi K2.6, Claude Opus 4.7, GPT-5.4 Pro, …):
```json
{
"model": "zai-org-glm-5-1",
"reasoning": {"effort": "medium", "summary": "auto"},
"venice_parameters": {"strip_thinking_response": false},
"messages": [...]
}
```
- `reasoning_effort` is the OpenAI-compatible flat variant (takes precedence over `reasoning.effort`).
- Reasoning models may return `reasoning_content` or structured `reasoning_details[]` on the assistant message. **Pass `reasoning_details` back verbatim** in the next turn — it encodes thought signatures for providers like Claude Opus 4.7 and GPT-5.4 Pro.
- Use `venice_parameters.disable_thinking: true` to skip thinking entirely on supported models.
## Structured output (`response_format`)
```json
{
"response_format": {
"type": "json_schema",
"json_schema": {
"type": "object",
"properties": {"name": {"type": "string"}, "age": {"type": "number"}},
"required": ["name", "age"]
}
}
}
```
Prefer `json_schema` over the legacy `json_object`. Plain text is the default (`{"type": "text"}`).
## E2EE (end-to-end encryption)
For models advertising `supportsE2EE`:
1. Perform an HPKE / Noise handshake with Venice (see docs.venice.ai/e2ee).
2. Send encrypted payload with the required E2EE request headers.
3. Leave `venice_parameters.enable_e2ee` at default `true`, or set `false` to fall back to TEE-only.
E2EE is **not** supported on `/responses` — use `/chat/completions` for encrypted inference.
## Streaming
```json
{"stream": true, "stream_options": {"include_usage": true}}
```
- Response is `text/event-stream`. Each event is `data: {...chunk...}\n\n`, terminated by `data: [DONE]`.
- `include_usage: true` adds a final chunk with token counts.
- With `venice_parameters.include_search_results_in_stream: true`, the **first** chunk carries `venice_search_results`.
## Web-search answers
When `enable_web_search` is `"auto"` or `"on"`, the response includes `venice_parameters.web_search_citations[]` where each entry has `url`, `title`, `content` (snippet), and `date`. Turn on `enable_web_citations` to have the model insert `^1^` superscripts inline.
## Error handling specifics
- `402` — insufficient balance. Bearer: `INSUFFICIENT_BALANCE`. x402: `PAYMENT_REQUIRED` with structured `topUpInstructions` and `siwxChallenge`.
- `422` — prompt violates Venice or provider content policy. May include `suggested_prompt`.
- `413` — payload too large (mostly vision/audio).
- `429` — rate limit. See `/api_keys/rate_limits` and [`venice-errors`](../venice-errors/SKILL.md).
## Common gotchas
- `max_tokens` is deprecated — prefer `max_completion_tokens`.
- Image URLs must be **publicly reachable** from Venice's network. Localhost / signed S3 URLs without public access fail.
- Audio inputs cannot be URLs — always base64.
- Single-image vision models drop older images on each turn; chain them into the **last** user message.
- For multi-turn with tools on Claude Opus 4.7, GPT-5.4 Pro, and similar, always round-trip `reasoning_details` unchanged.
- `parallel_tool_calls: true` means you MUST be prepared to execute several tools in parallel before sending a single `tool`-role reply chain.
- `character_slug` **replaces** the default Venice system prompt. Combine with `include_venice_system_prompt: false` for total control.