--- name: venice-responses description: Use Venice's Alpha POST /responses endpoint - an OpenAI-compatible Responses API with typed output blocks (reasoning, message, function_call, web_search_call). Covers request shape, streaming, differences from /chat/completions, supported venice_parameters subset, and E2EE behavior. --- # Venice Responses API (Alpha) `POST /api/v1/responses` is Venice's OpenAI-compatible Responses endpoint. It returns a **structured, typed output array** instead of a single `message.content` string — ideal for agents that need to separate reasoning, messages, tool calls, and built-in tool events. > **Alpha.** Access is gated behind the `responsesApiEnabled` flag on Bearer API keys (staff-only during beta). x402 wallet auth bypasses this flag — you can pay per request without the flag. Schemas may change. ## Use when - You need the OpenAI Responses-style response shape (`output[]` with typed `type: "reasoning" | "message" | "function_call" | "web_search_call"` blocks) for a client library that expects it. - You want clean separation of reasoning vs message vs tool-call output. - You want streaming via SSE with typed events. Otherwise use [`venice-chat`](../venice-chat/SKILL.md) — it has more features, more models, and full Venice parameters. ## Limitations vs `/chat/completions` | Limitation | Detail | |---|---| | **Stateless** | No conversation persistence across requests. Send the full history each call. | | **E2EE models default to rejection** | E2EE-capable models return `400` unless you pass `venice_parameters.enable_e2ee: false` (TEE-only mode). For end-to-end encrypted inference with E2EE headers, use `/chat/completions`. | | **Subset of `venice_parameters`** | `character_slug`, `enable_e2ee`, `enable_web_search`, `enable_web_scraping`, `enable_web_citations`, `include_venice_system_prompt`, `include_search_results_in_stream` are supported. `strip_thinking_response`, `disable_thinking`, `enable_x_search` are **not** wired through in Alpha. | | **Access gated by feature flag** | Bearer keys without `responsesApiEnabled` get `401`. x402 requests are allowed (pay-per-call). | ## Authentication Same as the rest of the API — either `Authorization: Bearer ` or `X-Sign-In-With-X: `. See [`venice-auth`](../venice-auth/SKILL.md). ## Minimal request ```bash curl https://api.venice.ai/api/v1/responses \ -H "Authorization: Bearer $VENICE_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "zai-org-glm-5-1", "input": "Explain why the sky is blue in one paragraph." }' ``` `input` accepts: - a plain string, or - an array of typed input items (similar to `chat/completions` message parts) for multi-turn or multimodal history. ## Response shape ```json { "id": "resp_abc123", "object": "response", "created_at": 1735689600, "model": "zai-org-glm-5-1", "status": "completed", "output": [ { "type": "reasoning", "id": "rs_1", "summary": ["I considered Rayleigh scattering..."], "encrypted_content": "..." }, { "type": "message", "id": "msg_1", "status": "completed", "role": "assistant", "content": [{ "type": "output_text", "text": "The sky is blue because...", "annotations": [{ "type": "url_citation", "url": "https://example.com/rayleigh", "title": "Rayleigh scattering", "start_index": 42, "end_index": 99 }] }] }, { "type": "function_call", "id": "fc_1", "call_id": "call_abc", "name": "get_weather", "arguments": "{\"city\":\"Paris\"}", "status": "completed" }, { "type": "web_search_call", "id": "ws_1", "status": "completed" } ], "usage": { "input_tokens": 20, "input_tokens_details": {"cached_tokens": 0}, "output_tokens": 80, "output_tokens_details": {"reasoning_tokens": 40}, "total_tokens": 100 } } ``` Top-level `status` ∈ `completed` | `failed` | `in_progress` | `cancelled`. On `failed`, `error.code` and `error.message` are populated. ## Output block types | `type` | Purpose | |---|---| | `reasoning` | Thought process from reasoning models. `summary[]` holds human-readable text; `encrypted_content` holds opaque signatures — round-trip verbatim for multi-turn tool calls. | | `message` | Main text output. `content[].type === "output_text"`, plus `annotations[]` for `url_citation` entries from web search. | | `function_call` | Tool call: `name`, stringified-JSON `arguments`, `call_id`. | | `web_search_call` | Sentinel showing the built-in web_search tool fired; use alongside `url_citation` annotations on messages. | Match tool outputs back by `call_id` when continuing the turn. ## Common request fields | Field | Notes | |---|---| | `model` | Required. Model ID, trait, or compatibility mapping. Feature suffixes allowed (see [`venice-chat`](../venice-chat/SKILL.md#model-feature-suffixes)). | | `input` | Required. String or input-items array. To set system/developer context, include a leading message with `role: "system"`/`"developer"` in the input array. | | `tools` | Array of `{type:"function",function:{...}}` or built-in `{type:"web_search"}` — availability depends on the model. | | `tool_choice` | `"auto"` / `"required"` / `"none"` / `{type:"function",function:{"name":"..."}}`. | | `reasoning.effort` | Reasoning effort hint for thinking models (`"low"` \| `"medium"` \| `"high"`). | | `temperature`, `top_p`, `max_output_tokens`, `n`, `stop`, `seed`, `prompt_cache_key` | Standard generation controls — translated to `/chat/completions` equivalents server-side. | | `stream` | Boolean. SSE response with typed events (`response.created`, `response.output_item.added`, `response.output_text.delta`, `response.completed`, …). | | `venice_parameters` | Subset listed above. Example: `{"character_slug":"alan-watts","enable_web_search":"on"}`. | Fields commonly found in OpenAI's Responses API that are **not** in Venice's Alpha schema (and silently ignored or rejected by Zod): `instructions`, `metadata`, `parallel_tool_calls`, `response_format`, `store`, `previous_response_id`, `background`. For `response_format` / JSON-schema structured output, use `/chat/completions`. ## Streaming With `stream: true`, the response is an SSE stream of typed events. Typical flow: ``` event: response.created event: response.output_item.added # type=reasoning event: response.reasoning.delta event: response.output_item.added # type=message event: response.content_part.added event: response.output_text.delta event: response.output_text.delta event: response.output_item.done event: response.completed ``` Consume events in order and reconstruct `output[]` client-side; the shape on `response.completed` matches the non-streamed response exactly. ## Authentication & error responses - `400` — bad request; also returned when an E2EE-capable model is used without `venice_parameters.enable_e2ee: false`. - `401` — auth failed, or Bearer key lacks `responsesApiEnabled`, or the model is Pro-only and you're on an INFERENCE key / x402 wallet. - `402` — insufficient balance. Bearer → `{ error: "INSUFFICIENT_BALANCE" }`. x402 → `PAYMENT_REQUIRED` with `topUpInstructions` and `siwxChallenge` (see [`venice-x402`](../venice-x402/SKILL.md)). - `429` — rate-limited. - `500` — inference failed. `X-Balance-Remaining` is on 200 responses when using x402 auth; `PAYMENT-REQUIRED` header on 402. ## Migration notes - Port `messages` → pass as `input` (string, or typed array with leading `{role:"system"|"developer", content:"..."}`). - `venice_parameters.character_slug` → **supported**; pass inside `venice_parameters` or as a model feature suffix (`:character_slug=alan-watts`). - `venice_parameters.enable_web_search` → pass inside `venice_parameters`, or append `:enable_web_search=on` to the model ID, or add `{"type":"web_search"}` to `tools`. - `venice_parameters.strip_thinking_response` / `disable_thinking` → **not supported on `/responses`** in Alpha; stay on `/chat/completions` for these. - Full E2EE flow (E2EE request headers + encrypted response) → stay on `/chat/completions`. For TEE-only inference on an E2EE-capable model, pass `venice_parameters.enable_e2ee: false` here. - `response_format` / JSON-schema structured output → stay on `/chat/completions`.