--- name: venice-video description: Generate and transcribe videos via Venice. Covers the async /video/quote + /video/queue + /video/retrieve + /video/complete loop, text-to-video, image-to-video, video-to-video (upscale), audio input, reference images, scene and element support, plus /video/transcriptions for YouTube URLs. --- # Venice Video Video is **asynchronous** — like audio music. Five endpoints: | Endpoint | Purpose | |---|---| | `POST /video/quote` | Price in USD (no charge, no job). | | `POST /video/queue` | Enqueue generation. Returns `queue_id`, charges (reserves) funds. | | `POST /video/retrieve` | Poll status or download `video/mp4`. | | `POST /video/complete` | Finalize & delete media from Venice storage. | | `POST /video/transcriptions` | Sync: transcribe a YouTube URL's audio. | ## Use when - You need text-to-video, image-to-video, video upscale, video-with-audio, or video transcription. - You can tolerate async execution (single-digit seconds to several minutes depending on model, duration, and queue depth — inspect `average_execution_time` and `execution_duration` on `/video/retrieve` for your job's live estimate). - You want to price a job precisely before committing (`/video/quote`). ## Lifecycle — generation ### 1. Price with `/video/quote` ```bash curl https://api.venice.ai/api/v1/video/quote \ -H "Authorization: Bearer $VENICE_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "wan-2-7-text-to-video", "duration": "5s", "aspect_ratio": "16:9", "resolution": "720p", "audio": true }' ``` Response: `{"quote": 0.35}` USD. ### 2. Submit with `/video/queue` ```bash curl https://api.venice.ai/api/v1/video/queue \ -H "Authorization: Bearer $VENICE_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "wan-2-7-text-to-video", "prompt": "Commerce being conducted in the city of Venice, Italy.", "negative_prompt": "low resolution, worst quality, defects", "duration": "5s", "aspect_ratio": "16:9", "resolution": "720p", "audio": true }' ``` Response: `{ "model": "...", "queue_id": "uuid", "download_url": "https://..." }`. - `download_url` only appears for **VPS-backed** models. When present, the retrieve endpoint returns JSON status only — fetch this URL to download. Valid 24 h. ### 3. Poll with `/video/retrieve` ```bash curl https://api.venice.ai/api/v1/video/retrieve \ -H "Authorization: Bearer $VENICE_API_KEY" \ -H "Content-Type: application/json" \ -d '{"model":"...","queue_id":"..."}' \ --output out.mp4 ``` - Processing: JSON `{"status":"PROCESSING","average_execution_time":145000,"execution_duration":53200}` (ms). - Completed (non-VPS): binary `video/mp4` body. - Completed (VPS-backed): `{"status":"COMPLETED", ...}` — fetch the `download_url` from the queue response. - `delete_media_on_completion: true` auto-deletes after successful retrieve. ### 4. Finalize with `/video/complete` ```bash curl https://api.venice.ai/api/v1/video/complete \ -H "Authorization: Bearer $VENICE_API_KEY" \ -H "Content-Type: application/json" \ -d '{"model":"...","queue_id":"..."}' ``` ## `QueueVideoRequest` fields Availability depends on the model — check `GET /models?type=video`. | Field | Type | Notes | |---|---|---| | `model` | string | Required. | | `prompt` | string, ≤ 2500–3500 | **Required** (min length 1). Max length varies per model. | | `negative_prompt` | string, ≤ 2500–3500 | — | | `duration` | enum `2s..30s` or `Auto` | Required. Model-specific subset. | | `aspect_ratio` | `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `9:16`, `16:9`, `21:9` | Some models ignore. | | `resolution` | `256p..4k`, or upscale hints `2x` / `4x` / `true_1080p` | Use `upscale_factor` for upscale models. | | `upscale_factor` | `1` / `2` / `4` | Only for upscale models. `1` = quality enhancement. | | `audio` | bool | Default `true`. Audio-capable models. | | `image_url` | URL or `data:` URL | Image-to-video reference frame. | | `end_image_url` | URL or data URL | End frame / transition reference. | | `audio_url` | URL or data URL | Background music input. WAV/MP3, ≤ 30 s, ≤ 15 MB. | | `video_url` | URL or data URL | Video-to-video / upscale input. MP4/MOV/WebM. | | `reference_image_urls[]` | array of URLs, ≤ 9 | Character / style consistency images. | | `elements[]` | array, ≤ 4 | Advanced models (e.g. Kling O3 R2V): each has `frontal_image_url`, up to 3 `reference_image_urls`, `video_url`. Reference in prompt as `@Element1`, `@Element2`. | | `scene_image_urls[]` | array of URLs, ≤ 4 | Advanced scene refs; reference in prompt as `@Image1`, `@Image2`. | ## Common recipes ### Text → video with audio ```json { "model": "wan-2-7-text-to-video", "prompt": "A golden retriever chasing a frisbee in slow motion at sunset.", "duration": "6s", "aspect_ratio": "16:9", "resolution": "720p", "audio": true } ``` ### Image → video ```json { "model": "", "prompt": "Camera slowly zooms out, revealing the cityscape.", "image_url": "https://example.com/cityscape.jpg", "duration": "5s", "aspect_ratio": "16:9" } ``` ### Video upscale ```json { "model": "", "video_url": "data:video/mp4;base64,...", "upscale_factor": 2, "duration": "Auto" } ``` ### Multi-element consistency (Kling O3 R2V-style) ```json { "model": "", "prompt": "@Element1 walks toward @Element2 against @Image1.", "elements": [ { "frontal_image_url": "", "reference_image_urls": [""] }, { "frontal_image_url": "" } ], "scene_image_urls": [""] } ``` ## `/video/transcriptions` (sync) Transcribe a YouTube video URL directly — no queue. ```bash curl https://api.venice.ai/api/v1/video/transcriptions \ -H "Authorization: Bearer $VENICE_API_KEY" \ -H "Content-Type: application/json" \ -d '{"url":"https://www.youtube.com/watch?v=...","response_format":"json"}' ``` Response: `{"transcript":"...","lang":"en"}` (JSON) or plain `text/plain` body when `response_format: text`. For arbitrary audio files, use [`venice-audio-transcription`](../venice-audio-transcription/SKILL.md) instead. ## Full polling loop ```ts async function waitForVideo(model: string, queueId: string, downloadUrl?: string) { while (true) { const res = await fetch(`${base}/video/retrieve`, { method: 'POST', headers, body: JSON.stringify({ model, queue_id: queueId }), }) const ct = res.headers.get('content-type') ?? '' if (ct.startsWith('video/')) { return Buffer.from(await res.arrayBuffer()) } const body = await res.json() if (body.status === 'COMPLETED' && downloadUrl) { const v = await fetch(downloadUrl) return Buffer.from(await v.arrayBuffer()) } if (body.status !== 'PROCESSING') throw new Error(`unexpected ${body.status}`) await new Promise(r => setTimeout(r, 5000)) } } ``` ## Errors | Code | Meaning | |---|---| | `400` | Bad params (duration/resolution not supported by model, missing required `image_url` for i2v, missing `prompt`, etc.). | | `401` | Auth / Pro-only. | | `402` | Insufficient balance. | | `403` | Model unavailable in your region. | | `413` | Request payload too large — shrink images / audio. (Returned from `/video/queue`.) | | `422` | Content policy violation. (Returned from `/video/queue`.) | | `500` | Inference failed. | | `503` | Model at capacity — retry later. **On `/video/retrieve`**, returned when the queue is backed up. | `/video/queue` does not document `503` in the spec — upstream capacity issues surface there as `500`. Watch for `503` specifically on `/video/retrieve`. ## Venice Studio Web UI (Not Self-Hosted) Venice Studio is **browser-based at `venice.ai/studio`** — it is NOT a local package or self-hosted app. It provides a 6-tab workspace: Image, Edit, Audio, Video, Movie Editor, and Library. The Movie Editor supports timeline-based editing, transitions, audio layers, and title cards that the Venice API does **not** expose. | Capability | Venice API | Venice Studio Web | |---|---|---| | Text/image/video/audio generation | ✅ | ✅ | | Character reference (R2V) via `elements[]` | ✅ (Kling O3, etc.) | ✅ (Seedance 2.0, Kling, etc.) | | Seedance 2.0 R2V (cinematic quality) | ❌ Not available via API | ✅ | | Timeline editing / transitions / cuts | ❌ | ✅ Movie Editor | | Audio layer mixing / sync | ❌ | ✅ Movie Editor | | Final MP4 export | Partial (raw clips) | ✅ | **Implication:** The Venice API (and therefore Hermes) is ideal for **bulk generation and iteration** — treatments, character refs, shot lists, raw clips, music tracks. The Venice Studio web UI is required for **final assembly** — timeline editing, transitions, audio sync, and Seedance 2.0 cinematic shots. Plan a **hybrid workflow**: generate assets via API/Hermes, then assemble and polish in the Studio browser UI. See [`references/venice-studio-vs-api.md`](references/venice-studio-vs-api.md) for a full feature matrix and workflow diagrams. ## Gotchas - **`duration` is required on `/video/queue`.** Even `Auto` is a valid explicit value. - `download_url` is **only sometimes** returned at queue time. Always handle both paths: binary from `/retrieve` OR fetching `download_url` after status `COMPLETED`. - `download_url` expires in 24 h — download promptly. - Upscale models use `upscale_factor` *instead of* `resolution`. - `reference_image_urls[]` is capped at 9 entries, `elements[]` at 4, `scene_image_urls[]` at 4. Over-limit is `400`. - `data:` URLs count toward payload size; large base64 videos may trip `413` — prefer hosted URLs. - `/video/transcriptions` is YouTube-URL-only; it does not accept arbitrary video uploads (use ffmpeg to strip audio, then `/audio/transcriptions`). - **Seedance 2.0 is UI-only as of mid-2025.** If a user asks for "the best cinematic quality" or "Seedance," route them to `venice.ai/studio` — the API cannot reach it yet. ### Audio parameter model inconsistency `/video/quote` may accept `audio: true` for a model, but `/video/queue` can still reject it with: ```json {"details":{"audio":{"_errors":["This model does not support audio configuration"]}},"issues":[{"code":"custom","message":"This model does not support audio configuration","path":["audio"]}]} ``` **Sora 2 Pro** is a known silent-only model — omit `audio` entirely when queueing to it. Always test-quote first, then if queue fails on audio, retry without the `audio` field. See [`references/model-audio-matrix.md`](references/model-audio-matrix.md) for a tested compatibility list. ### `/models?type=video` endpoint quirks `GET /models?type=video` returns video models but **does not populate** `model_spec.inference_type` — the field is `null` for all entries. You must check `model_spec.constraints.durations[]` and `model_spec.capabilities` to distinguish video from text models. The `type` query parameter works (`?type=video` gives 92 models, `?type=text` gives 75), but the JSON schema lacks the usual inference-type hints.