venice-video

/home/avalon/.hermes/skills/venice/venice-video/SKILL.md · raw

Venice Video

Video is asynchronous — like audio music. Five endpoints:

Endpoint	Purpose
`POST /video/quote`	Price in USD (no charge, no job).
`POST /video/queue`	Enqueue generation. Returns `queue_id`, charges (reserves) funds.
`POST /video/retrieve`	Poll status or download `video/mp4`.
`POST /video/complete`	Finalize & delete media from Venice storage.
`POST /video/transcriptions`	Sync: transcribe a YouTube URL's audio.

Use when

You need text-to-video, image-to-video, video upscale, video-with-audio, or video transcription.
You can tolerate async execution (single-digit seconds to several minutes depending on model, duration, and queue depth — inspect average_execution_time and execution_duration on /video/retrieve for your job's live estimate).
You want to price a job precisely before committing (/video/quote).

Lifecycle — generation

1. Price with `/video/quote`

curl https://api.venice.ai/api/v1/video/quote \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-2-7-text-to-video",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "audio": true
  }'

Response: {"quote": 0.35} USD.

2. Submit with `/video/queue`

curl https://api.venice.ai/api/v1/video/queue \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-2-7-text-to-video",
    "prompt": "Commerce being conducted in the city of Venice, Italy.",
    "negative_prompt": "low resolution, worst quality, defects",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "audio": true
  }'

Response: { "model": "...", "queue_id": "uuid", "download_url": "https://..." }.

download_url only appears for VPS-backed models. When present, the retrieve endpoint returns JSON status only — fetch this URL to download. Valid 24 h.

3. Poll with `/video/retrieve`

curl https://api.venice.ai/api/v1/video/retrieve \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"...","queue_id":"..."}' \
  --output out.mp4

Processing: JSON {"status":"PROCESSING","average_execution_time":145000,"execution_duration":53200} (ms).
Completed (non-VPS): binary video/mp4 body.
Completed (VPS-backed): {"status":"COMPLETED", ...} — fetch the download_url from the queue response.
delete_media_on_completion: true auto-deletes after successful retrieve.

4. Finalize with `/video/complete`

curl https://api.venice.ai/api/v1/video/complete \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"...","queue_id":"..."}'

`QueueVideoRequest` fields

Availability depends on the model — check GET /models?type=video.

Field	Type	Notes
`model`	string	Required.
`prompt`	string, ≤ 2500–3500	Required (min length 1). Max length varies per model.
`negative_prompt`	string, ≤ 2500–3500	—
`duration`	enum `2s..30s` or `Auto`	Required. Model-specific subset.
`aspect_ratio`	`1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `9:16`, `16:9`, `21:9`	Some models ignore.
`resolution`	`256p..4k`, or upscale hints `2x` / `4x` / `true_1080p`	Use `upscale_factor` for upscale models.
`upscale_factor`	`1` / `2` / `4`	Only for upscale models. `1` = quality enhancement.
`audio`	bool	Default `true`. Audio-capable models.
`image_url`	URL or `data:` URL	Image-to-video reference frame.
`end_image_url`	URL or data URL	End frame / transition reference.
`audio_url`	URL or data URL	Background music input. WAV/MP3, ≤ 30 s, ≤ 15 MB.
`video_url`	URL or data URL	Video-to-video / upscale input. MP4/MOV/WebM.
`reference_image_urls[]`	array of URLs, ≤ 9	Character / style consistency images.
`elements[]`	array, ≤ 4	Advanced models (e.g. Kling O3 R2V): each has `frontal_image_url`, up to 3 `reference_image_urls`, `video_url`. Reference in prompt as `@Element1`, `@Element2`.
`scene_image_urls[]`	array of URLs, ≤ 4	Advanced scene refs; reference in prompt as `@Image1`, `@Image2`.

Common recipes

Text → video with audio

{
  "model": "wan-2-7-text-to-video",
  "prompt": "A golden retriever chasing a frisbee in slow motion at sunset.",
  "duration": "6s",
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "audio": true
}

Image → video

{
  "model": "<image-to-video model>",
  "prompt": "Camera slowly zooms out, revealing the cityscape.",
  "image_url": "https://example.com/cityscape.jpg",
  "duration": "5s",
  "aspect_ratio": "16:9"
}

Video upscale

{
  "model": "<upscale model>",
  "video_url": "data:video/mp4;base64,...",
  "upscale_factor": 2,
  "duration": "Auto"
}

Multi-element consistency (Kling O3 R2V-style)

{
  "model": "<advanced-model>",
  "prompt": "@Element1 walks toward @Element2 against @Image1.",
  "elements": [
    { "frontal_image_url": "<char1.png>", "reference_image_urls": ["<alt1.png>"] },
    { "frontal_image_url": "<char2.png>" }
  ],
  "scene_image_urls": ["<street-scene.jpg>"]
}

`/video/transcriptions` (sync)

Transcribe a YouTube video URL directly — no queue.

curl https://api.venice.ai/api/v1/video/transcriptions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.youtube.com/watch?v=...","response_format":"json"}'

Response: {"transcript":"...","lang":"en"} (JSON) or plain text/plain body when response_format: text.

For arbitrary audio files, use venice-audio-transcription instead.

Full polling loop

async function waitForVideo(model: string, queueId: string, downloadUrl?: string) {
  while (true) {
    const res = await fetch(`${base}/video/retrieve`, {
      method: 'POST', headers,
      body: JSON.stringify({ model, queue_id: queueId }),
    })
    const ct = res.headers.get('content-type') ?? ''
    if (ct.startsWith('video/')) {
      return Buffer.from(await res.arrayBuffer())
    }
    const body = await res.json()
    if (body.status === 'COMPLETED' && downloadUrl) {
      const v = await fetch(downloadUrl)
      return Buffer.from(await v.arrayBuffer())
    }
    if (body.status !== 'PROCESSING') throw new Error(`unexpected ${body.status}`)
    await new Promise(r => setTimeout(r, 5000))
  }
}

Errors

Code	Meaning
`400`	Bad params (duration/resolution not supported by model, missing required `image_url` for i2v, missing `prompt`, etc.).
`401`	Auth / Pro-only.
`402`	Insufficient balance.
`403`	Model unavailable in your region.
`413`	Request payload too large — shrink images / audio. (Returned from `/video/queue`.)
`422`	Content policy violation. (Returned from `/video/queue`.)
`500`	Inference failed.
`503`	Model at capacity — retry later. On `/video/retrieve`, returned when the queue is backed up.

/video/queue does not document 503 in the spec — upstream capacity issues surface there as 500. Watch for 503 specifically on /video/retrieve.

Venice Studio Web UI (Not Self-Hosted)

Venice Studio is browser-based at venice.ai/studio — it is NOT a local package or self-hosted app. It provides a 6-tab workspace: Image, Edit, Audio, Video, Movie Editor, and Library. The Movie Editor supports timeline-based editing, transitions, audio layers, and title cards that the Venice API does not expose.

Capability	Venice API	Venice Studio Web
Text/image/video/audio generation	✅	✅
Character reference (R2V) via `elements[]`	✅ (Kling O3, etc.)	✅ (Seedance 2.0, Kling, etc.)
Seedance 2.0 R2V (cinematic quality)	✅ Documented queue API; may be absent from `/models?type=video`	✅
Timeline editing / transitions / cuts	❌	✅ Movie Editor
Audio layer mixing / sync	❌	✅ Movie Editor
Final MP4 export	Partial (raw clips)	✅

Implication: The Venice API (and therefore Hermes) is suitable for bulk generation and iteration, including documented Seedance 2.0 queue workflows. Venice Studio remains useful for manual timeline assembly, transitions, trims, audio-layer mixing, and export review. Plan a hybrid workflow only when the requested edit needs those editor-only controls; do not route Seedance away from the API merely because its IDs are missing from the live video catalog.

See references/venice-studio-vs-api.md for a full feature matrix and workflow diagrams.

For the officially linked community Venice Video Harness—its production-routing strengths, the WAN 2.7 I2V insight it surfaced, its contradicted WAN R2V audio claim, evidence ranking, and safe adoption boundaries—read references/venice-video-harness-audit-2026-07.md. Treat Harness routing as expert recipe evidence that must be reconciled with current queue validation and completed-job behavior, not as a live provider registry.

Gotchas

duration is required on /video/queue. Even Auto is a valid explicit value.
download_url is only sometimes returned at queue time. Always handle both paths: binary from /retrieve OR fetching download_url after status COMPLETED.
download_url expires in 24 h — download promptly.
Upscale models use upscale_factor instead of resolution.
reference_image_urls[] is capped at 9 entries, elements[] at 4, scene_image_urls[] at 4. Over-limit is 400.
data: URLs count toward payload size; large base64 videos may trip 413 — prefer hosted URLs.
/video/transcriptions is YouTube-URL-only; it does not accept arbitrary video uploads (use ffmpeg to strip audio, then /audio/transcriptions).
Seedance catalog drift: On 2026-07-21, all six documented standard/fast Seedance 2.0 T2V, I2V, and R2V IDs were quoteable and callable through the API even though they were absent from GET /models?type=video. Reconcile live catalog, current docs, quote validation, and queue validation; catalog absence alone is not proof of unavailability.

Audio parameter model inconsistency

For the tested WAN 2.7 supplied-audio matrix, known-good I2V request shape, exact-master remux procedure, and QA commands, read references/wan27-supplied-audio-validation-2026-07-21.md.

/video/quote is pricing-only and may return 200 even for media-field combinations that the model-specific /video/queue validator rejects. In a live 2026-07-21 test, wan-2-7-reference-to-video advertised per_reference_audio: true, yet queue validation rejected both reference_audio_urls ("This model does not support reference audios") and audio_url ("This model does not support audio input"). Use wan-2-7-image-to-video for genuine supplied-audio conditioning; if you generate with WAN 2.7 R2V and attach the recording afterward, label it as post-generation muxing rather than audio-driven generation.

/video/quote may accept audio: true for a model, but /video/queue can still reject it with:

{"details":{"audio":{"_errors":["This model does not support audio configuration"]}},"issues":[{"code":"custom","message":"This model does not support audio configuration","path":["audio"]}]}

Sora 2 Pro is a known silent-only model — omit audio entirely when queueing to it. Always test-quote first, then if queue fails on audio, retry without the audio field. See references/model-audio-matrix.md for a tested compatibility list.

`/models?type=video` endpoint quirks

GET /models?type=video returns video models but does not populate model_spec.inference_type — the field is null for all entries. You must check model_spec.constraints.durations[] and model_spec.capabilities to distinguish video from text models. The type query parameter works (?type=video gives 92 models, ?type=text gives 75), but the JSON schema lacks the usual inference-type hints.