/audio/transcriptions)POST /api/v1/audio/transcriptions takes an audio file and returns text. It's OpenAI-compatible with multipart/form-data — the OpenAI SDK's audio.transcriptions.create() works unchanged.
For long video / YouTube transcription, see venice-video's /video/transcriptions (takes a public video URL directly).
curl https://api.venice.ai/api/v1/audio/transcriptions \
-H "Authorization: Bearer $VENICE_API_KEY" \
-F "file=@./meeting.m4a" \
-F "model=nvidia/parakeet-tdt-0.6b-v3" \
-F "response_format=json" \
-F "timestamps=false"
{ "text": "Alright everyone, let's kick off the meeting..." }
With timestamps=true, json format also returns segment/word timings (schema is model-specific).
multipart/form-data)| Field | Type | Default | Notes |
|---|---|---|---|
file |
binary | — | Required. Audio file. Supported: wav, wave, flac, m4a, aac, mp4, mp3, ogg, webm. Base64 is not accepted — upload as a real file. |
model |
enum | nvidia/parakeet-tdt-0.6b-v3 |
See models below. |
response_format |
json / text |
json |
text returns text/plain body. |
timestamps |
bool | false |
Include segment/word timestamps (JSON only). |
language |
string | — | ISO 639-1 hint (e.g. en, ja). Only Whisper-family models honor it; others auto-detect. |
| Model ID | Notes |
|---|---|
nvidia/parakeet-tdt-0.6b-v3 |
Default. Fast, English-first, great for real-time-ish flows. |
openai/whisper-large-v3 |
Large multilingual, honors language hint. |
fal-ai/wizper |
Whisper variant, competitive on quality/latency tradeoff. |
elevenlabs/scribe-v2 |
ElevenLabs Scribe, strong on noisy audio. |
stt-xai-v1 |
xAI Speech-to-Text. |
GET /models?type=asr returns the current catalog. ASR pricing is pricing.per_audio_second.usd — cost scales with audio duration.
import OpenAI from 'openai'
import fs from 'node:fs'
const client = new OpenAI({
apiKey: process.env.VENICE_API_KEY,
baseURL: 'https://api.venice.ai/api/v1',
})
const out = await client.audio.transcriptions.create({
file: fs.createReadStream('meeting.m4a'),
model: 'openai/whisper-large-v3',
response_format: 'json',
language: 'en',
// @ts-expect-error — Venice-specific extra, passes through multipart
timestamps: true,
})
console.log(out.text)
Venice doesn't expose native chunking. For files > ~30 min, split client-side on silence with ffmpeg or pydub, transcribe each chunk, then concatenate with offset timestamps.
ffmpeg -i long.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3
| Code | Meaning |
|---|---|
400 |
Bad params, unsupported audio format, empty file, or file larger than 25 MB (this endpoint returns 400 with "Maximum size is 25MB", not 413). |
401 |
Auth / Pro-only. |
402 |
Insufficient balance. |
415 |
Wrong Content-Type — must be multipart/form-data. |
422 |
Validation / upstream ASR error (e.g. zero-length audio, upstream provider 422). Not a "content policy" code on this path. |
429 |
Rate limited. |
500 / 503 |
Transient; retry with jitter. |
file must be uploaded as a real multipart file part. JSON + base64 is not supported here.json, verbose_json, srt, vtt). With response_format: text the handler returns a plain text/plain body containing just the transcript — you'll lose any timestamp data, so pick verbose_json / srt / vtt when you need timings.language is Whisper-specific. Parakeet / Scribe ignore it and auto-detect.429, back off; big batches should throttle to ~5 parallel requests.422 with an error string; it does not surface suggested_prompt on this path.requests + env vars: In execute_code contexts, os.environ.get("VENICE_API_KEY") may return None even when the shell has the variable. Pass the key explicitly or read it from a known file (e.g., .env). A 401 from requests.post(..., files=...) with multipart data often means the key wasn't injected, not that the key is invalid.requests/curl call may exceed a 60 s tool timeout. Use terminal(background=true) with curl writing to a file, then poll for completion, or use requests inside execute_code with a long timeout (e.g., 300–600 s).ffmpeg -i input.mp3 -t <seconds> -c copy part_a.mp3 and -ss <seconds> -c copy part_b.mp3 to split without re-encoding.When local Whisper cannot be installed (no GPU, no disk space, PEP 668 restriction), see references/vps-fallback-transcription.md for a session-tested curl-based workflow using Venice's whisper-large-v3.