Patterns for building API integrations that automatically fall back between providers when the primary fails.
See references/openai-subscription-vs-api-key-audio.md for a concrete case where subscription/device auth powered text chat but voice transcription still used an API-key capability path and needed sanitized quota handling.
- OpenRouter is NOT Anthropic-compatible: it uses the OpenAI chat completions format (/v1/chat/completions), NOT Anthropic's Messages API. You CANNOT just point the Anthropic SDK at OpenRouter's base URL; it will 404.
- OpenRouter model IDs differ from Anthropic's. Anthropic uses dated IDs such as claude-sonnet-4-20250514, while OpenRouter uses vendor-prefixed IDs such as anthropic/claude-sonnet-4. Make sure the fallback points at the model version you intend, and look up exact IDs with the OpenRouter models endpoint:

  ```shell
  curl -s https://openrouter.ai/api/v1/models | python3 -c "import sys,json; [print(m['id']) for m in json.load(sys.stdin)['data'] if 'sonnet' in m['id']]"
  ```

- Pass through max_tokens: if the OpenRouter fallback hardcodes max_tokens: 4096 but the caller requested 8192, structured JSON responses get truncated. Always forward the caller's value (params.max_tokens || 8192) in the fallback.
- OpenRouter app headers matter: include HTTP-Referer and X-Title on OpenRouter requests. Some deployments work without them in ad hoc curls but fail or lose attribution/routing in production. For Alex's VPS apps, set HTTP-Referer to the public app URL and X-Title to the app name.
- Fall back on ANY error, not just 429: Anthropic/OpenAI/Venice keys can fail with 401 (invalid/expired), 402 (payment/credits), 429 (rate limit), 500 (server error), etc. If a fallback is available, use it for all errors.
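With more than two providers, the "fall back on any error" rule reduces to trying each provider in order and throwing only when all of them fail. A minimal sketch; the `{ name, call }` provider shape and the `callWithFallback` name are hypothetical conventions, not part of any SDK:

```javascript
// Try each provider in order and return the first success.
// `providers` is a hypothetical list of { name, call } objects where
// `call(params)` returns a promise (e.g. a wrapper around an SDK or fetch call).
async function callWithFallback(providers, params) {
  const failures = []
  for (const { name, call } of providers) {
    try {
      return await call(params)
    } catch (err) {
      // Fall back on ANY error (401, 402, 429, 5xx, network), but record why
      failures.push(`${name}: ${err?.status ?? 'no status'} ${err?.message ?? String(err)}`)
      console.warn(`Provider ${name} failed, trying next...`)
    }
  }
  throw new Error(`All providers failed: ${failures.join('; ')}`)
}
```

Usage would look like `callWithFallback([{ name: 'anthropic', call: (p) => anthropic.messages.create(p) }, { name: 'openrouter', call: callOpenRouter }], params)`. Keeping every failure reason in the final error makes "all keys are dead" easy to diagnose from one log line.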
Two-provider example: Anthropic as the primary, OpenRouter as the fallback.

```javascript
import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })

async function callClaude(params) {
  try {
    return await anthropic.messages.create(params)
  } catch (err) {
    // Fall back on any error (401, 402, 429, 5xx, network), not just rate limits
    if (process.env.OPENROUTER_API_KEY) {
      console.log(`Anthropic error (${err?.status}). Falling back to OpenRouter...`)
      return await callOpenRouter(params)
    }
    throw err
  }
}

// OpenRouter uses the OpenAI chat completions format, so convert manually
async function callOpenRouter(params) {
  const messages = []
  // Anthropic passes the system prompt separately; OpenAI format uses a system message
  if (params.system) messages.push({ role: 'system', content: params.system })
  // Fine for string content; other Anthropic content blocks may need conversion
  for (const msg of params.messages) messages.push({ role: msg.role, content: msg.content })

  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
      'HTTP-Referer': 'https://your-app.com', // public app URL
      'X-Title': 'Your App Name', // app name
    },
    body: JSON.stringify({
      model: 'anthropic/claude-sonnet-4.6', // OpenRouter ID; match the primary's model version
      messages,
      max_tokens: params.max_tokens || 8192, // forward the caller's limit
    }),
  })
  if (!res.ok) throw new Error(`OpenRouter error ${res.status}: ${await res.text()}`)
  const data = await res.json()
  // Normalize to the Anthropic response shape that callers expect
  return { content: [{ type: 'text', text: data.choices?.[0]?.message?.content || '' }] }
}
```
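The message loop in callOpenRouter copies content through unchanged, which works for plain strings and Anthropic text blocks (they share the `{ type: 'text', text }` shape with OpenAI content parts). Base64 image blocks use a different shape, so a stricter converter is a reasonable hedge. A sketch, assuming the chosen OpenRouter model accepts OpenAI-style `image_url` data URLs; `toOpenAIMessages` is a hypothetical helper name:

```javascript
// Convert Anthropic Messages params to OpenAI-style chat messages.
// Handles string content, text blocks, and base64 image blocks only;
// other block types (tool_use, tool_result, documents) are rejected loudly.
function toOpenAIMessages({ system, messages }) {
  const out = []
  // Anthropic `system` may be a string or an array of text blocks
  if (typeof system === 'string' && system) {
    out.push({ role: 'system', content: system })
  } else if (Array.isArray(system) && system.length) {
    out.push({ role: 'system', content: system.map((b) => b.text ?? '').join('\n') })
  }
  for (const msg of messages) {
    if (typeof msg.content === 'string') {
      out.push({ role: msg.role, content: msg.content })
      continue
    }
    const parts = []
    for (const block of msg.content) {
      if (block.type === 'text') {
        parts.push({ type: 'text', text: block.text })
      } else if (block.type === 'image' && block.source?.type === 'base64') {
        parts.push({
          type: 'image_url',
          image_url: { url: `data:${block.source.media_type};base64,${block.source.data}` },
        })
      } else {
        throw new Error(`Unsupported content block for OpenRouter fallback: ${block.type}`)
      }
    }
    out.push({ role: msg.role, content: parts })
  }
  return out
}
```

Throwing on unknown blocks is deliberate: a fallback that silently drops a tool result or document produces confidently wrong answers.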
{"status":"quota_exceeded"} with HTTP 401, not a rate limit.

OpenAI text-to-speech voices:

| Voice | Character |
|---|---|
| onyx | Deep authoritative male (narrator) |
| echo | Younger male |
| fable | British, older feel |
| nova | Young energetic female |
| shimmer | Mature female |
| alloy | Neutral, slightly older female |
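```javascript
import fs from 'node:fs'

// text: narration string; outputPath: destination .mp3 path
async function textToSpeech(text, outputPath, voice = 'onyx') {
  const res = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'tts-1-hd',
      input: text,
      voice, // onyx, echo, fable, nova, shimmer, or alloy
      response_format: 'mp3',
    }),
  })
  // Without this check, an error JSON body would be saved as a broken .mp3
  if (!res.ok) throw new Error(`OpenAI TTS error ${res.status}: ${await res.text()}`)
  const buffer = Buffer.from(await res.arrayBuffer())
  fs.writeFileSync(outputPath, buffer)
}
```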
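The speech endpoint caps input length (4096 characters for OpenAI's speech API at the time of writing; check the current limit), so long narration has to be split before it is sent. A minimal chunker that prefers sentence boundaries and hard-splits only when a single sentence is too long; `chunkForTTS` is a hypothetical helper name:

```javascript
// Split text into chunks of at most maxChars, breaking after . ! or ? when possible.
// Whitespace between sentences is normalized to a single space.
function chunkForTTS(text, maxChars = 4096) {
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean)
  const chunks = []
  let current = ''
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence
    if (candidate.length <= maxChars) {
      current = candidate
      continue
    }
    if (current) chunks.push(current)
    // A single sentence longer than maxChars gets hard-split
    let rest = sentence
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars))
      rest = rest.slice(maxChars)
    }
    current = rest
  }
  if (current) chunks.push(current)
  return chunks
}
```

Each chunk can then be synthesized to its own file and the MP3s concatenated, which also keeps a failed chunk cheap to retry.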
For routes that stream LLM output to the browser (Server-Sent Events, chunked responses), do not rely on normal Express error middleware after headers have been sent. Once the response is streaming, an upstream provider failure can otherwise leave the UI blank or waiting forever.
Server pattern:
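```javascript
// Route path is illustrative; Express 4 does not catch async errors, hence try/catch + next
app.get('/api/generate/stream', async (req, res, next) => {
  try {
    // streamProviderOutput sets SSE headers and writes `data: ...\n\n` chunks
    await streamProviderOutput(req, res)
  } catch (error) {
    if (res.headersSent) {
      // Headers are already sent: report the failure in-band so the UI can show it.
      // A raw newline would end the SSE data line, so flatten the message.
      // Consider mapping provider errors to a product-safe message first.
      const msg = (error.message || 'Generation failed').replace(/[\r\n]+/g, ' ')
      res.write(`data: PROVIDER_ERROR:${msg}\n\n`)
      res.write('data: END_OF_ALL_MESSAGES\n\n')
      return res.end()
    }
    next(error)
  }
})
```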
Frontend pattern:
```javascript
const source = new EventSource(url)
source.onmessage = (event) => {
  if (event.data.startsWith('PROVIDER_ERROR:')) {
    showUserVisibleError(event.data.replace('PROVIDER_ERROR:', '').trim())
    source.close()
    return
  }
  if (event.data === 'END_OF_ALL_MESSAGES') source.close()
  // otherwise: append event.data to the visible output
}
// EventSource reconnects automatically after errors unless it is closed
source.onerror = () => {
  showUserVisibleError('The stream disconnected. Please try again.')
  source.close()
}
```
Build stream URLs with URLSearchParams; raw interpolation breaks names/locations/prompts containing spaces, ampersands, or slashes.
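A small helper keeps every user-supplied value encoded. A sketch; `buildStreamUrl` and the parameter names in the test are hypothetical:

```javascript
// Build a stream URL; URLSearchParams percent-encodes &, /, spaces, etc.
// undefined/null values are skipped instead of becoming the string "undefined".
function buildStreamUrl(path, params) {
  const qs = new URLSearchParams()
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null) qs.set(key, String(value))
  }
  return `${path}?${qs.toString()}`
}
```

In the browser this becomes `new EventSource(buildStreamUrl('/api/stream', { name, location, prompt }))`. URLSearchParams encodes spaces as `+`, which Express's default query parser decodes back to spaces.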
When a pipeline processes items sequentially and can fail mid-batch (e.g., generating audio for 40 segments), make it resumable:
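```javascript
import fs from 'node:fs'

let skipped = 0
let generated = 0
for (const seg of segments) {
  // Skip if this segment already has valid output
  // (the size check is a heuristic that rejects empty or truncated files)
  if (fs.existsSync(seg.outputPath) && fs.statSync(seg.outputPath).size > 1000) {
    console.log(`Skipping segment ${seg.id} (already exists)`)
    skipped++
    continue
  }
  // Generate only what's missing
  await generateSegment(seg)
  generated++
}
console.log(`Generated ${generated}, skipped ${skipped}`)
```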
This prevents wasting money re-generating items that succeeded before the failure.
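The skip check trusts whatever file is on disk, so a crash halfway through writing can leave a partial file that still passes the size test. Writing to a temp file and renaming it into place avoids that. A sketch, assuming the temp file and target share a filesystem (rename is then a single step on POSIX systems); `writeFileAtomic` is a hypothetical helper name:

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Write to a temp file next to the target, then rename it over the target.
// A crash mid-write leaves only a stray .tmp file, never a truncated output.
function writeFileAtomic(targetPath, data) {
  const tmpPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${process.pid}.tmp`,
  )
  fs.writeFileSync(tmpPath, data)
  fs.renameSync(tmpPath, targetPath)
}
```

Using this in place of `fs.writeFileSync(outputPath, buffer)` means "file exists" reliably implies "file is complete".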
Critical pitfall: dotenv/config looks for .env in the process's current working directory, not next to the script. If PM2 launches the app with a different working directory (for example, because you ran pm2 start server/index.js from another directory), .env is not found and the environment variables are silently missing.
Fix: always specify --cwd:

```shell
pm2 start /path/to/server/index.js --name app-name --cwd /path/to/app
```

Verify with:

```shell
pm2 show app-name | grep "exec cwd"
```
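PM2 can also record the working directory in an ecosystem file, so restarts do not depend on where the command was typed. A sketch; the app name and paths are placeholders:

```javascript
// ecosystem.config.js: start with `pm2 start ecosystem.config.js`
module.exports = {
  apps: [
    {
      name: 'app-name',
      script: '/path/to/app/server/index.js', // absolute path, so resolution is unambiguous
      cwd: '/path/to/app', // working directory: dotenv/config reads /path/to/app/.env
    },
  ],
}
```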
The Replicate Node.js SDK (replicate.run()) returns a FileOutput object, NOT a string URL. Calling .url() on it returns a URL object, NOT a string. SQLite and other storage that expects strings will throw: SQLite3 can only bind numbers, strings, bigints, buffers, and null.
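```javascript
// Normalize replicate.run() output to a plain string URL for storage
function outputToUrl(output) {
  // Some models return an array of outputs; take the first
  if (Array.isArray(output)) output = output[0]
  if (typeof output === 'string') return output
  // FileOutput (a ReadableStream) exposes .url(), which returns a URL object
  if (typeof output?.url === 'function') {
    const url = output.url()
    return typeof url === 'string' ? url : url.href
  }
  return String(output)
}

const output = await replicate.run('black-forest-labs/flux-2-pro', { input })
const imageUrl = outputToUrl(output)
```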
Always extract .href from the URL object to get a plain string before storing in a database.
For bulk API operations (e.g., generating 60 images), respond immediately and process in background. The frontend polls a progress endpoint:
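```javascript
// Server: respond immediately, process in background
// (db is assumed to be a better-sqlite3 Database)
app.post('/api/projects/:id/generate-all', (req, res) => {
  const projectId = req.params.id
  const items = db.prepare('SELECT * FROM items WHERE project_id = ? AND result_url IS NULL').all(projectId)
  res.json({ message: 'Generating...', total: items.length })

  // Background IIFE: keeps running after the response has been sent
  ;(async () => {
    for (const item of items) {
      try {
        await generateItem(item) // expected to save result_url on success
      } catch (err) {
        console.error(`Failed ${item.id}:`, err.message)
      }
    }
  })()
})

// Progress endpoint: COUNT(result_url) counts only rows that have a result
app.get('/api/projects/:id/progress', (req, res) => {
  const progress = db
    .prepare('SELECT COUNT(*) AS total, COUNT(result_url) AS done FROM items WHERE project_id = ?')
    .get(req.params.id)
  res.json(progress) // { total, done }
})
```

Frontend polls every 5 seconds during generation and stops when done === total. Failed items never get a result_url, so also track failures (for example, with a status column); otherwise polling never ends after a failure.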
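A polling loop that also stops on failures might look like this. A sketch, assuming the progress endpoint returns `{ total, done, failed }` (the `failed` field is hypothetical and treated as 0 if absent); `pollProgress` is a hypothetical helper name:

```javascript
// Poll getProgress() until done + failed reaches total.
// setTimeout chaining (not setInterval) ensures requests never overlap.
function pollProgress(getProgress, { intervalMs = 5000, onUpdate = () => {} } = {}) {
  return new Promise((resolve, reject) => {
    const tick = async () => {
      try {
        const p = await getProgress()
        onUpdate(p)
        if (p.done + (p.failed ?? 0) >= p.total) return resolve(p)
        setTimeout(tick, intervalMs)
      } catch (err) {
        reject(err)
      }
    }
    tick()
  })
}
```

In the browser: `pollProgress(() => fetch(`/api/projects/${id}/progress`).then((r) => r.json()), { onUpdate: renderProgress })`, where `renderProgress` is whatever updates the UI.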
Some apps should use Alex's OpenAI/ChatGPT subscription auth for image generation/editing instead of an OpenAI platform API key. In Hermes Agent this is the openai-codex image-gen provider: it calls the ChatGPT/Codex backend with a normal chat model hosting the image_generation tool (gpt-image-2). This is capability-specific auth: do not assume OPENAI_API_KEY is available or desired.
Implementation rules:
- Distinguish provider/model IDs such as openai-codex/gpt-image-2 from FAL model IDs.
- For image edits, pass reference images as input_image message content to the Codex Responses stream; for FAL fallbacks, upload local files and send image_urls/image_url in that model's schema.
- Keep the UI model picker separate from the default: default can be OpenAI/Codex while allowing FAL or high-quality GPT Image variants.
- On OpenAI/Codex failure, fall back to a known reference-capable image edit model (for Hermes Creative, currently fal-ai/nano-banana-2/edit) and persist both requested_model and the actual provider/model.
- Do not spend image-generation credits for smoke tests unless the user approves; verify auth/config/build/model catalogs first.
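The fall-back-and-record rule can be sketched as a small wrapper. The `{ model, run }` provider shape, `generateImageWithProvenance`, and every field name except requested_model are assumptions, not Hermes Creative's actual schema:

```javascript
// Try the primary image provider; on any failure use the fallback.
// Return provenance so the caller can persist requested vs. actual model.
async function generateImageWithProvenance(request, primary, fallback) {
  try {
    const url = await primary.run(request)
    return { url, requested_model: request.model, actual_model: primary.model, fallback_reason: null }
  } catch (err) {
    const url = await fallback.run(request)
    return {
      url,
      requested_model: request.model,
      actual_model: fallback.model,
      // Keep server-side only; provider messages can contain internals
      fallback_reason: String(err?.message ?? err),
    }
  }
}
```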
When generating sequential frames (first frame → last frame for video), pass the first frame as a reference image for the last frame to maintain visual consistency:
```javascript
const refImages = [...characterRefs, ...setRefs]
// For the last frame, add the first frame as the highest-priority reference
if (frameType === 'last' && shot.first_frame_url) {
  refImages.unshift(shot.first_frame_url)
}
// Cap at the model's reference limit; trimming from the end keeps the first frame
const imageUrl = await generateImageFlux(prompt, refImages.slice(0, 8), { aspect_ratio: '16:9' })
```
FLUX.2 Pro supports up to 8 reference images. Priority order matters — put the most important reference first.
For expensive, slow API calls (video generation ~$0.07 each, ~45s), use an in-memory queue with configurable concurrency instead of sequential processing or unlimited parallelism:
```javascript
const MAX_CONCURRENT = 3
const queue = [] // waiting job ids
const active = new Set() // job ids currently running

function enqueueJob(id) {
  // No duplicates: skip ids that are already waiting or running
  if (active.has(id) || queue.includes(id)) return
  queue.push(id)
  processQueue()
}

function processQueue() {
  while (queue.length > 0 && active.size < MAX_CONCURRENT) {
    const id = queue.shift()
    active.add(id)
    // Not awaited: up to MAX_CONCURRENT jobs run at once
    processJob(id)
      .catch((err) => console.error(`Failed ${id}:`, err.message))
      .finally(() => {
        active.delete(id)
        processQueue() // slot freed: start the next waiting job
      })
  }
}
```

Key design points:
- Skip complete items: only queue pending/failed items, never overwrite successful results
- Mark status in DB: pending → generating → complete/failed (UI polls this)
- Fire-and-forget with finally: each job runs independently; its slot is freed on completion or failure
- No duplicates: check both the waiting queue and the running set before adding
- Cost estimation in UI: show ${pendingCount} × $0.07 = $X.XX before starting
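For the cost line, computing in integer cents avoids floating-point drift (binary floats cannot represent most decimal fractions exactly, e.g. 0.1 + 0.2 !== 0.3 in JavaScript). A sketch; `formatCostEstimate` is a hypothetical helper, and the 7-cent default is the ~$0.07 per video quoted above:

```javascript
// Format "N × $unit = $total" from integer cents.
function formatCostEstimate(pendingCount, unitCostCents = 7) {
  const dollars = (cents) => `$${(cents / 100).toFixed(2)}`
  return `${pendingCount} × ${dollars(unitCostCents)} = ${dollars(pendingCount * unitCostCents)}`
}
```

For example, `formatCostEstimate(12)` produces `12 × $0.07 = $0.84`.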
When LLMs generate large structured JSON (especially via fallback providers that may have lower effective token limits), responses can get truncated mid-string. Add a repair function:
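```javascript
function safeParseJSON(text) {
  const trimmed = text.trim()
  try {
    return JSON.parse(trimmed)
  } catch (e) {
    console.log('JSON parse failed, attempting repair...')
    // Repair handles a truncated top-level ARRAY only: cut back to the end of a
    // complete element and close the array. Try successively earlier '}' so a
    // cut inside a nested object falls back to the previous complete element.
    if (trimmed.startsWith('[')) {
      let cut = trimmed.lastIndexOf('}')
      while (cut > 0) {
        try {
          return JSON.parse(trimmed.slice(0, cut + 1) + ']')
        } catch {
          cut = trimmed.lastIndexOf('}', cut - 1)
        }
      }
    }
    throw new Error(`Invalid JSON from LLM: ${e.message}`)
  }
}
```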
Also: when LLM needs to return timestamps that match audio segments, compute them from segment indices as a fallback — LLMs frequently omit or miscalculate timestamp fields even when instructed.
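A sketch of that fallback, assuming each audio segment's duration in seconds is known (e.g. measured after TTS) and that the LLM returns an optional `segment_index` per item; both field names and `withSegmentTimestamps` are hypothetical:

```javascript
// Derive start/end times from segment indices and measured audio durations
// instead of trusting LLM-provided timestamps.
function withSegmentTimestamps(items, durations) {
  if (durations.length === 0) return items
  // starts[i] = total duration of all segments before segment i
  const starts = []
  let t = 0
  for (const d of durations) {
    starts.push(t)
    t += d
  }
  return items.map((item, i) => {
    const given = item.segment_index
    // Use the LLM's index only if it is valid; otherwise use the item's position
    const idx = Number.isInteger(given) && given >= 0 && given < durations.length
      ? given
      : Math.min(i, durations.length - 1)
    // Always overwrite LLM-provided times with values derived from the audio
    return { ...item, segment_index: idx, start_time: starts[idx], end_time: starts[idx] + durations[idx] }
  })
}
```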
AI video models (Wan 2.2, etc.) output clips with inconsistent formats. Normalize before concatenating:
```shell
# Normalize each clip to 720p, 24fps, h264, no audio
ffmpeg -y -i input.mp4 \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2" \
  -r 24 -c:v libx264 -preset fast -crf 23 -an -movflags +faststart output.mp4

# Concat with a list file (concat.txt has one line per clip: file '/abs/path/clip1.mp4')
ffmpeg -y -f concat -safe 0 -i concat.txt -c:v libx264 -preset fast -crf 23 video_only.mp4

# Merge video + audio (-shortest trims to the shorter input if durations don't match)
ffmpeg -y -i video_only.mp4 -i narration.mp3 -c:v copy -c:a aac -b:a 192k -shortest final.mp4
```
Cache downloaded clips locally so re-exports don't re-download from expiring Replicate URLs.
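Generating concat.txt by hand breaks on paths containing a single quote. A sketch of a list builder using ffmpeg's quoting rule (inside single quotes, a literal `'` is written as `'\''`); `buildConcatList` is a hypothetical helper name:

```javascript
// Build the text for ffmpeg's concat demuxer: one `file '...'` line per clip.
// Use absolute paths together with -safe 0; relative paths resolve against
// the directory of the list file, not the current working directory.
function buildConcatList(clipPaths) {
  return clipPaths.map((p) => `file '${p.replace(/'/g, "'\\''")}'`).join('\n') + '\n'
}
```

Write the result with `fs.writeFileSync('concat.txt', buildConcatList(paths))` before running the concat command.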
AI image generation models have multi-layer safety filtering that can block legitimate creative content (mythology, dramatic scenes, historical violence). Different providers have different filter strictness AND different filter architecture:
On fal.ai, enable_safety_checker: false only disables the OUTPUT filter. The INPUT filter is separate, cannot be disabled, and blocks certain name+context combinations.

```javascript
async function generateWithFallback(prompt, refs, options = {}) {
  // Tier 1: Replicate (fast, cheap); its output filter can block
  try {
    return await generateImageFlux(prompt, refs, { safety_tolerance: 5, ...options })
  } catch (err) {
    if (!isSafetyError(err)) throw err
  }
  // Tier 2: fal.ai FLUX with the output filter disabled; the input filter can still block
  try {
    // Uses enable_safety_checker: false, safety_tolerance: 5
    return await generateImageFluxFal(prompt, refs, options)
  } catch (err) {
    if (!isSafetyError(err)) throw err
  }
  // Tier 3: Qwen on fal.ai (no safety filter at all), last resort; no reference images
  // fal-ai/qwen-image-2/text-to-image, no safety params needed
  return await generateImageQwenFal(prompt, options)
}

function isSafetyError(err) {
  // Case-insensitive: providers vary in how they capitalize these messages
  const msg = (err?.message || '').toLowerCase()
  return msg.includes('flagged as sensitive') || msg.includes('safety') ||
    msg.includes('nsfw') || msg.includes('content_policy_violation')
}
```
WHERE prompt IS NOT NULL but prompt was only saved in the same UPDATE as the result URL.

Some products have multiple authentication domains that look similar from the UI but are not interchangeable at the API layer. Example: a tenant may use openai-codex / ChatGPT subscription device auth for text chat, while voice transcription still calls OpenAI's platform /audio/transcriptions endpoint with an API key. If transcription falls back to a control-plane API key, quota/billing errors belong to that fallback key, not the tenant's subscription.
Implementation rules:
- Trace provider credentials per capability (chat, transcription, vision, embeddings) instead of assuming one tenant auth method covers all provider APIs.
- Name fallback variables by capability, e.g. TRANSCRIPTION_OPENAI_KEY, so logs/config make the boundary obvious.
- Do not expose raw provider quota/billing text to end users. Convert insufficient_quota, billing-plan errors, 402/429 quota text, and provider docs URLs into a product-safe message such as Voice transcription is temporarily unavailable. Text chat still works.
- Keep detailed provider errors in server logs with secrets redacted, and add a regression test that asserts public JSON does not include insufficient_quota, provider docs URLs, API keys, or billing internals.
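For the "secrets redacted" part, a minimal log redactor might look like this. A sketch; the two patterns (OpenAI-style `sk-...` keys and Bearer tokens) are examples, not a complete list, and `redactSecrets` is a hypothetical helper:

```javascript
// Redact credentials from text before writing provider errors to server logs.
function redactSecrets(text) {
  return String(text)
    .replace(/\bsk-[A-Za-z0-9_-]{8,}/g, 'sk-***')
    .replace(/(Bearer\s+)[^\s"']+/gi, '$1***')
}
```

Log `redactSecrets(JSON.stringify(err.response?.data ?? err.message))` server-side while returning only the product-safe message to the client.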
Node helper pattern:
```javascript
function isOpenAiQuotaError(err) {
  // response.data may be an object (e.g. from axios); stringify it so it is searchable
  const data = err?.response?.data
  const dataText = typeof data === 'string' ? data : JSON.stringify(data ?? '')
  const text = `${err?.status || ''} ${err?.code || ''} ${err?.message || ''} ${dataText}`.toLowerCase()
  return text.includes('insufficient_quota') ||
    text.includes('quota') ||
    text.includes('billing') ||
    text.includes('usage limits')
}

function publicVoiceError(err) {
  if (isOpenAiQuotaError(err)) {
    return 'Voice transcription is temporarily unavailable. Text chat still works.'
  }
  return 'Voice transcription failed. Please try again or send text instead.'
}
```
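The regression test can reuse a leak detector over the public response body. A sketch; the patterns are examples for OpenAI-style errors and `findPublicLeaks` is a hypothetical helper:

```javascript
// Internal details that must never appear in a public error body.
const FORBIDDEN_PUBLIC_PATTERNS = [
  { label: 'quota code', re: /insufficient_quota/i },
  { label: 'provider docs URL', re: /platform\.openai\.com/i },
  { label: 'API key', re: /\bsk-[A-Za-z0-9_-]{8,}/ },
  { label: 'billing internals', re: /billing/i },
]

// Returns the labels of every forbidden pattern found in the body.
function findPublicLeaks(body) {
  const text = typeof body === 'string' ? body : JSON.stringify(body)
  return FORBIDDEN_PUBLIC_PATTERNS.filter(({ re }) => re.test(text)).map(({ label }) => label)
}
```

In a test, simulate a quota failure on the transcription route and assert that `findPublicLeaks(responseJson)` is an empty array.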