10-step automated video generation pipeline: Voices → Script → Audio → Scenes → Shots → Refs → Frames → Videos → Lip Sync → Export
- yoloStep (camelCase) in status endpoint
- /api/projects/:id/status polled every 3s when running

Safety tolerance settings:
- images.js generateImageFlux(): safety_tolerance: 5 (MUST be 5, max permissive for Replicate FLUX.2 Pro, range 1-5)
- advanced-images.js generateAdvanced(): default fallback 5 for models with safety_tolerance param
- fal.ai models: enable_safety_checker: false disables output filter but NOT input checker
The 3-tier provider cascade (learned through trial and error):
1. Replicate FLUX.2 Pro (safety_tolerance: 5): Even at max, OUTPUT images get flagged by post-generation classifier. Mythological/dramatic characters (Zahhak, Ahriman) consistently trigger it regardless of prompt content.
2. fal.ai FLUX.2 Pro (enable_safety_checker: false): Disables output filter. But has separate INPUT prompt content checker that CANNOT be disabled. Blocks some names/themes at the prompt level.
3. fal.ai Qwen: prefer EDIT mode with refs for frame fallbacks, not text-to-image. fal-ai/qwen-image-2/pro/edit preserves identity/style better than the text-only fallback. Use text-to-image only as a last resort when there are truly no usable refs.
Important production finding: Qwen Pro Edit currently fails in production when given 4 refs, with the error "Maximum 3 reference images allowed". Treat the practical cap as 3 refs, not 4, and trim refs by priority.
Why retries MUST use fal.ai/Qwen, NOT Replicate: Replicate's safety filter is on the OUTPUT. Rewriting the prompt doesn't help — the model generates an image and THEN it gets flagged. Retrying the same provider is pointless. Must switch providers.
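The provider-switching rule above can be sketched as a loop that tries each provider once and moves on after any failure. This is a minimal sketch, not the code in server/index.js: `runCascade` and the `providers` array shape are hypothetical, and the real generator functions (generateImageFlux, generateImageFluxFal, generateImageQwenFal) are assumed to be wrapped by the caller.

```javascript
// Hypothetical sketch: try each provider once, in order, and never retry
// the same provider after a flag. Provider functions are injected so the
// sketch does not depend on the real images.js signatures.
async function runCascade(prompt, providers) {
  const errors = [];
  for (const { name, generate } of providers) {
    try {
      const url = await generate(prompt);
      if (url) return { url, provider: name, errors };
      errors.push({ provider: name, message: 'no image returned' });
    } catch (err) {
      // Output-classifier flags and input-checker blocks both land here.
      errors.push({ provider: name, message: err.message });
    }
  }
  // Every provider failed; the caller decides whether to rewrite and retry.
  return { url: null, provider: null, errors };
}
```

The returned `errors` list keeps one entry per failed provider, which is enough to produce activity log lines such as "fal.ai FLUX blocked ..., trying Qwen...".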
Prompt rewriting with Haiku (rewriteFlaggedPrompt() in llm.js):
- Uses Claude 3.5 Haiku via OpenRouter (cheap ~$0.001, fast)
- Analyzes flagged prompts, rewrites preserving visual intent
- Still worth doing — cleaner prompts reduce both input AND output flagging
- Fallback: basic "Safe for work, family friendly, " prefix if Haiku unavailable
- Direct Anthropic API key may be invalid — OpenRouter path is primary
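A minimal sketch of the rewrite-then-prefix fallback described above. The LLM call is injected as `rewriteWithLlm`, a hypothetical stand-in for the Haiku request via OpenRouter; `rewriteOrPrefix` is likewise an illustrative name, not the real rewriteFlaggedPrompt() signature.

```javascript
const SAFE_PREFIX = 'Safe for work, family friendly, ';

// `rewriteWithLlm` is a hypothetical async function (prompt) => string.
async function rewriteOrPrefix(prompt, rewriteWithLlm) {
  try {
    const rewritten = await rewriteWithLlm(prompt);
    if (typeof rewritten === 'string' && rewritten.trim()) {
      return rewritten.trim();
    }
  } catch (err) {
    // LLM unavailable or key invalid: fall through to the basic prefix.
  }
  // Avoid stacking the prefix when the same prompt is retried.
  return prompt.startsWith(SAFE_PREFIX) ? prompt : SAFE_PREFIX + prompt;
}
```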
CRITICAL BUG (fixed): reference_image_prompt must be saved BEFORE generation, not only on success.
- Root cause: prompt only saved in same UPDATE as URL (on success). When gen failed, prompt stayed NULL. Retry loop queried WHERE reference_image_prompt IS NOT NULL → found nothing → skipped.
- Fix: save prompt BEFORE calling generateImageFlux(). Guard with WHERE reference_image_prompt IS NULL.
- Retry loop also rebuilds prompt from entity data if somehow still NULL.
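The ordering fix can be sketched as follows. Everything except the reference_image_prompt column is an assumption made for illustration: the `entities` table name, the `reference_image_url` column, the `db.run(sql, params)` interface, and the `generate` callback.

```javascript
// Hypothetical sketch: persist the prompt BEFORE generating so a failed
// generation still leaves a row the retry loop can find.
async function generateReference(db, entity, prompt, generate) {
  await db.run(
    'UPDATE entities SET reference_image_prompt = ? ' +
      'WHERE id = ? AND reference_image_prompt IS NULL',
    [prompt, entity.id]
  );
  try {
    const url = await generate(prompt);
    await db.run(
      'UPDATE entities SET reference_image_url = ? WHERE id = ?',
      [url, entity.id]
    );
    return url;
  } catch (err) {
    // The prompt is already saved, so the retry pass can pick this entity up.
    return null;
  }
}
```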
YOLO retry flow (step 6 refs and step 7 frames):
1. First pass: Replicate FLUX.2 Pro with safety_tolerance: 5 (15 parallel)
2. Poll for completion with stall detection (30s)
3. Find all entities/shots with NULL image URLs (no IS NOT NULL filter!)
4. Rewrite prompts with Haiku AI (currently sequential — should be parallelized)
5. Regenerate via fallback cascade with DB freshness checks:
   a. Check DB and skip if URL already set by background task
   b. fal.ai FLUX (generateImageFluxFal()) with safety checker off
   c. For entity refs, Qwen text-to-image is acceptable as last resort when identity refs don't exist.
   d. For shot frames, prefer Qwen edit with refs, not Qwen text-only.
6. Up to 2 retry cycles
7. Validation gate: throw with entity names + "edit in Analysis/Shots tab"
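Steps 3 to 7 of the flow above could look roughly like this. It is a sketch under assumptions: `retryFailed` and the injected callbacks (`isDone`, `rewrite`, `regenerate`) are hypothetical names, and the error wording is illustrative rather than the exact message the app throws.

```javascript
// Hypothetical sketch of the retry cycles and the final validation gate.
async function retryFailed(items, { isDone, rewrite, regenerate, maxCycles = 2 }) {
  let pending = items;
  for (let cycle = 1; cycle <= maxCycles && pending.length > 0; cycle++) {
    const stillFailing = [];
    for (const item of pending) {
      // DB freshness check: a background task may have finished this item.
      if (await isDone(item)) continue;
      const prompt = await rewrite(item);
      const url = await regenerate(item, prompt);
      if (!url) stillFailing.push(item);
    }
    pending = stillFailing;
  }
  if (pending.length > 0) {
    const names = pending.map((item) => item.name).join(', ');
    throw new Error(
      `Could not generate images for: ${names}. Edit in Analysis/Shots tab.`
    );
  }
}
```

The inner loop is sequential here to mirror the current behavior noted in step 4; switching it to `Promise.all` over `pending` would be the parallelized variant.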
Critical frame-consistency lesson: frame retry code must use the SAME ranked reference bundle as the primary generation path. Do not use a weaker retry path.
- Build a shared helper that ranks refs in this order:
  1. continuity ref (for last frame, the first frame)
  2. visible character refs
  3. set ref
  4. prop refs
- Then trim by provider limits (for Qwen edit, use top 3 refs only).
- Never let a last-frame retry fall back to text-only generation if continuity/character refs exist.
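A minimal sketch of such a shared helper, assuming each ref is an object of the form `{ kind, url }`. The `kind` labels, `REF_PRIORITY`, and `buildRefBundle` are hypothetical names introduced for illustration.

```javascript
// Lower number = higher priority, following the ranking described above.
const REF_PRIORITY = { continuity: 0, character: 1, set: 2, prop: 3 };

// Returns ref URLs ranked by priority and trimmed to the provider limit.
function buildRefBundle(refs, maxRefs) {
  return refs
    .filter((ref) => ref && ref.url && ref.kind in REF_PRIORITY)
    .map((ref, index) => ({ ref, index }))
    // Ties keep their original order (for example, character order in the shot).
    .sort(
      (a, b) =>
        REF_PRIORITY[a.ref.kind] - REF_PRIORITY[b.ref.kind] || a.index - b.index
    )
    .slice(0, maxRefs)
    .map(({ ref }) => ref.url);
}
```

Both the primary frame path and the retry path would call the same helper, for example `buildRefBundle(refs, 3)` before a Qwen edit request, so a retry can never see a weaker bundle than the first attempt.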
Activity log messages during retry:
⚠️ 3 references failed (likely flagged as sensitive): Zahhak, Fereydun, Ahriman
🔄 Retry 1/2: Rewriting prompts with AI to avoid safety filters...
✏️ Rewrote prompt for "Zahhak"
🔄 Regenerating 3 reference images via fal.ai (safety checker disabled)...
⚡ fal.ai FLUX blocked "Zahhak", trying Qwen...
✓ Generated "Zahhak" via Qwen (fallback)
Upstream safety (reduce flags at prompt generation time):
- breakdownShots() system prompt includes safety filter compliance instructions
- analyzeStory() appearance field warns about safety filters
- buildCharacterPrompt/buildSetPrompt/buildPropPrompt prepend "Safe for all audiences."
projects:
- aspect_ratio (TEXT, default 16:9)
- reference_image_model (TEXT, default qwen)
- frame_image_model (TEXT, default qwen)

image_mode may still exist for backward compatibility, but new code should prefer the explicit reference/frame settings.

- aspect_ratio up front (16:9 or 9:16).

- qwen
- flux
- qwen-pro
- nano-banana
- nano-banana-2
- flux-kontext
- flux-edit
- reve-fast

- fal-ai/qwen-image-2/text-to-image for refs without guide images and fal-ai/qwen-image-2/pro/edit for reference-driven generation.
- safety_tolerance: 5 and existing fallback paths.
- generateAdvanced().
- 16:9.

- <a download>
- skipWaiting: true + clientsClaim: true for instant updates

- 8311467f... ($0.014/sec)
- 3190ef7d... ($0.05/sec)
- version: not model: field

- 848x480 / 864x480, and Kling rejects height < 512. Export normalization happens too late.
- 1280x720
- 720x1280
- video_url when lipsync_url is missing.

When users report YOLO is stuck in the lip-sync phase, do NOT assume the external lipsync model is still running.
Observed production pattern:
1. PM2/out log shows:
   - [Lipsync] Prediction created: <id>
   - but never shows [Lipsync] Saved to S3: or [Lipsync] Complete:
2. DB still shows lipsync_status='generating' for that shot
3. The next dialogue shot stays pending
4. Direct check of the Replicate prediction shows status: succeeded
That means the bottleneck is likely after Replicate succeeds, inside app-side result handling.
Most likely hang point in current code:
- server/lipsync.js waits for prediction success, then calls downloadAndUpload(outputUrl, key)
- server/storage.js does a raw fetch(remoteUrl) + await res.arrayBuffer() with no timeout
- if the fetch/download/upload path stalls, the shot remains generating forever and the sequential lipsync queue never advances
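One way to bound the download step is an AbortController timeout around the fetch and the body read. This is a sketch, not the current server/storage.js code: `fetchBufferWithTimeout`, the 60-second default, and the injectable `fetchImpl` parameter are assumptions. It relies on the global `fetch` available in Node 18+.

```javascript
// Hypothetical helper: download a remote file, failing instead of hanging.
async function fetchBufferWithTimeout(url, timeoutMs = 60000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`Download failed with HTTP ${res.status}`);
    // The timer stays armed during the body read, so a stalled stream
    // should be aborted as well, not only a stalled connection.
    return Buffer.from(await res.arrayBuffer());
  } finally {
    clearTimeout(timer);
  }
}
```

On timeout the promise rejects, which lets the caller mark the shot as failed instead of leaving lipsync_status at 'generating' and blocking the sequential queue.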
Read-only investigation workflow that worked:
- inspect PM2 logs for the last [Lipsync] lines
- query /api/projects/:id/lipsync-status or DB shot rows to see complete/generating/pending
- if a prediction ID is present, query the Replicate prediction directly using the configured token
- compare Replicate status vs DB status:
  - Replicate succeeded + DB still generating => app-side post-processing/finalization hang
  - Replicate still starting/processing => model/provider slowness
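The comparison above is small enough to encode as a helper for a diagnostics script. `classifyLipsyncStall` and its return labels are hypothetical; the status strings are Replicate's prediction statuses and the lipsync_status value noted above.

```javascript
// Hypothetical helper: decide where a "stuck" lipsync shot is actually stuck.
function classifyLipsyncStall(replicateStatus, dbStatus) {
  if (dbStatus !== 'generating') return 'not-stalled';
  if (replicateStatus === 'succeeded') return 'app-side-finalization-hang';
  if (replicateStatus === 'starting' || replicateStatus === 'processing') {
    return 'provider-slow';
  }
  if (replicateStatus === 'failed' || replicateStatus === 'canceled') {
    return 'provider-failed';
  }
  return 'unknown';
}
```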
Useful cues:
- lipsyncAll() is sequential by design, so one stuck shot blocks all later dialogue shots
- project yolo_step='lipsync' can persist even after the provider finished if local finalization never completes
- local temp artifacts in uploads/<project>/lipsync/ can prove audio extraction + video normalization already happened before the stall
const serverGenerating = project.status === 'generating'
const generating = localGenerating || serverGenerating

- POST /api/hermes/projects/create-and-run
- GET /api/hermes/projects/:id/status
- GET /api/hermes/projects/:id/export

x-hermes-token header validated against VIDEO_STORY_HERMES_TOKEN, then VIDEO_STORY_PIN, then app PIN fallback if needed.

create-and-run should accept:
- prompt
- duration_target
- genre
- style
- aspect_ratio
- reference_image_model
- frame_image_model
- auto_yolo

status should return:

- server/index.js (search "YOLO MODE")
- server/llm.js (rewriteFlaggedPrompt())
- server/images.js (generateImageFlux, generateImageFluxFal, generateImageQwenFal)
- server/advanced-images.js
- server/lipsync.js (version hashes in LIPSYNC_MODELS)
- server/video.js
- server/export.js
- server/index.js (search "COST ESTIMATION")
- src/components/StoryPhase.jsx, AnalysisPhase.jsx, YoloStatus.jsx