Video Story YOLO Pipeline

10-step automated video generation pipeline: Voices → Script → Audio → Scenes → Shots → Refs → Frames → Videos → Lip Sync → Export

Key Patterns

Progress Bar Feedback

LLM Call Resilience

NSFW / Safety Filter Handling (CRITICAL LESSONS)

Safety tolerance settings:

- images.js generateImageFlux(): safety_tolerance: 5 (MUST be 5, the most permissive value for Replicate FLUX.2 Pro; range 1-5)
- advanced-images.js generateAdvanced(): default fallback of 5 for models that accept a safety_tolerance param
- fal.ai models: enable_safety_checker: false disables the output filter but NOT the input checker

The 3-tier provider cascade (learned through trial and error):

1. Replicate FLUX.2 Pro (safety_tolerance: 5): Even at the maximum, OUTPUT images get flagged by a post-generation classifier. Mythological/dramatic characters (Zahhak, Ahriman) consistently trigger it regardless of prompt content.
2. fal.ai FLUX.2 Pro (enable_safety_checker: false): Disables the output filter, but a separate INPUT prompt content checker CANNOT be disabled and blocks some names/themes at the prompt level.
3. Qwen (fal-ai/qwen-image-2/pro/edit): For frame fallbacks, prefer EDIT mode with refs over text-to-image, because edit mode preserves identity/style better than a text-only fallback. Use text-to-image only as the last resort when there are truly no usable refs.

Important production finding: Qwen Pro Edit currently fails in production when given 4 refs, with the error "Maximum 3 reference images allowed". Treat the practical cap as 3 refs, not 4, and trim refs by priority.

Why retries MUST use fal.ai/Qwen, NOT Replicate: Replicate's safety filter runs on the OUTPUT. Rewriting the prompt alone doesn't help, because the model generates an image and THEN it gets flagged. Retrying the same provider is pointless; the retry must switch providers.
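
The provider switch described above can be sketched as a small loop. This is a minimal illustration, not the pipeline's actual code: generateWithCascade and the { name, generate } provider shape are hypothetical, and each generate function is assumed to resolve to an image URL or throw when the request is blocked or flagged.

```javascript
// Hypothetical sketch: try each provider once, in order, and move on as soon
// as one fails. The same provider is never retried, because an output-side
// flag would simply recur.
async function generateWithCascade(prompt, providers) {
  const failures = [];
  for (const provider of providers) {
    try {
      const url = await provider.generate(prompt);
      return { url, provider: provider.name, failures };
    } catch (err) {
      // Record why this tier failed, then fall through to the next tier.
      failures.push({ provider: provider.name, reason: err.message });
    }
  }
  const summary = failures.map((f) => `${f.provider}: ${f.reason}`).join("; ");
  throw new Error(`All providers failed (${summary})`);
}
```

In this sketch the providers array would be ordered Replicate, then fal.ai, then Qwen, so a flagged result always advances to a different provider.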

Prompt rewriting with Haiku (rewriteFlaggedPrompt() in llm.js):

- Uses Claude 3.5 Haiku via OpenRouter (cheap at ~$0.001, fast)
- Analyzes flagged prompts and rewrites them while preserving visual intent
- Still worth doing: cleaner prompts reduce both input AND output flagging
- Fallback: basic "Safe for work, family friendly, " prefix if Haiku is unavailable
- The direct Anthropic API key may be invalid, so the OpenRouter path is primary
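
The rewrite-then-fallback behavior can be illustrated as follows. This is a sketch under assumptions: rewriteWithFallback is a hypothetical name, and rewriteFn stands in for whatever function actually calls the LLM; only the fallback prefix string comes from the notes above.

```javascript
const SAFE_PREFIX = "Safe for work, family friendly, ";

// rewriteFn is a hypothetical stand-in for the LLM call. It receives the
// flagged prompt and should resolve to a rewritten prompt string.
async function rewriteWithFallback(prompt, rewriteFn) {
  try {
    const rewritten = await rewriteFn(prompt);
    if (typeof rewritten === "string" && rewritten.trim() !== "") {
      return { prompt: rewritten.trim(), source: "llm" };
    }
  } catch (err) {
    // LLM unavailable (bad key, network error, etc.): use the prefix below.
  }
  return { prompt: SAFE_PREFIX + prompt, source: "prefix" };
}
```

Because each call is independent, several flagged prompts could be rewritten concurrently, for example with Promise.all over the list of prompts.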

CRITICAL BUG (fixed): reference_image_prompt must be saved BEFORE generation, not only on success.

- Root cause: the prompt was only saved in the same UPDATE as the URL (on success). When generation failed, the prompt stayed NULL. The retry loop queried WHERE reference_image_prompt IS NOT NULL, found nothing, and skipped the entity.
- Fix: save the prompt BEFORE calling generateImageFlux(). Guard with WHERE reference_image_prompt IS NULL.
- The retry loop also rebuilds the prompt from entity data if it is somehow still NULL.
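
The ordering fix can be shown with an in-memory stand-in for the database. This is only a sketch: the Map of rows, savePromptIfMissing, and generateReference are hypothetical, and the real code would issue a guarded UPDATE instead of mutating an object.

```javascript
// In-memory stand-in for the entity table. A real implementation would run
// something like:
//   UPDATE <table> SET reference_image_prompt = ?
//   WHERE id = ? AND reference_image_prompt IS NULL
function savePromptIfMissing(rows, id, prompt) {
  const row = rows.get(id);
  if (row.reference_image_prompt == null) row.reference_image_prompt = prompt;
}

// generate is a hypothetical async function that resolves to an image URL
// or throws when generation fails or is flagged.
async function generateReference(rows, id, prompt, generate) {
  savePromptIfMissing(rows, id, prompt); // persist BEFORE generation
  const row = rows.get(id);
  try {
    row.reference_image_url = await generate(row.reference_image_prompt);
    return true;
  } catch (err) {
    // The URL stays empty, but the prompt is saved, so a retry can find it.
    return false;
  }
}
```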

YOLO retry flow (step 6 refs and step 7 frames):

1. First pass: Replicate FLUX.2 Pro with safety_tolerance: 5 (15 parallel)
2. Poll for completion with stall detection (30s)
3. Find all entities/shots with NULL image URLs (no IS NOT NULL filter!)
4. Rewrite prompts with Haiku AI (currently sequential; should be parallelized)
5. Regenerate via fallback cascade with DB freshness checks:
   a. Check DB and skip if the URL was already set by a background task
   b. fal.ai FLUX (generateImageFluxFal()) with safety checker off
   c. For entity refs, Qwen text-to-image is acceptable as a last resort when identity refs don't exist
   d. For shot frames, prefer Qwen edit with refs, not Qwen text-only
6. Up to 2 retry cycles
7. Validation gate: throw with entity names + "edit in Analysis/Shots tab"
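
Steps 3 and 5 through 7 of the flow above can be sketched as one loop. The sketch assumes an in-memory Map in place of the database. retryMissing and regenerate are hypothetical names, with regenerate standing in for prompt rewriting plus the fallback cascade.

```javascript
// rows: Map of id -> { name, url }. regenerate(row) is a hypothetical async
// function that resolves to an image URL or throws.
async function retryMissing(rows, regenerate, maxCycles = 2) {
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    // Select on the missing URL only; do not filter on the prompt column.
    const missingIds = [...rows].filter(([, r]) => !r.url).map(([id]) => id);
    if (missingIds.length === 0) return;
    for (const id of missingIds) {
      const row = rows.get(id);
      if (row.url) continue; // freshness check: a background task filled it
      try {
        const url = await regenerate(row);
        if (!row.url) row.url = url; // re-check before writing
      } catch (err) {
        // Leave the row empty so the next cycle picks it up again.
      }
    }
  }
  // Validation gate: refuse to continue with missing assets.
  const stillMissing = [...rows.values()].filter((r) => !r.url).map((r) => r.name);
  if (stillMissing.length > 0) {
    throw new Error(
      `Missing images for: ${stillMissing.join(", ")} (edit in Analysis/Shots tab)`
    );
  }
}
```

Re-checking the URL both before and after regeneration is what avoids a double write when a background task finishes in the meantime.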

Critical frame-consistency lesson: frame retry code must use the SAME ranked reference bundle as the primary generation path. Do not use a weaker retry path.

- Build a shared helper that ranks refs in this order:
  1. continuity ref (for the last frame, this is the first frame)
  2. visible character refs
  3. set ref
  4. prop refs
- Then trim by provider limits (for Qwen edit, use the top 3 refs only).
- Never let a last-frame retry fall back to text-only generation if continuity/character refs exist.
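
A shared ranking helper along these lines could serve both the primary and retry paths. This is a sketch: rankAndTrimRefs, the { kind, url } ref shape, and the kind labels are assumptions for illustration, while the priority order and the cap of 3 come from the notes above.

```javascript
// Priority order from highest to lowest. The kind labels are hypothetical.
const REF_PRIORITY = ["continuity", "character", "set", "prop"];

// refs: array of { kind, url }. Returns at most maxRefs refs, ordered by
// priority; refs of the same kind keep their original relative order.
function rankAndTrimRefs(refs, maxRefs = 3) {
  return refs
    .filter((ref) => REF_PRIORITY.includes(ref.kind))
    .map((ref, index) => ({ ref, index }))
    .sort(
      (a, b) =>
        REF_PRIORITY.indexOf(a.ref.kind) - REF_PRIORITY.indexOf(b.ref.kind) ||
        a.index - b.index
    )
    .slice(0, maxRefs)
    .map(({ ref }) => ref);
}
```

Calling the same helper from both the first pass and the retry path ensures the two paths send an identical reference bundle.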

Activity log messages during retry:

⚠️ 3 references failed (likely flagged as sensitive): Zahhak, Fereydun, Ahriman
🔄 Retry 1/2: Rewriting prompts with AI to avoid safety filters...
  ✏️ Rewrote prompt for "Zahhak"
🔄 Regenerating 3 reference images via fal.ai (safety checker disabled)...
  ⚡ fal.ai FLUX blocked "Zahhak", trying Qwen...
  ✓ Generated "Zahhak" via Qwen (fallback)

Upstream safety (reduce flags at prompt generation time):

- breakdownShots() system prompt includes safety filter compliance instructions
- analyzeStory() appearance field warns about safety filters
- buildCharacterPrompt/buildSetPrompt/buildPropPrompt prepend "Safe for all audiences."

Validation Gates (pipeline MUST NOT proceed with missing assets)

Image Generation Settings (per-project)

Reference Image Generation

iOS PWA Considerations

Lip Sync Step (Step 9)

Critical production debugging lesson: “stuck on lipsync” may actually mean post-lipsync finalization is hung

When users report YOLO is stuck in the lip-sync phase, do NOT assume the external lipsync model is still running.

Observed production pattern:

1. PM2/out log shows:
   - [Lipsync] Prediction created: <id>
   - but never shows [Lipsync] Saved to S3: or [Lipsync] Complete:
2. DB still shows lipsync_status='generating' for that shot
3. The next dialogue shot stays pending
4. Direct check of the Replicate prediction shows status: succeeded

That means the bottleneck is likely after Replicate succeeds, inside app-side result handling.

Most likely hang point in current code:

- server/lipsync.js waits for prediction success, then calls downloadAndUpload(outputUrl, key)
- server/storage.js does a raw fetch(remoteUrl) + await res.arrayBuffer() with no timeout
- if the fetch/download/upload path stalls, the shot remains generating forever and the sequential lipsync queue never advances
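
One way to bound the download is to attach an abort signal to the fetch. The following is a sketch, not the code in server/storage.js: downloadWithTimeout, its default of 60 seconds, and the injectable fetchImpl parameter are assumptions, and it relies on the global fetch available in Node 18 and later.

```javascript
// Hypothetical bounded download. fetchImpl is injectable for testing and
// defaults to the global fetch.
async function downloadWithTimeout(url, timeoutMs = 60000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`Download failed with HTTP ${res.status}`);
    // The timer is still armed while the body is read, so a stalled body
    // should be aborted as well, not only a stalled connection.
    return Buffer.from(await res.arrayBuffer());
  } finally {
    clearTimeout(timer);
  }
}
```

On timeout the promise rejects, which gives the caller a chance to mark the shot as failed so the sequential queue can advance instead of waiting forever.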

Read-only investigation workflow that worked:

- inspect PM2 logs for the last [Lipsync] lines
- query /api/projects/:id/lipsync-status or DB shot rows to see complete/generating/pending
- if a prediction ID is present, query the Replicate prediction directly using the configured token
- compare Replicate status vs DB status:
  - Replicate succeeded + DB still generating => app-side post-processing/finalization hang
  - Replicate still starting/processing => model/provider slowness
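
The final comparison in this workflow reduces to a small decision function. This is an illustrative sketch: diagnoseLipsyncStall and its return labels are hypothetical, while the status strings are the ones named above plus Replicate's other terminal states.

```javascript
// replicateStatus: status string reported by the Replicate prediction.
// dbStatus: the shot's lipsync_status value in the database.
function diagnoseLipsyncStall(replicateStatus, dbStatus) {
  if (dbStatus !== "generating") return "not-stalled";
  if (replicateStatus === "succeeded") return "app-side-finalization-hang";
  if (replicateStatus === "starting" || replicateStatus === "processing") {
    return "provider-slow";
  }
  // Any other terminal state, e.g. failed or canceled, that the app never recorded.
  return "provider-" + replicateStatus;
}
```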

Useful cues:

- lipsyncAll() is sequential by design, so one stuck shot blocks all later dialogue shots
- project yolo_step='lipsync' can persist even after the provider finished if local finalization never completes
- local temp artifacts in uploads/<project>/lipsync/ can prove audio extraction + video normalization already happened before the stall

Frontend Navigation Resilience

Hermes automation API

Cost Estimation

Debugging Checklist

  1. Progress bar stuck? Check yoloStep field name mismatch
  2. NSFW flags? safety_tolerance=5 is max for Replicate. If still failing → OUTPUT filtering → fal.ai/Qwen fallback
  3. Retry loop not executing? Check reference_image_prompt saved BEFORE generation
  4. Lipsync failing with "version is required"? Use version hash, not model name
  5. Process "stops" on navigation? Frontend losing state — check server status polling
  6. Double regeneration? Background parallelMap still running — add DB freshness check
  7. Stale UI? SW cache — close all tabs, reopen PWA

File Locations