ai-video-story-pipeline

AI Video Story Pipeline

App Overview

LLM Configuration

YOLO Pipeline (10 steps)

  1. Voices — Assign voices to characters
  2. Script — Generate script from story
  3. Audio — Generate audio narration
  4. Scenes — Break story into scenes
  5. Shots — Break each scene into shots (per-scene retry, 3 attempts each)
  6. Refs — Generate reference images for characters, sets, props
  7. Frames — Generate frame images for each shot (uses FLUX.2)
  8. Videos — Generate video clips from frames (uses WAN 2.2 via Replicate)
  9. Lip Sync — Lip sync dialogue shots (Kling via Replicate, ~$0.014/sec)
  10. Export — Assemble final video (prefers lip-synced clips when available)
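Steps 5, 9, and 10 above carry the most logic: per-scene retries, per-second lip sync pricing, and the preference for lip-synced clips at export. The sketch below is one way to express those three rules, not the app's actual code. The `Scene`, `Shot`, and `ShotClips` shapes, the function names, and the injected `generate` callback are all hypothetical; only the 3-attempt limit, the ~$0.014/sec rate, and the lip-sync preference come from the list above.

```typescript
// Hypothetical shapes: the skill does not define these types.
interface Scene { id: string; text: string }
interface Shot { sceneId: string; description: string }
interface ShotClips { videoUrl: string; lipSyncUrl?: string | null }

const MAX_SHOT_ATTEMPTS = 3; // "3 attempts each" from step 5

// Step 5: retry shot generation for a single scene, so one flaky scene
// does not force the whole Shots step to restart.
async function generateShotsForScene(
  scene: Scene,
  generate: (scene: Scene) => Promise<Shot[]>, // assumed LLM call, injected
): Promise<Shot[]> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_SHOT_ATTEMPTS; attempt++) {
    try {
      return await generate(scene);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw new Error(
    `Shots failed for scene ${scene.id} after ${MAX_SHOT_ATTEMPTS} attempts: ${String(lastError)}`,
  );
}

// Step 9: rough cost estimate at the approximate rate quoted above.
function estimateLipSyncCost(seconds: number, ratePerSecond = 0.014): number {
  return seconds * ratePerSecond;
}

// Step 10: use the lip-synced clip when one exists, else the plain video clip.
function pickExportClip(clips: ShotClips): string {
  return clips.lipSyncUrl ?? clips.videoUrl;
}
```

Retrying per scene keeps a failure local: scenes that already produced shots are not regenerated, and the error names the scene that exhausted its attempts.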

Critical Conventions

Story Generation

Reference Images

Guide Images (multi-upload system)

AI Image Editing / Regeneration UX Parity

Analysis Phase UI (single-column cards)

Detail Panel (layout order — no accordion/advanced section)

  1. Guide Images (top, most prominent): multi-upload grid with always-visible delete badges and a big empty-state CTA
     - Shows active vs. overflow guides based on the selected model's maxRefs
     - Overflow guides: faded, greyscale, "unused" overlay
     - Warning message with a suggestion to switch models for more refs
  2. Current Reference Image: display only (if one exists), with both "upload your own" and "clear current reference" actions
     - Clearing the current reference should set reference_image_url = NULL without deleting guide images
     - Endpoint pattern: DELETE /api/:entityType/:id/reference-image
  3. Detail Fields: name, description, appearance, personality, etc.
  4. Model + Generate (bottom of form): model dropdown + single generate button
     - Model selector label: always "Default (FLUX.2 Pro)"; never expose PuLID to the user (implementation detail)
     - Generate button shows: model name, price, guide count
     - NO separate prompt textarea in the default detail flow; the prompt is built server-side from the entity's structured fields
     - The generate-advanced endpoint builds the prompt via buildCharacterPrompt/buildSetPrompt/buildPropPrompt when an empty prompt is sent
     - IMPORTANT: disable generate while the entity save is in flight (saving) so users cannot save and immediately regenerate against stale DB state
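Three of the Detail Panel rules above reduce to small pure functions: splitting guides into active and overflow by maxRefs, gating the generate button while a save is in flight, and falling back to a server-built prompt when the request sends an empty one. The following is a minimal sketch under assumptions, not the app's implementation. It assumes guides are used in upload order, that entity types are `character`, `set`, and `prop`, and that the prompt builders take a flat record of fields; the type names, `partitionGuides`, `canGenerate`, and `resolvePrompt` are hypothetical, and the real buildCharacterPrompt/buildSetPrompt/buildPropPrompt signatures are not given in this document, so they are injected here.

```typescript
// Hypothetical shapes: only maxRefs and the builder names come from the document.
interface GuideImage { id: string; url: string }
type EntityType = "character" | "set" | "prop"; // assumed values of :entityType
type EntityFields = Record<string, string>;
type PromptBuilder = (fields: EntityFields) => string;

// Guides beyond the model's maxRefs are "overflow" (rendered faded/greyscale).
// Assumption: the first maxRefs guides in upload order are the active ones.
function partitionGuides(guides: GuideImage[], maxRefs: number) {
  const limit = Math.max(0, maxRefs);
  return { active: guides.slice(0, limit), overflow: guides.slice(limit) };
}

// Generate stays disabled while the entity save is in flight, so a
// regeneration never runs against stale DB state.
function canGenerate(state: { saving: boolean; generating: boolean }): boolean {
  return !state.saving && !state.generating;
}

// Server side: use the client prompt if one was sent, otherwise build it
// from the entity's structured fields with the builder for its type.
function resolvePrompt(
  entityType: EntityType,
  fields: EntityFields,
  prompt: string | undefined,
  builders: Record<EntityType, PromptBuilder>, // stand-ins for build*Prompt
): string {
  const trimmed = (prompt ?? "").trim();
  return trimmed.length > 0 ? trimmed : builders[entityType](fields);
}
```

Keeping these as pure functions lets the UI derive the overflow warning and the button state from the same inputs on every render, without a second source of truth.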

UX Principle: No Duplicate Controls

Landing Page / Account Panel

Provider Billing Reality

PWA Patterns

YOLO Progress & Live Updates

Audio-Video Sync (FIXED)

Advanced Image Models (fal.ai)

Lip Sync (dialogue shots)

Pitfalls

Hermes automation API workflow