---
name: ai-video-story-pipeline
description: "Architecture for Alex's AI video story pipeline (video-story app) — YOLO pipeline, reference image conventions, PWA patterns, and LLM integration."
tags: [video-story, pwa, replicate, flux, openrouter, openai-codex, gpt-image-2, atlascloud, seedance-2]
---

# AI Video Story Pipeline

> **Lifecycle update (2026-07-21):** Video Story is now maintenance-only and a feature mine for existing projects/frozen APIs. New long-term production-engine behavior belongs in standalone Hermes Video and is promoted into tenant-local Caduceus Video through the versioned capability contract. For new concept→blueprint→contract work, canon, durable jobs, attempts, deterministic finishing, editing, recovery, and promotion, load `hermes-video-production`. Continue using this skill for Video Story compatibility, repair, and migration evidence.

- See `references/legacy-video-hub-to-tenant-module-audit.md` when auditing a standalone video factory absorbed into a tenant-local product: distinguish domain/human/agent/runtime parity; trace plugin paths through the real gate; challenge in-memory “durable worker” claims; inspect adapter no-ops, deployment-image dependencies, state/credential migration, exact supplied-frame workflows, accounting drift, and one real disposable-tenant production.
- See `references/chat-video-delivery-brief-vs-actual-2026-06-08.md` for the chat-delivery correction: when Alex asks for a video, a brief/contact sheet/project is not the deliverable; send an actual MP4 in-chat, and if the full AI pipeline is slow/stalled, make a deterministic 9:16 fallback MP4 from references + captions + TTS/ffmpeg while the full render continues.
- See `references/visible-speaker-reference-lock-2026-06-05.md` for the June 2026 visible-speaker/reference-lock lesson: real-person quote/parable videos need named-character identity binding, dialogue segment typing, B-roll limits, duration enforcement, and QA; prompt wording alone is insufficient.
- See `references/seedance-storyboard-no-last-frame-workflows-2026-06-02.md` for the June 2026 Seedance storyboard no-last-frame audit pattern: 3×3 Grid and Storyboard Refs must be treated as reference-workflows rather than first/last-frame workflows; audit provider defaults, UI/backend guards, and fal endpoint tests together.
- See `references/wan27-ltx23-i2v-integration-2026-06.md` for the June 2026 Wan 2.7 / LTX 2.3 I2V integration pattern: prefer newer I2V/R2V models for precise/storyboard workflows when export can trim to audio timing; keep Wan 2.2 as fallback; avoid text-to-video unless explicitly requested.
- See `references/ltx23-fal-video-beat-unprocessable-2026-06-02.md` for the LTX 2.3 fal grouped-beat `Unprocessable Entity` failure pattern: Seedance grouped beats can produce durations/resolutions outside LTX's allowed enum values; normalize via the selected video profile before retrying and persist provider payload/error bodies.
- See `references/brief8-classic-ltx-fal-10s-smoke-test-2026-06-02.md` for the exact-duration LTX/fal smoke-test pattern: full classic timeline can overrun a 10s target via narration planning, so provider/model smoke tests may need a direct LTX call from Creative assets plus ffmpeg trim/ffprobe verification.
- See `references/generated-subject-video-qa-alan-watts-2026-06-05.md` for the generated subject-video QA pattern: when a user supplies a real-person/reference image, enforce single-subject reference lock, no unrequested new characters, export-level readable text, duration verification, and post-render video-watch QA before delivery.
- See `references/production-intent-blueprint-anti-overstory-2026-06-05.md` for the top-of-pipeline production-intent fix: classify quote/documentary/music/ad/ambient/story intent first; carry subject policy, reference lock, allow-new-humans, narrative policy, and visual boundaries through downstream story/scene/shot prompts so Video Story does not force every request into a fictional hero/seeker/transformation arc.
- See `references/real-person-quote-talking-portrait-voice-lipsync-2026-06-05.md` for the real-person quote/talking-portrait contract: the referenced person must be visible and lip-synced when requested, generated script should be dialogue/quotes only unless narration is explicitly requested, and master-audio exports must avoid doubled speech from lip-synced clip audio mixed under narration.
- See `references/product-music-video-zero-character-analysis-2026-06-05.md` for the product/music-video analysis pitfall: product-focused classic YOLO prompts can fail with `Analysis produced no characters`; until product-only analysis is supported, include one minimal abstract performer/hand model/mascot or promote the product as the hero entity without letting it become a fictional protagonist.
- See `references/adaptive-production-blueprint-implementation-2026-06-05.md` for the adaptive preflight/production-blueprint implementation pattern: model profiles as production contracts, timeline modality maps, no-spend approval summaries, classifier edge-case tests, and route/port smoke verification.
- See `references/direct-audio-driven-talking-portrait-fallback-2026-06-05.md` for the direct-provider fallback lesson: for simple real-person/reference quote videos, prefer an audio-driven talking portrait model (`image_url + audio_url`) before forcing Video Story through story/scene/shot planning; use Video Story only as a comparative/richer pipeline attempt unless the user asks for that complexity.
- See `references/seedance-fal-audio-ref-public-url-2026-06-06.md` for the Seedance/fal audio-ref failure pattern: generated TTS segment paths like `/home/avalon/apps/video-story/uploads/...mp3` must be converted/uploaded to provider-accessible URLs before being stored in `video_beats.audio_urls_json` or sent as fal `audio_urls`; otherwise fal returns generic `Unprocessable Entity`.

## Current critical references
- `references/elevenlabs-audio-api-integration-2026-06-03.md`: ElevenLabs API integration guidance for VideoStory audio: TTS provider selection, voice/character mapping, Text-to-Dialogue, SFX/music beds, STT alignment/captions, audio isolation, voice changer, dubbing, and why direct API integration should be primary over Eleven Creative Studio.
- `references/image-model-smoke-test-2026-06-03.md`: VideoStory image model smoke-test pattern for checking every core/advanced model, verifying generated S3 URLs, and preserving provider quirks such as Replicate FLUX resolution enums and Nano Banana 2 `thinking_level` values.
- `references/hermes-creative-brief-to-videostory-ltx-fal.md`: Hermes Creative brief handoff into Video Story with classic timeline, LTX 2.3 on fal.ai, provenance-preserving recovery from missing references, export stability checks, and duration-target caveats.
- `references/seedance-provider-audio-duration.md`: Seedance provider-audio strategies, export audio preservation, and short-duration target guardrails.
- `references/seedance-3x3-grid-pipeline-semantics-2026-06-02.md`: 3×3 Grid / `seedance_storyboard_grid` contract: one contact-sheet board, one timing/scene/shot unit, single `@Image1` Seedance reference prompt, no default narration/dialogue.
- `references/seedance-3x3-yolo-smoke-tests-2026-06-02.md`: short 10s two-variant 3×3 smoke-test pattern; verify one timing unit/scene/shot/beat, use explicit audio strategies, and remember Hermes/API-triggered YOLO needs generate-story + analyze before `/yolo`.
- `references/seedance-3x3-provider-routing-not-found-2026-06-02.md`: after 3×3 semantics pass, distinguish provider `Not Found` routing/model failures from story/timeline/scene regressions; includes DB contract checks and triage steps.
- `references/seedance-3x3-video-phase-no-last-frame-ui-2026-06-02.md`: 3×3 Video tab/backend gating pattern: label the grid as the single reference, hide Last Frame UI, and allow single-shot video queueing without `last_frame_url`.
- `references/video-model-profile-ui-selector-2026-06-02.md`: Homepage/API pattern for showing per-mode default model badges and compact model dropdowns wired to `video_model_profile` project creation/update.

## App Overview
- **URL**: video-story.apps.poofc.com
- **Path**: /home/avalon/apps/video-story
- **Stack**: React + Vite PWA (frontend), Express + SQLite (backend)
- **PM2**: video-story process

## LLM Configuration
- **Primary**: local Hermes API server bridge (`HERMES_API_SERVER_URL` + `HERMES_API_SERVER_KEY`), which routes text/script work through Alex's main Hermes provider — currently OpenAI Codex OAuth/subscription rather than billable OpenAI API keys.
- **Fallback chain**: OpenRouter, then OpenAI chat-completions if `OPENAI_API_KEY` is configured, then direct Anthropic API as final fallback.
- Never use Anthropic as primary: it causes 429 rate limits during batch operations, and the direct key can drift/expire.
- When debugging `invalid x-api-key` from story/script generation, inspect the whole provider chain. A visible Anthropic 401 can be a secondary fallback failure after OpenRouter returned 402 insufficient credits or after the Hermes bridge is not configured. Test Hermes `/v1/chat/completions`, OpenRouter credits, Anthropic auth, and OpenAI fallback independently, and keep LLM failures separate from fal/image billing failures.
- See `references/llm-provider-fallback-and-auth-2026-05-31.md` for the provider-chain debugging and verification recipe.
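
The whole-chain inspection described above can be sketched as a small reporting helper. This is a minimal sketch, not the app's code: `diagnoseProviderChain`, `STATUS_HINTS`, and the attempt record shape are hypothetical, and the caller is assumed to record one attempt per provider in the order the fallback chain tried them.

```javascript
// Hypothetical helper, not the app's code. Assumed attempt shape:
//   { provider: string, status: 'ok' | 'not_configured' | <HTTP status number> }
// Attempts must be listed in the order the fallback chain tried them.
const STATUS_HINTS = new Map([
  ['not_configured', 'provider not configured'],
  [401, 'invalid or expired API key'],
  [402, 'insufficient credits'],
  [429, 'rate limited'],
]);

function diagnoseProviderChain(attempts) {
  const failures = attempts
    .filter((a) => a.status !== 'ok')
    .map((a) => ({ ...a, hint: STATUS_HINTS.get(a.status) || 'unclassified failure' }));
  const winner = attempts.find((a) => a.status === 'ok');
  return {
    servedBy: winner ? winner.provider : null,
    // The first failure is where debugging should start.
    rootCause: failures[0] || null,
    // When nothing succeeded, the last failure is the error the user sees.
    visibleError: winner ? null : failures[failures.length - 1] || null,
    failures,
  };
}
```

With attempts for Hermes (`not_configured`), OpenRouter (`402`), and Anthropic (`401`), the report points at the Hermes bridge as the root cause even though the Anthropic 401 is the visible error.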

## Hermes Creative Bridge

Hermes Creative video briefs should map into video-story's native reference model rather than arrive as generic prompts. Treat Creative as the upstream brand/vault intelligence layer and video-story as the execution workspace for video drafts.

Implemented baseline (commit `068cf7b`, 2026-05-30): video-story supports `project_mode='social_creative'`, Creative provenance tables (`creative_imports`, `creative_import_assets`), token-protected `POST /api/hermes/creative-brief-drafts`, idempotent draft creation by Creative project slug + brief id, entity/guide-image seeding, and UI banner/label changes for imported social creative drafts. Draft import never starts YOLO automatically.

Live-audit pitfalls (2026-05-31): Creative media assets may arrive as relative `/media/...` URLs that only resolve on `hermes-creative.apps.poofc.com`, not on `video-story.apps.poofc.com`; normalize/prefix these to absolute Creative public URLs before storing or sending to providers. Also infer vertical aspect ratios from slash-style social channels like `reel/instagram`, not only exact enum values such as `instagram_reels`.
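
Both live-audit fixes are small enough to sketch. The function names, the `VERTICAL_TOKENS` keyword list, and the `16:9` fallback are assumptions for illustration; only the Creative origin and the `reel/instagram` / `instagram_reels` examples come from the notes above.

```javascript
// Public Creative origin named in the text above.
const CREATIVE_PUBLIC_BASE = 'https://hermes-creative.apps.poofc.com';

// Resolve relative `/media/...` paths against the Creative origin.
// Absolute URLs pass through (apart from standard URL normalization).
function normalizeCreativeMediaUrl(url, base = CREATIVE_PUBLIC_BASE) {
  return new URL(url, base).href;
}

// Hypothetical keyword list: a channel counts as vertical when any token
// (split on '/', '_', '-' or whitespace) names a vertical-first format.
const VERTICAL_TOKENS = new Set(['reel', 'reels', 'story', 'stories', 'shorts', 'tiktok']);

// The '16:9' fallback is an assumption, not a documented default.
function inferAspectRatio(channel, fallback = '16:9') {
  const tokens = String(channel || '').toLowerCase().split(/[\/_\-\s]+/);
  return tokens.some((t) => VERTICAL_TOKENS.has(t)) ? '9:16' : fallback;
}
```

Tokenizing the channel string, rather than matching exact enum values, is what lets `reel/instagram` and `instagram_reels` resolve to the same ratio.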

Expected mapping:
- Creative `character_reference` assets → `characters` plus `guide_images`.
- Creative `set_reference` assets → `sets` plus `guide_images`; preserve the existing set rule: empty environment, no people/characters.
- Creative `prop_reference` / `product_reference` assets → `props` plus `guide_images`; preserve object-only/no people unless deliberately specified.
- Creative `style_reference` assets and prompt fragments → `project.reference_style_prompt` and `project.frame_style_prompt`.
- Creative `logo_reference` assets → overlay/end-card/logo reference fields in the derived video project.
- Creative brief strategy/copy → script seed, beat outline, hook, CTA, duration, aspect ratio.
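
The asset rows of this mapping can be held in a lookup table so importers do not branch on role names inline. A minimal sketch: the slot roles and targets come from the list above, while `SLOT_ROLE_TARGETS`, `planCreativeImport`, and the plan shape are hypothetical.

```javascript
// Slot roles and targets mirror the mapping list; the data shape is hypothetical.
const PROP_TARGET = { table: 'props', seedsGuideImage: true, rule: 'object only, no people unless deliberately specified' };
const SLOT_ROLE_TARGETS = new Map([
  ['character_reference', { table: 'characters', seedsGuideImage: true, rule: null }],
  ['set_reference', { table: 'sets', seedsGuideImage: true, rule: 'empty environment, no people/characters' }],
  ['prop_reference', PROP_TARGET],
  ['product_reference', PROP_TARGET],
  ['style_reference', { table: 'project', seedsGuideImage: false, rule: 'feeds reference_style_prompt and frame_style_prompt' }],
  ['logo_reference', { table: 'project', seedsGuideImage: false, rule: 'feeds overlay/end-card/logo reference fields' }],
]);

// Group imported assets by destination; unknown roles are kept for review
// instead of being silently dropped.
function planCreativeImport(assets) {
  const plan = { characters: [], sets: [], props: [], project: [], unmapped: [] };
  for (const asset of assets) {
    const target = SLOT_ROLE_TARGETS.get(asset.slot_role);
    if (!target) {
      plan.unmapped.push(asset);
      continue;
    }
    plan[target.table].push({ asset, seedsGuideImage: target.seedsGuideImage, rule: target.rule });
  }
  return plan;
}
```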

Bridge safety: creating a video-story project from a Creative brief should create a draft only. Do not launch YOLO/rendering automatically unless the Creative review state and Alex's instruction explicitly approve execution. See Hermes Creative reference `phase-2-media-reference-and-video-bridge-2026-05-30.md`.

Workflow recommendations (2026-06-01): Creative handoffs should include a `video_story_workflow` recommendation instead of relying on Video Story to hard-code one mode. Keep `seedance_cinematic` as the default production recommendation, but preserve explicit workflow requests such as `seedance_storyboard_refs`, `seedance_storyboard_grid`, `seedance_prompt_batch`, and `seedance_dialogue`. Video Story should normalize the recommendation through its own workflow/profile registry and persist `generation_mode`, `video_model_profile`, `workflow_mode`, `visual_planning_mode`, `export_strategy`, `reference_strategy`, and audio strategy from that resolved workflow. See `references/hermes-creative-workflow-recommendations-2026-06-01.md`. 
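
A minimal sketch of the normalization step, assuming the registry is a simple allow-list. The workflow names come from the paragraph above; `resolveWorkflowMode` and the `source` field are hypothetical, and the real registry would also resolve the profile, export, reference, and audio fields.

```javascript
// Workflow names listed in the text above; the default is `seedance_cinematic`.
const KNOWN_WORKFLOWS = new Set([
  'seedance_cinematic',
  'seedance_storyboard_refs',
  'seedance_storyboard_grid',
  'seedance_prompt_batch',
  'seedance_dialogue',
]);
const DEFAULT_WORKFLOW = 'seedance_cinematic';

// Keep an explicit, recognized recommendation; otherwise fall back to the
// default instead of trusting an unknown string from the handoff.
function resolveWorkflowMode(recommendation) {
  const requested = String(recommendation || '').trim().toLowerCase();
  if (KNOWN_WORKFLOWS.has(requested)) {
    return { workflow_mode: requested, source: 'creative_recommendation' };
  }
  return { workflow_mode: DEFAULT_WORKFLOW, source: 'default' };
}
```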

## Social Creative Voiceover Contract
- For `project_mode='social_creative'`, production beat sheets are not narration. `Visual:`, `Overlay:`, `CTA/Note:`, beat labels, timing labels, and production notes must never be sent to TTS.
- Use `segmentSocialCreativeForAudio()` / `extractSocialCreativeBeats()` in `server/llm.js` to parse beat sheets and create audio segments from `Voiceover:` fields only. This fixed the Brief 7 failure where the narrator spoke the entire beat metadata and stretched a 30s reel into ~209s.
- Apply the same social parser in every audio entrypoint, including `/api/projects/:id/generate-audio` and the internal `/yolo` audio step. Do not let YOLO call generic `segmentStoryForAudio()` for social creative projects; that regression caused Brief 8 to narrate beat metadata until patched.
- Tests live in `tests/server/social-creative-script.test.js`; keep them passing when changing social/ad script generation. For live Creative reruns, inspect `script_segments.text` before video generation and confirm it contains only spoken voiceover copy.
- Longer-term target: persist a structured social treatment (`beats[].visual`, `overlay_text`, `voiceover`, `production_note`, timing) and feed scenes/shots from that structure instead of re-parsing plain text.
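
The `Voiceover:`-only rule can be illustrated with a line-based parser. This is a sketch under assumptions, not the implementation of `extractSocialCreativeBeats()` in `server/llm.js`: `extractVoiceoverSegments` is a hypothetical name, and it assumes each voiceover fits on one line and that field labels may be wrapped in markdown bold.

```javascript
// Hypothetical parser: keep only the spoken text of `Voiceover:` fields.
// Visual/Overlay/CTA lines, beat labels, and timing labels never match.
const VOICEOVER_LINE = /^\s*(?:[-*]\s+)?(?:\*\*)?voiceover(?:\*\*)?\s*:\s*(?:\*\*)?\s*(.*)$/i;

function extractVoiceoverSegments(beatSheet) {
  const segments = [];
  for (const line of String(beatSheet).split(/\r?\n/)) {
    const match = line.match(VOICEOVER_LINE);
    if (!match) continue;
    // Strip surrounding straight or curly quotes so TTS does not read them.
    const text = match[1].trim().replace(/^["“”]+|["“”]+$/g, '').trim();
    if (text) segments.push(text);
  }
  return segments;
}
```

A quick pre-render check is to compare this output with `script_segments.text`: any segment containing `Visual:` or `Overlay:` indicates the generic segmenter ran instead.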

## Social Creative Imported Character Preservation Pitfall
- Current live behavior can wipe or underuse Creative-imported `character_reference` entities during the Analysis / Define Scenes / Breakdown Shots path. In the 2026-05-31 North Star Venus–Jupiter test, Creative imported Venus/Jupiter transparent god assets as character references, but `/api/projects/:id/analyze` deleted the initially seeded characters and recreated scenes with glyph/prop language; `/define-scenes` then wiped `scene_characters`, and `/breakdown-shots` emitted `characters_visible: []` for the transit shots.
- Until the app is fixed, verify after Analyze/Define/Breakdown:
  - `characters` contains the imported characters with `reference_image_url` / `guide_image_url`.
  - `scene_characters` links those characters to the relevant scenes.
  - `shots.characters_visible` is not `[]` for shots meant to show the imported characters.
  - first/last frame prompts explicitly mention the imported character refs, not only abstract glyphs.
- If needed for a test run, reseed characters from `creative_import_assets` where `slot_role='character_reference'`, insert matching `guide_images`, relink scenes, and patch shot prompts before frame generation. Durable product fix: make social analysis preserve imported character references through scene and shot generation automatically.
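
The first three verification checks can be run against rows already loaded from SQLite. A minimal sketch: `findCharacterPreservationIssues` and its argument shape are hypothetical, `characters_visible` is assumed to be either an array or a JSON string, and the caller is assumed to know which shot ids should show the imported characters. The prompt-wording check still needs a manual read.

```javascript
// Accept either a parsed array or a JSON string column value.
function parseVisible(value) {
  if (Array.isArray(value)) return value;
  try {
    const parsed = JSON.parse(value || '[]');
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Hypothetical checker: returns human-readable issues, empty when all pass.
function findCharacterPreservationIssues({ importedNames, characters, sceneCharacters, shots, shotIdsExpectingCharacters }) {
  const issues = [];
  const linkedIds = new Set(sceneCharacters.map((row) => row.character_id));
  for (const name of importedNames) {
    const row = characters.find((c) => c.name === name);
    if (!row) {
      issues.push(`characters: "${name}" is missing`);
      continue;
    }
    if (!row.reference_image_url && !row.guide_image_url) {
      issues.push(`characters: "${name}" has no reference or guide image`);
    }
    if (!linkedIds.has(row.id)) {
      issues.push(`scene_characters: "${name}" is not linked to any scene`);
    }
  }
  const expecting = new Set(shotIdsExpectingCharacters);
  for (const shot of shots) {
    if (expecting.has(shot.id) && parseVisible(shot.characters_visible).length === 0) {
      issues.push(`shots: shot ${shot.id} has empty characters_visible`);
    }
  }
  return issues;
}
```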

## YOLO Pipeline (10 steps)
1. **Voices** — Assign voices to characters
2. **Script** — Generate script from story
3. **Audio** — Generate audio narration
4. **Scenes** — Break story into scenes
5. **Shots** — Break each scene into shots (per-scene retry, 3 attempts each)
6. **Refs** — Generate reference images for characters, sets, props
7. **Frames** — Generate frame images for each shot (defaults to GPT Image 2 through Hermes OpenAI/Codex OAuth; legacy/selectable paths include Qwen/Fal, FLUX/Replicate, and advanced fal.ai models)
8. **Videos** — Generate video clips from frames (current default precise/classic path is Wan 2.7 Image-to-Video via Atlas; WAN 2.2 via Replicate remains the legacy/fallback profile; experimental Seedance 2 via Atlas Cloud is a strong candidate because it accepts explicit first `image` + `last_image`)
9. **Lip Sync** — Lip sync dialogue shots (Kling via Replicate, ~$0.014/sec)
10. **Export** — Assemble final video (prefers lip-synced clips when available)

## Critical Conventions

### Story Generation
- Story prompts MUST exclude character appearance details (hair, clothing, build, age, skin)
- Appearance is handled separately in the Analysis step's `appearance` field
- This prevents conflicts when user edits appearance in detail panel

### Reference Images
- **Set references**: Must exclude all people/characters — empty environment only
- **Prop references**: Must exclude all people/characters — object only
- **Character references**: Character portrait with appearance details; PuLID used when guide images exist
- **Single-subject real-person shorts**: if the user supplies a reference image for a real/personality subject, treat that subject as identity-locked across the whole render. Do not let the planner introduce concrete new humans such as "the seeker", younger doubles, gurus/students, crowds, shirtless transformation figures, etc. unless explicitly requested. Symbolic metaphors should usually be insert shots or abstract/environmental beats, and the final shot should return to the locked subject.
- **Real-person quote / talking portrait mode**: if the user asks to show the referenced person talking or speaking quotes, create that person as the visible continuity-critical character and require lip-sync-capable shots. Do not create a generic voice-only `Narrator` in their place. Script segments should be only the requested speaker's dialogue/quotes unless narration glue is explicitly requested. In master-audio exports, strip/mute generated clip audio unless it is explicitly ambience/SFX-only; lip-synced clip speech mixed under the master track creates doubled/overlapping voices. See `references/real-person-quote-talking-portrait-voice-lipsync-2026-06-05.md`.
- Frame generation uses ALL reference types: character + set + prop images (via scene_props relationship)
- Default reference/frame generation uses GPT Image 2 through Hermes OpenAI/Codex OAuth. Keep Qwen/Fal, FLUX/Replicate, and advanced fal.ai models as selectable/legacy alternatives rather than deleting them.
- GPT Image 2 app UI cap is currently 10 reference inputs; provider-specific caps still vary and must be enforced per selected model.
- **Continuity reference priority**: reference images are for reusable visual identity, not every extracted noun. Analysis should classify entities with `reference_priority` (`required|recommended|optional|prompt_only`), `reuse_scope`, and `reference_reason`; bulk generation should default to `required`/`recommended` only. One-off props, chat text, notifications, overlays, transient VFX, abstract motifs, atmosphere/mood, and set dressing should usually stay prompt-only while still appearing in scene/shot prompts. First/last-frame workflows reduce the need for one-shot prop refs because the first frame can carry local continuity into the last frame. See `references/continuity-reference-prioritization-2026-06-02.md`.
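
The bulk-generation default for continuity priority can be sketched as a partition over analyzed entities. The `reference_priority` values come from the convention above; `partitionForBulkReferences` and the returned shape are hypothetical.

```javascript
// Only these priorities get reference images in bulk generation by default.
const BULK_PRIORITIES = new Set(['required', 'recommended']);

// Split entities into those that get a generated reference image and those
// that stay prompt-only. Prompt-only entities still belong in scene/shot prompts.
function partitionForBulkReferences(entities) {
  const generate = [];
  const promptOnly = [];
  for (const entity of entities) {
    (BULK_PRIORITIES.has(entity.reference_priority) ? generate : promptOnly).push(entity);
  }
  return { generate, promptOnly };
}
```

Entities with a missing or unrecognized priority fall into `promptOnly` here, which keeps spend conservative; that default is a design choice of this sketch.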

### Guide Images (multi-upload system)
- `guide_images` table stores multiple guide photos per entity (entity_type, entity_id, image_url, sort_order)
- Legacy `guide_image_url` column on characters/sets/props stays in sync (first guide image)
- API: POST/GET `/:entityType/:id/guide-images`, DELETE `/guide-images/:guideId`
- Entity list endpoints return `guide_images` array attached to each entity via `attachGuideImages()` helper
- All generate endpoints (standard, advanced, YOLO bulk) query `getGuideImageUrls()` from guide_images table
- Characters with guides use PuLID (identity-preserving, 1 ref); sets/props pass all guides as FLUX refs
- Model maxRefs vary by provider reality, not preference: PuLID=1, Reve/Qwen/Kontext=1, Nano Banana=4, FLUX.2 Edit=9, Nano Banana 2/Pro=14. Qwen Pro Edit currently must be capped at 3 refs because Fal returned 422 “Maximum 3 reference images allowed”; do not raise it without live docs/testing.
- Guide images beyond model capacity are visually faded and not sent to API
- CRITICAL regeneration rule: advanced regeneration must use guide images + explicit extra refs only; do NOT silently fall back to the existing AI-generated `reference_image_url` unless the user explicitly requests an "edit existing reference" behavior
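
The active/overflow split can be sketched from the caps listed above. The cap values come from this section; the model keys, `splitGuidesForModel`, and the conservative cap of 1 for unknown models are assumptions.

```javascript
// Cap values are from the list above; the key spellings are hypothetical.
const MAX_REFS = new Map([
  ['pulid', 1],
  ['reve', 1],
  ['qwen', 1],
  ['kontext', 1],
  ['qwen-pro-edit', 3],
  ['nano-banana', 4],
  ['flux-2-edit', 9],
  ['nano-banana-2', 14],
  ['nano-banana-pro', 14],
]);

// Guides within the cap are sent to the provider; the rest are shown faded
// in the UI and never sent. Unknown models default to 1 (assumption).
function splitGuidesForModel(guides, modelKey) {
  const cap = MAX_REFS.has(modelKey) ? MAX_REFS.get(modelKey) : 1;
  const ordered = [...guides].sort((a, b) => a.sort_order - b.sort_order);
  return { cap, active: ordered.slice(0, cap), overflow: ordered.slice(cap) };
}
```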

### AI Image Editing / Regeneration UX Parity
- Treat AI image generation/editing/regeneration as long-running jobs with persistent user-visible state, not as button-local spinners. If a user refreshes or reopens the PWA, active and recent jobs should rehydrate from SQLite/server status.
- Users must be able to run multiple image jobs in parallel. Do not globally disable all generate/edit buttons while one job runs; show a compact activity/log panel with all active jobs, statuses, errors, and completions.
- Every uploaded, generated, and edited image should automatically enter the image-analysis/enrichment path. If analysis cannot run, show a clear “Needs setup”/provider issue rather than silently omitting analysis or pretending a generic caption is enough.
- Editing/regenerating from an existing image must actually send the source/guide/reference image(s) to the provider. Before changing prompts, inspect the provider payload and model reference caps (`image_url` vs `image_urls`, PuLID single ref, Qwen caps, etc.).
- UI feedback should happen where the user is looking: on the placeholder/preview or active card, with logs collapsible nearby. Button placement should support the flow: prompt/model controls below the preview, not detached from the image being edited.

### Analysis Phase UI (single-column cards)
- Single-column layout (not grid) — gives each entity a full-width hero image
- Each card: hero reference image, entity type pill, status badge (Reference/Guides added/No images)
- Guide image thumbnail strip with inline + button directly on cards (no drawer needed to upload)
- Footer action: "Set Up & Generate" or "Edit & Regenerate" opens detail panel
- `fileInputRefs` use a ref object keyed by `${type}_${id}` for per-card file inputs

### Detail Panel (layout order — no accordion/advanced section)
1. **Guide Images** (top, most prominent) — multi-upload grid with always-visible delete badges, big empty-state CTA
   - Shows active vs overflow guides based on selected model's maxRefs
   - Overflow guides: faded, greyscale, "unused" overlay
   - Warning message with suggestion to switch models for more refs
2. **Current Reference Image** — display only (if exists), with both "upload your own" and **clear current reference** actions
   - Clearing the current reference should set `reference_image_url = NULL` without deleting guide images
   - Endpoint pattern: `DELETE /api/:entityType/:id/reference-image`
3. **Detail Fields** — name, description, appearance, personality etc.
4. **Model + Generate** (bottom of form) — model dropdown + single generate button
   - Model selector label should present the user-facing default as "Default (GPT Image 2)"; PuLID can still be used internally for characters with guide images when that path is selected by backend rules, but do not make PuLID the main user-facing default.
   - Generate button shows: model name, price, guide count
   - NO separate prompt textarea in the default detail flow — prompt built server-side from entity structured fields
   - `generate-advanced` endpoint builds prompt via `buildCharacterPrompt/buildSetPrompt/buildPropPrompt` when empty prompt sent
   - IMPORTANT: disable generate while entity save is in flight (`saving`) so users cannot save and immediately regenerate against stale DB state

### UX Principle: No Duplicate Controls
- NEVER put duplicate prompts, guide images, or settings inside an "advanced" accordion
- "Advanced" should only ADD controls (e.g. model selector) on top of existing UI
- Alex explicitly rejected the pattern of an accordion that duplicated the prompt and guide images
- The model selector is just a dropdown near the generate button, not a separate section

## Landing Page / Account Panel
- Home page now has a top-right `⚙️ Account` button above the `Video Story` title
- The account view is a right-side drawer, following the same `fixed inset-0 z-50 flex justify-end` pattern as ActivityPanel / DetailPanel
- Account data comes from `GET /api/account/overview`
- `server/account-status.js` is the source of truth for:
  - provider credit/account summaries
  - grouped model inventory by platform
- This endpoint should report reality, not aspirational balances. If a provider does not expose credit balance with the current key/API, surface that clearly in `note`

## Civitai Orchestration

Civitai offers two API surfaces:
- **Site API** (`https://civitai.com/api/v1`): Browse/search models, images, creators, tags, AIR identifiers. Public endpoints; authenticated for `/me`.
- **Orchestration API** (`https://orchestration.civitai.com`): Submit generation jobs, poll their status, and fetch output blobs. Requires `Authorization: Bearer <token>`.

### What Civitai Orchestration supports
- Image generation: OpenAI (GPT Image, DALL-E), Qwen (sdcpp + fal), Flux, SDXL, SD1.5, Gemini, Grok, Seedream, Z-Image, Anima, ERNIE.
- Video generation: WAN 2.1–2.7, Kling V3, Vidu/Vidu Q3, HunyuanVideo, LTX2, Veo 3.
- Chat completion via OpenRouter-compatible endpoint.
- LoRA training on selected ecosystems.

### Critical pitfalls
- **`allowMatureContent: true` does NOT bypass downstream provider safety filters.** It only controls whether Civitai masks mature blob URLs in responses. The actual generation still passes through the selected engine's moderation.
- **fal/Qwen2 sanitizes NSFW prompts** via `enablePromptExpansion: true` (default). Even with explicit pornographic prompts, output was `nsfwLevel: "pg"`. For potentially mature prompts through Qwen, consider `enablePromptExpansion: false` or use a different engine.
- **sdcpp (Civitai-hosted SD/Qwen) currently fails on all requests** (not just NSFW). Do not rely on it until the engine stabilizes.
- **Cost is in Buzz.** `1000 Buzz ≈ $1` in some routes. Use `whatif=true` for exact cost previews.
- **Output blob URLs are signed and temporary.** Download immediately; do not cache URLs long-term. Re-poll `GetWorkflow` for fresh signed URLs if the first expires.
- **Always submit with `wait=0`** and poll for status. Inline waiting times out for most jobs.
- **`hideMatureContent`** query param on workflow endpoints controls URL visibility, not generation behavior.
- **Account Buzz balance** is available via `GET /api/v1/me` (Site API), not the orchestrator.

### Civitai vs Venice for NSFW
- Civitai Orchestration routes through external providers (OpenAI, Google, FAL, etc.) which apply their own safety policies. There is no Civitai-native uncensored generation pipeline.
- **Venice (Lustify) remains the primary uncensored path** for NSFW image/video in the Video Story stack. Civitai is a complementary browse/search/model-discovery layer, not a replacement for Venice.
- If Civitai sdcpp becomes stable, it could be a strong uncensored option because Civitai-hosted workers have their own moderation rather than big-tech filters. Monitor sdcpp status before relying on it.

## Provider Billing Reality
- **fal.ai** billing balance is available via `GET https://api.fal.ai/v1/account/billing?expand=credits`
- BUT fal.ai only allows this for **ADMIN keys**. Standard generation keys return 403 with an insufficient-permissions message
- Therefore, if the app uses a non-admin `FAL_KEY`, the UI must show balance unavailable and explain that billing reads require an ADMIN key
- **Replicate** `GET https://api.replicate.com/v1/account` returns account identity (e.g. username) but does **not** expose remaining credit balance in the currently used API flow
- Therefore, the app should show Replicate account identity if available, but mark credit balance unavailable instead of faking a number
- `server/account-status.js` imports `dotenv/config` directly so standalone scripts/tests loading it still see `.env`
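
The "report reality" rule can be sketched as a pure summarizer. This is not the code in `server/account-status.js`: `summarizeProviderBalance` and the returned shape are hypothetical (only the `note` field is named in this document), and the caller is assumed to pass in the HTTP status plus any balance it already extracted.

```javascript
// Hypothetical summarizer: never invent a number when the provider did not
// return one; explain the gap in `note` instead.
function summarizeProviderBalance({ provider, httpStatus, balance }) {
  if (provider === 'replicate') {
    return {
      provider,
      available: false,
      balance: null,
      note: 'Replicate does not expose remaining credit in the current API flow.',
    };
  }
  if (provider === 'fal' && httpStatus === 403) {
    return {
      provider,
      available: false,
      balance: null,
      note: 'Balance unavailable: fal.ai billing reads require an ADMIN key.',
    };
  }
  if (httpStatus === 200 && typeof balance === 'number') {
    return { provider, available: true, balance, note: null };
  }
  return { provider, available: false, balance: null, note: `Balance unavailable (HTTP ${httpStatus}).` };
}
```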

## Cost Estimation Reality
- Video Story cost estimates are model-specific. Do not use the legacy `$0.07/video` WAN estimate for Seedance projects.
- Seedance should be estimated by billable seconds with the 4–15s duration clamp/minimum per generated clip/beat. Prefer `video_beats.requested_duration_s` for grouped Seedance workflows; fall back to pending framed shots and `video_requested_duration_s`/`duration_ms` when no beats exist.
- Atlas Seedance and fal.ai Seedance have different per-second rates; default Seedance should use Atlas pricing unless the project explicitly selected a fal profile.
- Keep both cost surfaces in sync: backend `/api/projects/:id/cost-estimate` and frontend `VideoPhase` inline “remaining” estimate.
- See `references/seedance-cost-estimation-2026-06-01.md` for the session-derived implementation pattern and no-disruption verification nuance.
- See `references/seedance-dialogue-audio-strategies-2026-06-02.md` for the Seedance dialogue audio strategy pattern: prompt-native audio vs audio refs, provider routing, beat-level audio refs, and export rules for preserving provider audio.
- See `references/seedance-dialogue-smoke-test-ops-2026-06-02.md` for the short ≤6s Seedance Dialogue smoke-test recipe, including the current fal reference profile routing, manual one-shot recovery for stalled shot breakdown, first-frame-as-last-frame unblock, and ffprobe/provenance verification.
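
The billable-seconds rule for Seedance can be sketched as below. The 4–15s clamp and the column names come from the bullets above; `estimateSeedanceCost` is a hypothetical name, `ratePerSecond` must be supplied by the caller (no Atlas or fal rate is assumed here), and whether the provider rounds to whole seconds is not modeled.

```javascript
const SEEDANCE_MIN_S = 4;
const SEEDANCE_MAX_S = 15;

// Every generated clip/beat is billed for at least 4s and at most 15s.
function clampSeedanceSeconds(seconds) {
  return Math.min(SEEDANCE_MAX_S, Math.max(SEEDANCE_MIN_S, seconds));
}

// Prefer grouped beats; fall back to pending framed shots when none exist.
function estimateSeedanceCost({ beats = [], pendingShots = [], ratePerSecond }) {
  const durations = beats.length > 0
    ? beats.map((beat) => beat.requested_duration_s)
    : pendingShots.map((shot) => shot.video_requested_duration_s ?? shot.duration_ms / 1000);
  const billableSeconds = durations.reduce((sum, d) => sum + clampSeedanceSeconds(d), 0);
  return { billableSeconds, estimatedCost: billableSeconds * ratePerSecond };
}
```

Using the same function for the backend estimate and the frontend "remaining" figure is one way to keep the two cost surfaces from drifting.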

## PWA Patterns
- Service Worker: skipWaiting + clientsClaim for immediate updates on deploy
- registerType: 'autoUpdate' in Vite PWA config
- iOS downloads: Use navigator.share() with blob — `<a download>` fails on iOS PWA
- Desktop downloads: Blob URL + programmatic click
- Server has /api/projects/:id/download-video endpoint with Content-Disposition: attachment
- Close button safe area: Use env(safe-area-inset-top) for phone status bar clearance
- Min tap targets: 44x44px for mobile

## YOLO Progress & Live Updates
- Status endpoint returns `yoloStep` (camelCase), recent logs array
- Frontend polls every 3s during active runs
- Animated spinner + live activity feed with timestamps
- Verbose server-side logging for each sub-step (per-scene, per-frame progress)
- **refreshKey pattern**: ProjectPage increments `refreshKey` every 5s during YOLO. ALL phase components
  (Analysis, Timeline, Images, Video, Export) must accept `refreshKey` prop and include it in their
  `useEffect` fetch dependency array: `useEffect(() => { fetchData() }, [project.id, refreshKey])`
- Bug history: originally only AnalysisPhase had refreshKey — Timeline/Images/Video/Export were stale
  during YOLO until user switched tabs. Fixed by passing refreshKey to every phase.

## Audio-Video Sync (FIXED)
- Audio is the master timeline: shots derive durations from `script_segments`, not from the LLM
- Short duration targets are valid project requirements, not falsy/zero-minute edge cases. Never display `Math.floor(duration_target / 60)` for sub-minute targets; use seconds-aware labels (`6s`, `45s`, `1m 15s`). For prompt-native/silent/visual-first modes that do not generate local TTS, scale placeholder script/timing segments to `duration_target` so story/timeline planning cannot expand a 6s or 10s request into a 40–50s composition.
- `breakdownShots()` LLM decides cinematography; code computes `duration_ms` from segment timestamps
- Shot durations use "full slot" timing: from segment's start_time_ms to next segment's start_time_ms (includes 150ms silence gaps)
- Durations rounded to 0.5s for Replicate (WAN 2.2 accepts floats 0.5–10)
- Export trims each clip to exact target duration with ffmpeg `-t` flag
- Final merge uses audio duration as target: `-t ${audDuration}` (not -shortest)
- `actual_video_duration_ms` stored after generation for drift monitoring
- Classic timeline must speak only the actual script. Do not let mode-aware/Seedance-style `Beat N`, `Scene N`, panel/take headings, markdown headings, timing labels, or production notes enter TTS segmentation; they become bogus audio slots and desync scenes/shots/lip-sync. Strip them before `segmentStoryForAudio()` and keep classic story prompts free of structural labels. See `references/classic-timeline-nonspoken-headings-audio-sync-2026-06-02.md`.
- Venice AI rejected as provider: only supports integer-second enums, pipeline needs fractional durations
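
The timing rules above can be sketched as two small pure functions. This is a minimal illustration, not the app's actual code: `formatDurationLabel` and `computeShotDurations` are hypothetical names, and rounding *up* to the next 0.5s (so the export trim always has enough footage) plus the 0.5–10s clamp are assumptions layered on the "rounded to 0.5s" note.

```javascript
// Sketch only: helper names, round-up behavior, and the clamp are assumptions.

// Seconds-aware label for duration targets, so sub-minute targets never show as "0 min".
function formatDurationLabel(totalSeconds) {
  const secs = Math.max(0, Math.round(totalSeconds));
  const minutes = Math.floor(secs / 60);
  const seconds = secs % 60;
  if (minutes === 0) return `${seconds}s`;
  return seconds === 0 ? `${minutes}m` : `${minutes}m ${seconds}s`;
}

// "Full slot" timing: each shot runs from its segment's start to the next
// segment's start (silence gaps included); the last shot runs to the audio end.
function computeShotDurations(segments, audioDurationMs) {
  return segments.map((seg, i) => {
    const nextStart =
      i + 1 < segments.length ? segments[i + 1].start_time_ms : audioDurationMs;
    const durationMs = nextStart - seg.start_time_ms;
    // Round up to the next 0.5s step, then clamp to the 0.5–10s provider range.
    const roundedSeconds = Math.ceil(durationMs / 500) / 2;
    const providerSeconds = Math.min(10, Math.max(0.5, roundedSeconds));
    return { segment_id: seg.id, duration_ms: durationMs, provider_duration_s: providerSeconds };
  });
}
```

The exact `duration_ms` is kept alongside the provider duration so export can still trim each clip to the audio-derived target.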

## Advanced Image Models (fal.ai)
- Reference images go in `image_urls` (an array) for every model except kontext/reve, and that includes Qwen standard Edit
- Bug history: Qwen Edit was sending `image_url` (single string) which caused 422. ALL Qwen variants need array.
- Model param mapping in `advanced-images.js`: kontext/reve use `image_url` (single), everything else uses `image_urls` (array)
- When no prompt provided to `generate-advanced`, server auto-builds from entity fields using same functions as default generate
- IMPORTANT nuance: if the default detail-panel flow sends an empty prompt for advanced generation, project style is still injected indirectly because the server rebuilds via `buildCharacterPrompt/buildSetPrompt/buildPropPrompt`
- If you want truly manual/no-style-injection advanced prompting, the client must send an explicit full prompt instead of `prompt: ''`
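
A minimal sketch of the parameter mapping described above, assuming a lookup keyed by short model names. The `kontext`/`reve` versus everything-else split comes from these notes; the function name, the model key strings, and the empty-object return for "no references" are illustrative assumptions, not the contents of `advanced-images.js`.

```javascript
// Hypothetical keys: only the single-vs-array split is taken from the notes above.
const SINGLE_IMAGE_MODELS = new Set(['kontext', 'reve']);

// Returns the reference-image portion of a request payload for the given model.
function buildReferenceParams(modelKey, referenceUrls) {
  const urls = (referenceUrls || []).filter(Boolean);
  if (urls.length === 0) return {};
  if (SINGLE_IMAGE_MODELS.has(modelKey)) {
    // Single-string field: only the first reference can be sent.
    return { image_url: urls[0] };
  }
  // Array field, even when there is exactly one reference.
  return { image_urls: urls };
}
```

Sending a one-element array for array-style models is the point of the Qwen Edit fix: a bare string in that slot is what produced the 422.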

## Video Model Options
- Current default precise/classic video path is **Wan 2.7 Image-to-Video via Atlas** (`wan-2.7-atlas-i2v`). Wan 2.2 via Replicate remains a legacy/fallback profile because it fits the older first/last-frame shot architecture and accepts fractional durations (rounded to 0.5s app-side, then trimmed at export).
- Use a **video model profile** abstraction for new generators rather than swapping provider strings inline. Persist both project-level and shot-level `generation_mode`, `video_model_profile`, requested duration, audio strategy, and actual sent prompt/provider/model at generation time.
- Homepage/workflow UI must treat model choice as mode-scoped, not global: show each mode's default profile, expose only compatible profiles in a compact dropdown, and send the selected `video_model_profile` into project creation/update. See `references/video-model-profile-ui-selector-2026-06-02.md`.
- For Seedance 2, prefer **Atlas Cloud `bytedance/seedance-2.0/image-to-video`** as the first Video Story integration over fal.ai `reference-to-video`: Atlas exposes explicit `image` (first frame) and `last_image` fields that map directly to the shot frame contract.
- Seedance 2 duration is integer-only `4..15` seconds (`-1` auto on Atlas). Seedance mode must be timing-aware upstream: generated clips must fit 4–15s, but **beat boundaries should be story/content dependent**, not mechanically derived by dividing target duration. Treat target duration as a creative budget / pacing constraint after understanding the story, storyboard, ad structure, hook/proof/CTA, and visual aims.
- Production Seedance grouping generates one `video_beats` clip per 4–15s beat, using the first covered shot's first frame and the last covered shot's last frame when the workflow is `first_last`. Export assembles `video_beats` directly for grouped Seedance projects instead of repeating/chopping the same beat per original shot.
- User-facing naming: prefer **Seedance Duration Beats** for the former `Seedance Cinematic` / internal `seedance_cinematic` mode, and **Precise Narrated Story** for `classic_timeline`. Keep internal keys stable for compatibility.
- Treat Seedance reference-to-video as a separate workflow, not as “first/last-frame but with more refs.” Use explicit workflow strategy metadata (`workflow_mode`, `visual_planning_mode`, `export_strategy`, `reference_strategy`, and `frameReferenceMode`) and storyboard persistence (`storyboards`, `storyboard_frames`, `storyboard_runs`) so reference-to-video can produce one or more storyboard reference takes, 3×3 grid takes, or prompt-batch takes without distorting the classic shot timeline. The landing-page mode selector is selecting these workflow strategies, not merely providers; do not key copy/behavior solely off `generationMode.startsWith('seedance')` or experimental modes like `seedance_dialogue` will inherit misleading cinematic 4–15s warnings.
- Not every workflow requires a last frame. `Storyboard Refs` and `3×3 Grid` should be modeled as reference compositions/panels that can change inside a Seedance clip via references and prompt direction. UI/progress/generate-all-frame logic should consult `frameReferenceMode` and hide/skip mandatory last-frame generation for no-last-frame workflows.
- `3×3 Grid` / `seedance_storyboard_grid` is one contact-sheet board workflow, not nine scenes/shots/clips. Generate exactly one grid motion plan (`Global motion direction` + Panel 1–9), one visual timing unit, one scene, one shot/take, one grid reference image, and one Seedance `@Image1` prompt that tells the model to read panels left-to-right/top-to-bottom with timestamped internal panel beats. Do not use generic first-frame/last-frame prompt language or require `last_frame_url` for this mode. Entity references are separate: character/set/prop refs must stay normal portraits/objects/locations and must not inherit 3×3/contact-sheet style language; only the shot/grid reference is a 3×3 contact sheet. The Video tab must also be mode-aware: label the image as `3×3 Grid Reference`, hide any `Last Frame / No frame` panel, and let single-shot/backend queue gating proceed without `last_frame_url`. If manual or YOLO testing surfaces bad 3×3 semantics, stop the run before video spend and regenerate from corrected upstream phases. See `references/seedance-3x3-grid-pipeline-semantics-2026-06-02.md` and `references/seedance-3x3-video-phase-no-last-frame-ui-2026-06-02.md`.
- Avoid the old anti-pattern of generating 4s Seedance clips for 1–2s shot slots and trimming them down; it breaks motion flow and wastes generation. If grouped beat export is unavailable or disabled, warn clearly before using per-shot Seedance.
- Do not silently clamp Seedance shots longer than 15s if that would under-cover the audio timeline. Split/group into valid beats where possible; otherwise fallback long shots to the classic/WAN path or block with a clear warning.
- Disable provider-native audio by default for Seedance (`generate_audio: false`) because Video Story audio/narration remains the master timeline. Do not present Seedance Dialogue as production-ready unless audio refs/lip-sync/replacement policy are implemented.
- Seedance Dialogue currently means dialogue-aware timing/visual performance planning, not audible dialogue. If `video_audio_strategy='provider_native_audio'`, local TTS is skipped; Atlas still sends `generate_audio:false`; `video_beats.audio_urls_json` remains empty; and export normalizes clips with `-an` before only mixing `project.audio_file_path` if present. Result: silent clips/exports. Prefer fixing this by switching Dialogue to a TTS-backed master-audio strategy and optional lip sync rather than relying on provider-native Seedance audio. See `references/seedance-dialogue-audio-gap-2026-06-02.md`.
- Record provider-start trace/provenance before awaiting long video provider calls, so failures still expose the exact prompt/profile/payload context in pipeline inspection. Also persist provider failure payload/error body/status (not only a generic exception message like `Unprocessable Entity`) so future log reviews can identify schema/enum mismatches without re-running.
- Atlas Cloud credentials are stored without exposing values: Video Story uses `/home/avalon/apps/video-story/.env` `ATLASCLOUD_API_KEY`; Hermes root uses `/home/avalon/.hermes/.env` `ATLASCLOUD_VIDEO_API_KEY`.
- Atlas Cloud Seedance completion payloads may put MP4 URLs in `data.outputs[]` (not only `output`, `url`, or `video_url`). If Video Story reports `timed out ... Last status: completed`, poll the Atlas prediction ID, extract `data.outputs[0]`, upload to S3, and mark the shot `video_status='complete'` (not `completed`) so export can assemble it.
- For FAL Seedance reference-to-video, verify the `@fal-ai/client` import shape before debugging prompts or credentials. Use the documented `import { fal } from '@fal-ai/client'` shape; an `import * as fal` namespace may not expose `fal.config`, causing `fal.config is not a function` before provider generation starts. Also treat `Videos: 0/N done` as failure, not a completed YOLO run. See `references/yolo-seedance-fal-and-wan-failure-audit-2026-06-01.md`.
- For LTX 2.3 fal I2V in grouped-beat workflows, validate the generated beat payload against the LTX profile rather than Seedance defaults: duration must be one of the profile enum values (currently 6/8/10/12/14/16/18/20) and resolution must be LTX-supported (currently 1080p/1440p/2160p). If a YOLO run fails 0/N video beats with generic `Unprocessable Entity`, inspect `video_beats` durations/profile first, normalize the actual LTX payload, and rerun only failed videos. Do not immediately switch providers when the evidence points to an LTX schema/API-call mismatch; fix the selected provider call first. See `references/ltx23-fal-video-beat-unprocessable-2026-06-02.md`.
- For Seedance rerun hardening, Atlas Cloud should be primary and fal.ai fallback; backfill existing Seedance project/beat profile fields before retry, and label workflow/provider in project list + project view so mode is visible without DB/log inspection. See `references/seedance-atlas-primary-rerun-2026-06-01.md`.
- See `references/seedance2-atlascloud-vs-fal-2026-05-31.md` for the schema, payload, pricing notes, and implementation pitfalls.
- See `references/seedance-video-mode-implementation-2026-05-31.md` for implementation shape, verification checklist, and UX copy cautions for Seedance mode.
- See `references/seedance-video-mode-landing-2026-05-31.md` for the landed per-shot Atlas Seedance implementation notes, provenance fields, honest `per_shot` defaults, and final verification recipe.
- See `references/brief8-seedance-test-atlas-output-and-social-yolo-audio-2026-05-31.md` for the live Brief 8 rerun lessons: YOLO must use social voiceover parsing, Atlas MP4 URLs can live under `data.outputs[]`, and export requires `video_status='complete'`.
- See `references/seedance-grouped-beat-timing-2026-06-01.md` for the grouped Seedance beat timing implementation and verification recipe: 4s+ upstream planning, `video_beats` generation, grouped export, and the final-sub-4s tail merge regression.
- See `references/seedance-workflow-modes-storyboard-2026-06-01.md` for the workflow-mode architecture and experimental Seedance storyboard/reference modes: `workflow_mode` strategy metadata, storyboard persistence tables, `@ImageN` reference prompting, 3×3 contact-sheet prompting, UI affordances, and the default-audio-strategy pitfall.
- See `references/mode-aware-story-scene-shot-audio-planning-2026-06-01.md` for the durable implementation pattern: mode must drive Generate Story, Analyze, Define Scenes, Breakdown Shots, audio policy, provider audio settings, and export assumptions — not just final video provider selection.
- See `references/seedance-duration-beats-and-reference-mode-2026-06-01.md` for Alex's correction on Seedance Duration Beats: beat rhythm is story/content dependent; target duration is a budget, not a mechanical divider; and storyboard/grid/reference workflows must not require last frames.
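
Two of the provider rules above lend themselves to small guards that run before and after a provider call: checking a beat duration against the selected profile, and pulling the MP4 URL out of a completion payload. The sketch below is hedged: the profile keys, rule table shape, and function names are hypothetical, the limits are the ones quoted in this section, and Atlas's `-1` auto duration is deliberately not handled.

```javascript
// Illustrative profile keys; the numeric limits are the ones quoted in the notes above.
const DURATION_RULES = {
  'seedance-2-atlas': { type: 'range', min: 4, max: 15 },
  'ltx-2.3-fal': { type: 'enum', values: [6, 8, 10, 12, 14, 16, 18, 20] },
};

// Validate a beat duration (in whole seconds) against the selected profile.
function validateBeatDuration(profileKey, seconds) {
  const rule = DURATION_RULES[profileKey];
  if (!rule) return { ok: false, reason: `unknown profile: ${profileKey}` };
  if (!Number.isInteger(seconds)) {
    return { ok: false, reason: 'duration must be a whole number of seconds' };
  }
  if (rule.type === 'range' && (seconds < rule.min || seconds > rule.max)) {
    return { ok: false, reason: `duration must be ${rule.min}..${rule.max}s` };
  }
  if (rule.type === 'enum' && !rule.values.includes(seconds)) {
    return { ok: false, reason: `duration must be one of ${rule.values.join('/')}` };
  }
  return { ok: true };
}

// Look for an MP4 URL in the places a completion payload may put it,
// checking data.outputs[0] before the flatter fields.
function extractVideoUrl(payload) {
  const data = (payload && payload.data) || payload || {};
  const candidates = [
    Array.isArray(data.outputs) ? data.outputs[0] : undefined,
    data.output,
    data.url,
    data.video_url,
  ];
  return candidates.find((c) => typeof c === 'string' && c.length > 0) || null;
}
```

Failing validation before the request is sent turns a generic `Unprocessable Entity` into a readable reason string that can be stored with the provider trace.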

## Lip Sync (dialogue shots)
- **Module**: `server/lipsync.js`
- **Model**: Kling Lip Sync via Replicate (`kwaivgi/kling-lip-sync`) at $0.014/sec
- **Fallback**: Sync Lipsync 2 (`sync/lipsync-2`) at $0.05/sec
- **How it works**: Post-processing step — takes existing video clip + extracted audio segment, produces lip-synced version
- **Pipeline integration**: Step 9 in YOLO (between Videos and Export). Non-blocking — errors don't stop export.
- **Audio extraction**: `ffmpeg -ss {start} -t {duration}` from full narration to get per-shot dialogue audio
- **DB**: `lipsync_url` and `lipsync_status` columns on shots table
- **Export**: `COALESCE(lipsync_url, video_url)` — prefers lip-synced version when available
- **Only dialogue shots**: Identified by `segment_type = 'dialogue'` on the shot's linked script_segment
- **API endpoints**:
  - `GET /api/projects/:id/lipsync-status` — status + cost estimate
  - `POST /api/projects/:id/lipsync-all` — batch process (runs in background, logs progress)
  - `POST /api/shots/:id/lipsync` — single shot (synchronous)
- **Kling constraints**: video 2-10 sec, 720p-1080p — matches our shot lengths perfectly
- **Cost**: ~$0.31 for 7 dialogue shots (22 sec total). Very cheap addition to pipeline.
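
The cost estimate and the per-shot audio extraction can be sketched as below. The rate is the Kling figure quoted above; the function names, the `duration_ms` field on shots, and the choice to return an argument array (for something like `child_process.spawn('ffmpeg', args)`) are assumptions for illustration rather than the contents of `server/lipsync.js`.

```javascript
// Rate from the notes above; everything else in this block is a sketch.
const KLING_RATE_USD_PER_SEC = 0.014;

// Estimate lip-sync cost for the dialogue shots of a project.
function estimateLipsyncCost(dialogueShots, ratePerSec = KLING_RATE_USD_PER_SEC) {
  const totalMs = dialogueShots.reduce((sum, shot) => sum + shot.duration_ms, 0);
  const totalSeconds = totalMs / 1000;
  // Round to whole cents for display.
  const usd = Math.round(totalSeconds * ratePerSec * 100) / 100;
  return { shots: dialogueShots.length, totalSeconds, usd };
}

// Build ffmpeg arguments that cut one shot's dialogue out of the full narration.
// -ss and -t are placed before -i so they apply to the input file.
function buildAudioExtractArgs(narrationPath, startMs, durationMs, outPath) {
  return [
    '-y',
    '-ss', (startMs / 1000).toFixed(3),
    '-t', (durationMs / 1000).toFixed(3),
    '-i', narrationPath,
    outPath,
  ];
}
```

With seven dialogue shots totalling 22 seconds, the estimate comes to 22 × 0.014 = 0.308, which rounds to the $0.31 quoted above.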

## Pitfalls
- Video generation queue is in-memory but shot status is SQLite-backed. After process restart, stale `video_status='generating'` rows can survive with no active job. `getQueueStatus(projectId)` should reset those rows to `pending` when `activeJobs===0` and `queue.length===0`, and `/video-status` should expose active/waiting/halted/maxConcurrent so the UI can explain what is actually running. See `references/video-queue-refresh-and-qwen-pro-ref-cap-2026-05-18.md`.
- WAN/Replicate `CUDA out of memory` during YOLO video generation is usually provider capacity/concurrency, not prompt failure. If most clips complete but a few fail, retry those clips and consider lowering max active video jobs; a live audit saw 20 active WAN jobs, 20/22 completed, and 2 failed from GPU OOM. See `references/yolo-seedance-fal-and-wan-failure-audit-2026-06-01.md`.
- PWA cache: after a deploy, users may need to close and reopen the PWA to pick up the service worker (SW) update
- Safari vs PWA: Separate caches — Safari browser cache ≠ PWA SW cache
- Shot breakdown: Must retry per-scene (not fail entire step on one scene failure)
- YOLO must validate ALL scenes have shots before proceeding to refs
- OpenRouter 429s: Already handled with fallback, but batch operations need throttling
- Steps nav: horizontal scrollable top bar (not bottom tab bar) — user preference
- **Mobile touch**: hover-only UI (`opacity-0 group-hover:opacity-100`) doesn't work on touch devices.
  Delete buttons on guide images must be always visible (red circle badge at `-top-1 -right-1`), not hover overlays.
- **No duplicate controls in "advanced" sections**: Alex explicitly rejected accordion patterns that duplicate
  prompt/guide-images inside an advanced panel. Advanced = just a model selector dropdown, reusing existing data.
- **Prompt staleness after entity edits**: if a user edits prompt-driving fields (appearance, set description, prop description, mood, etc.), previously auto-generated `reference_image_prompt` values can become stale. On save, invalidate auto-generated prompts (but preserve `[ADVANCED:...]` and `[CUSTOM_UPLOAD]` prompts) so later regeneration rebuilds from current structured fields.
- **Frame/reference reset UX matters**: when users clear a reference image or frame, only null the generated asset URL (`reference_image_url`, `first_frame_url`, `last_frame_url`). Do not delete guide images or prompts. Clearing is for resetting generation state, not deleting upstream inputs.
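
The stale-`generating` recovery from the first pitfall can be modeled as a pure function over shot rows and queue state. This is a sketch under assumptions: the real `getQueueStatus(projectId)` works against SQLite, whereas this version takes plain objects, and `resetStaleGenerating` and the `queueLength` field are hypothetical names.

```javascript
// Sketch: in-memory stand-in for the SQLite-backed reset described above.
// A 'generating' row is only stale when the in-memory queue is fully idle.
function resetStaleGenerating(shots, queueState) {
  const idle = queueState.activeJobs === 0 && queueState.queueLength === 0;
  if (!idle) return { shots, resetCount: 0 };

  let resetCount = 0;
  const next = shots.map((shot) => {
    if (shot.video_status !== 'generating') return shot;
    resetCount += 1;
    // Back to 'pending' so the shot can be queued again after a restart.
    return { ...shot, video_status: 'pending' };
  });
  return { shots: next, resetCount };
}
```

Gating on both counters matters: resetting while a job is active or waiting would let the same shot be queued twice.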

## Hermes automation API workflow
- Prefer a small authenticated local API bridge for Hermes-driven operation instead of browser-driving the app.
- Useful endpoints:
  - `POST /api/hermes/projects/create-and-run`
  - `GET /api/hermes/projects/:id/status`
  - `GET /api/hermes/projects/:id/export`
- The Hermes task is not complete when the pipeline finishes server-side; it is complete only after Hermes fetches the export and delivers the final video back to the user/chat. Alex explicitly wants Video Story generations posted back into the current chat whenever he asks for generation. Do not leave the result only in the app, and do not make a cron/watchdog the only delivery path when the user is waiting in-chat; manually verify logs/status/export files and send the finished MP4 with `MEDIA:/absolute/final_video.mp4`.
- If Alex says he does not see a video, or the prior response only sent a brief/contact sheet/reference sheet, treat that as a workflow correction and immediately produce or retrieve an MP4. A brief can accompany the work, but it is not a substitute for the deliverable. When he asks for 9:16, verify the delivered MP4 is actually vertical (e.g. `1080x1920`) before sending.
- If starting long Video Story renders from chat, it is acceptable to create a `no_agent` delivery watcher as a backup, but make it idempotent and quiet: store delivered project IDs in `~/.hermes/state`, print only when an export is ready or a project has a meaningful error, include `MEDIA:/absolute/final_video.mp4`, and use a relative script name under `~/.hermes/scripts/` when creating the cron job. If Alex asks to stop a watcher, remove the cron first, then continue manual status/export checks.
- After delivery-worthy renders, run a grounded video-watch QA pass when quality/story correctness matters: ffprobe duration, extract timestamped frames/contact sheets, inspect key frames, and compare against DB scenes/shots/script segments.
- Good defaults that worked here:
  - `aspect_ratio`: `16:9` unless portrait requested
  - `reference_image_model`: `gpt-image-2`
  - `frame_image_model`: `gpt-image-2`
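
As a sketch, the defaults above could be applied when building the body for `POST /api/hermes/projects/create-and-run`, with a simple orientation check before delivery. The request schema is not documented here, so the `story` field and both function names are assumptions; only `aspect_ratio`, `reference_image_model`, and `frame_image_model` come from the notes.

```javascript
// Hypothetical payload builder: 'story' is an assumed field name.
function buildCreateAndRunPayload({ story, portrait = false } = {}) {
  return {
    story,
    aspect_ratio: portrait ? '9:16' : '16:9',
    reference_image_model: 'gpt-image-2',
    frame_image_model: 'gpt-image-2',
  };
}

// Orientation check on probed dimensions (for example from ffprobe output)
// before sending an MP4 that was requested as 9:16.
function isVerticalVideo(width, height) {
  return height > width;
}
```

A `1080x1920` export passes `isVerticalVideo(1080, 1920)`; a `1920x1080` export does not and should be re-rendered rather than sent.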

- `references/video-queue-refresh-and-qwen-pro-ref-cap-2026-05-18.md` — session detail for stale video `generating` rows after PM2 restart, `maxConcurrent` status reporting, and the live Fal Qwen Pro Edit 3-reference cap.
- `references/reference-generation-gpt-image-2-fallback-race-2026-06-02.md` — session detail for YOLO reference-image races where GPT Image 2 jobs are still active but the no-progress monitor prematurely invokes fal.ai FLUX fallback, causing logs/UI provenance to disagree.
- `references/stale-export-assembling-after-restart-2026-06-02.md` — session detail for exports stuck on “assembling video” after PM2/server restart: verify no live ffmpeg job, confirm completed clips/beats, reset stale `projects.status='assembling'` to `videos`, rerun `/api/projects/:id/assemble`, then ffprobe the final MP4 before sending.
- `references/hermes-creative-video-brief-draft-bridge-2026-05-30.md` — detailed Creative→Video Story draft bridge plan: token-protected draft import endpoint, slot mapping, provenance tables, idempotency, and draft-only UI/safety rules.
- `references/hermes-creative-bridge-media-url-aspect-2026-05-31.md` — hardening follow-up: Creative media URLs must be absolute `https://hermes-creative.apps.poofc.com/media/...` before Video Story uses them, Video Story defensively prefixes `/media/...`, reel/short/story/tiktok channel strings infer `9:16`, and existing bad imports need SQLite backfill across import/reference tables.
- `references/hermes-creative-video-bridge-live-audit-2026-05-31.md` — live audit of the Creative→Video Story integration: verified draft-only handoff, DB/API inspection snippets, relative `/media/...` URL pitfall, and slash-style reel aspect-ratio inference pitfall.
- Progress/state polling should drive ALL major tabs/components, not just analysis, or the UI goes stale during long runs.
- Save prompt fields needed for retries BEFORE generation starts; do not only persist them on success.
- Persist asset provenance for every generated/imported/uploaded reference, shot frame, and video: full prompt/source note, provider, model label, and source type. Timeline, Images → References, Images → Shot Frames, and entity detail panels should all expose this inspectability. See `references/provenance-inspection-metadata-2026-05-31.md`.
- For prompt-chain debugging, use the reusable pipeline-inspector pattern from `app-pipeline-inspector`: add append-only `pipeline_trace_events`, expose `/api/projects/:id/pipeline-trace?entity_type=...&entity_id=...`, and provide contextual `↕ Pipeline` buttons beside existing prompt/model inspection surfaces. In video-story specifically, high-value entity scopes are `character`/`set`/`prop` reference images, `shot_frame` with `frame_type=first|last`, and `video_clip` for the generated shot video. Capture exact input → prompt → model/tool → decision → output snapshots at generation time, with legacy fallback snapshots for old rows. See `references/pipeline-inspector-video-story-2026-05-31.md`.
- GPT Image 2 is the default image generator through Hermes OpenAI/Codex OAuth. Keep legacy Qwen/Fal and FLUX/Replicate paths available as alternatives, but route default reference/frame generation through `gpt-image-2` and persist provider/model provenance. See `references/gpt-image-2-codex-default-2026-05-31.md`.
- Retry flow should switch providers intelligently for safety/moderation failures instead of hammering the same provider again.
- For frame retries, reuse the same ranked reference bundle as the primary path; do not degrade to weaker text-only retries if continuity refs exist.
- **Frame/reference fallback race pitfall:** do not treat “no new completed frame/reference rows for 30s” as failure while the background GPT Image 2 worker is still active. GPT Image 2 can run longer than that with no DB update. Expose/consult active worker state for both frames and references, block duplicate batches for the same project, and retry through the selected project image model first before fal.ai/FLUX/Qwen safety fallbacks. If a fallback provider actually wins, persist the real provider/model on the row. For references specifically, fallback queries must respect `reference_priority` and not generate `prompt_only` props by accident. See `references/frame-generation-active-worker-fallback-race-2026-05-31.md` and `references/reference-generation-gpt-image-2-fallback-race-2026-06-02.md`.
- Lip-sync stalls may actually be post-provider finalization hangs (download/upload/final save), not the external model itself still running.
- Export stalls after PM2/server restart can be stale DB state, not an active assembly. Because export progress is in-memory, `projects.status='assembling'` may survive while `progressStore` is empty, making `/export-progress` report “Starting assembly...” forever. Verify no ffmpeg process exists, confirm clips/beats are complete, reset the project to `videos`, call `/api/projects/:id/assemble`, and ffprobe the final MP4 before sending. See `references/stale-export-assembling-after-restart-2026-06-02.md`.
- Keep export non-blocking with `COALESCE(lipsync_url, video_url)` so one failed lipsync does not destroy the whole pipeline.
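
The fallback-race rule above (do not treat a quiet 30s as failure while a worker is still active, and retry the selected model before switching providers) can be expressed as a small decision function. This is a hedged sketch: `decideFallback`, its parameter names, and the three return labels are hypothetical, and the 30s threshold is the figure mentioned in the pitfall.

```javascript
// Sketch of the gate described above; names and return labels are assumptions.
function decideFallback({
  activeWorkers,
  msSinceLastProgress,
  stallThresholdMs = 30000,
  selectedModelRetried = false,
}) {
  // An active worker means generation may simply be slow: keep waiting.
  if (activeWorkers > 0) return 'wait';
  // No worker, but not enough quiet time yet to call it a stall.
  if (msSinceLastProgress < stallThresholdMs) return 'wait';
  // Stalled with nothing running: retry the project's selected model first,
  // and only then move to a fallback provider.
  return selectedModelRetried ? 'fallback_provider' : 'retry_selected_model';
}
```

Whichever branch ends up producing the asset, the row should record the provider and model that actually generated it.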