---
name: fal-replicate-model-inventory
description: Track and compare image/video generation models across fal.ai, Replicate, and Venice AI, with a grounded workflow for live pricing checks, endpoint discovery, and app-inventory maintenance.
version: 1.1.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [fal.ai, Replicate, image models, video models, pricing, model inventory, FLUX, Wan, Kling, Qwen]
    related_skills: [vps-app-deployment, ai-video-story-pipeline, multi-provider-api-resilience]
---

# fal.ai + Replicate + Venice AI Model Inventory

Maintain a reusable inventory of image and video generation models across fal.ai, Replicate, and Venice AI.

Use this skill when:
- the user asks what models are available on fal.ai, Replicate, or Venice AI
- the user wants current API pricing for image/video models
- the user is choosing between providers for a new app
- the user wants to update a model picker/config like `fal-studio`
- you need a grounded comparison of image vs video models, edit models, FLF/first-last-frame models, or provider-specific endpoints

## Grounding policy

Always separate findings into two buckets:
1. Live-verified now
2. Historical/session-recalled notes

Do not present session-recalled prices as freshly verified. Label them clearly.
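
One way to keep the two buckets apart in practice is to carry the verification date with every price. The sketch below is illustrative only: `PriceFinding` and `bucket_findings` are hypothetical names, and treating "verified today" as the test for "live" is an assumption, not a rule stated by this skill.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PriceFinding:
    model: str
    price: str                   # keep the provider's own unit, e.g. "$0.05 / second"
    source: str                  # URL or note describing where the value came from
    verified_on: Optional[date]  # None means recalled from a prior session


def bucket_findings(findings, today):
    """Split findings into the two grounding buckets; only same-day checks count as live."""
    live, recalled = [], []
    for finding in findings:
        (live if finding.verified_on == today else recalled).append(finding)
    return {"Live-verified now": live, "Historical/session-recalled notes": recalled}
```

Anything without a same-day `verified_on` lands in the historical bucket by default, so a forgotten date can never promote a recalled price to "live".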

### Registry coverage gate

Before treating Hermes Model Intelligence or any shared registry as “latest,” verify **provider-scoped** coverage and freshness—not only the global sync timestamp. Confirm that each advertised provider has a real ingestion adapter, nonzero provider inventory, a recent completed sync, and visible sync errors/staleness. A fresh fal sync does not make Venice fresh; `provider: venice -> 0` may mean an ingestion gap rather than no Venice models.
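
The four checks above can be sketched as a per-provider gate. The status keys (`has_adapter`, `model_count`, `last_completed_sync`, `sync_errors`) and the 24-hour threshold are assumptions made for illustration; the real registry may expose this information under different names.

```python
from datetime import datetime, timedelta, timezone


def provider_is_fresh(status, now, max_age=timedelta(hours=24)):
    """Return (ok, reasons) for one provider's sync status dict."""
    reasons = []
    if not status.get("has_adapter"):
        reasons.append("no ingestion adapter")
    if status.get("model_count", 0) <= 0:
        reasons.append("zero inventory (possible ingestion gap)")
    last_sync = status.get("last_completed_sync")
    if last_sync is None or now - last_sync > max_age:
        reasons.append("no recent completed sync")
    if status.get("sync_errors"):
        reasons.append("sync errors present")
    return (not reasons, reasons)
```

Run the gate once per provider rather than once for the registry as a whole, so a fresh fal sync cannot mask a stale or empty Venice inventory.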

For Venice, reconcile the authenticated image/video catalogs with the full documentation export, family guides, and live quote API. Venice `/models` can omit documented and callable deployments, so never make an absence claim from that endpoint alone. Preserve model-level provenance, evidence strength, contradictions, disappearance/reappearance, and payment/operational metadata.

Use `references/provider-coverage-and-venice-reconciliation.md` for the provider-freshness gate, Venice multi-source pattern, creative-capability normalization, x402 metadata, and end-to-end verification checklist.

## Release chronology and recency bias

When inventory data drives model selection, keep provider publication, first observation, and last verification as separate timestamps. Never treat a bulk import's `discovered_at` as a model release date.

Default chronological views to newest source-dated deployments, but keep recency subordinate to hard capability fit and execution evidence. State the release date kind and source in recommendations; `provider_published` means availability on that provider, not automatically the upstream lab announcement.
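
A minimal sketch of keeping the three timestamps separate, assuming a simple in-memory record. `DeploymentDates` and `newest_source_dated` are hypothetical names; the actual provenance contract lives in the reference file.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DeploymentDates:
    endpoint: str
    provider_published: Optional[date]  # availability on this provider, if the source states it
    first_observed: date                # when the inventory first saw it (e.g. a bulk import)
    last_verified: Optional[date]       # most recent live check, if any


def newest_source_dated(deployments):
    """Sort by provider publication date; undated entries are excluded, not backfilled."""
    dated = [d for d in deployments if d.provider_published is not None]
    return sorted(dated, key=lambda d: d.provider_published, reverse=True)
```

Excluding undated entries from the chronological view, instead of substituting `first_observed`, keeps a bulk import from looking like a wave of new releases.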

Use `references/release-chronology.md` for the embedded-provider-metadata extraction pattern, provenance contract, recency bands, agent-facing context requirements, change logging, and end-to-end verification checklist.

## Preferred live-check workflow

### 1) If web_search/web_extract are available and funded
Use them first for quick retrieval.

### 2) If web_search/web_extract fail due to credits or scraping limits
Use browser tools instead.

Recommended live sources:
- `https://fal.ai/pricing`
- `https://replicate.com/pricing`
- `https://docs.venice.ai/overview/pricing`
- `https://api.venice.ai/api/v1/models` (currently text-only in the public unauthenticated response; use docs pricing for image/video tables)
- provider model pages linked from those pricing pages

For Venice AI API pricing:
1. Fetch `https://docs.venice.ai/overview/pricing` and extract HTML tables. The image tables include fixed prices; the video table often says `Variable`.
2. Use `POST https://api.venice.ai/api/v1/video/quote` for live video pricing. It can return quotes without auth for many public pricing inputs; if auth becomes required, fall back to docs/table extraction and label the result as docs-derived rather than live-quoted.
3. Quote body pattern:
   ```json
   {"model":"seedance-2-0-text-to-video","duration":"5s","aspect_ratio":"16:9","resolution":"720p","audio":true}
   ```
4. Normalize video prices to `$/second` and also report the actual quoted duration, because Venice durations are model-specific enums.
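
The quote-and-normalize steps above can be sketched with the standard library. The URL and body fields come from the pattern above; the helper names are hypothetical, and the response shape is deliberately not assumed, so reading the quoted price out of the response is left to the caller.

```python
import json
import urllib.request

QUOTE_URL = "https://api.venice.ai/api/v1/video/quote"


def build_quote_request(model, duration, aspect_ratio, resolution, audio):
    """Build (but do not send) the POST request for a video quote."""
    body = {"model": model, "duration": duration, "aspect_ratio": aspect_ratio,
            "resolution": resolution, "audio": audio}
    return urllib.request.Request(
        QUOTE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def duration_seconds(duration):
    """Convert a duration enum such as "5s" to a number of seconds."""
    return float(duration.rstrip("s"))


def price_per_second(quoted_usd, duration):
    """Normalize a quoted clip price to $/second, rounded to 4 decimals."""
    return round(quoted_usd / duration_seconds(duration), 4)
```

Send the request with `urllib.request.urlopen(request)` and report both the raw quote and the normalized rate, for example `$0.39 / 5s` alongside `price_per_second(0.39, "5s")`.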

Reference note: see `references/venice-ai-api-pricing-2026.md` for a live May 2026 comparison snapshot and reusable quote commands.

Reference note: see `references/venice-reference-capabilities-2026.md` for a live May 2026 snapshot comparing Venice, fal, and Replicate on image-edit references, video R2V references, private/uncensored model posture, and accepted clip-duration enums.

Reference note: see `references/provider-parameter-surface-comparison.md` for the July 2026 cross-provider parameter audit, normalized capability contract, and verified differences for GPT Image 2, Nano Banana Pro, Recraft V4, Ideogram V4, Grok, Flux, Kling O3, Veo 3.1, LTX 2.3, Wan 2.7, and Seedance 2.0.

Reference note: see `references/higgsfield-model-surface-pricing-audit.md` for the July 2026 Higgsfield Gemini Omni Flash snapshot and a reusable workflow for auditing credit-based creative platforms by reconciling the live form, frontend validation/cost modules, current plan table, marketing claims, and upstream model docs. Use this pattern when a provider's landing page advertises capabilities that its actual request contract does not expose.

Reference note: see `references/gemini-omni-provider-surface-2026-07.md` for the direct Google vs fal vs Venice vs Higgsfield Omni contract and price comparison, including the crucial distinction between native `previous_interaction_id` state, stateless output-chaining, and platform-advertised conversation.

Reference note: see `references/grok-imagine-vs-seedance-audio-refs-2026.md` for live June 2026 fal schema/pricing notes comparing Grok Imagine Video 1.5 and Seedance 2.0 audio-reference support, including a successful Grok 15s talking-portrait request pattern.

Reference note: see `references/fal-transparent-video-background-removal-2026.md` for fal video-background-removal endpoint schemas and a transparent WebM workflow: generate/review a high-quality MP4 first, then postprocess with Bria/Veed background removal rather than relying on I2V alpha output or local keying hacks.

Reference note: see `references/audio-driven-talking-video.md` when the supplied or named-voice recording must remain the actual speaking performance. It distinguishes audio conditioning from true avatar/lip-sync contracts, covers exact-audio remuxing and Cartesia named-voice lookup, and requires source-aware voice correction: rerun a still portrait directly through the original audio-driven avatar model when quality matters; reserve post-hoc video lipsync for pre-existing moving footage or cases where preserving source motion outweighs facial-synthesis quality.

For browser extraction:
1. `browser_navigate(url)`
2. `browser_snapshot(full=true)`
3. `browser_console(expression='document.body.innerText.slice(0,12000)')`
4. extract only the visible, grounded rows and links

## Current live-verified pricing anchors (captured April–June 2026)

### fal.ai pricing page
Verified directly from `fal.ai/pricing` using browser tools:

Video:
- Wan 2.5 — `$0.05 / second`
- Kling 2.5 Turbo Pro — `$0.07 / second`
- Veo 3 — `$0.4 / second`
- Ovi — `$0.2 / video`

Image:
- Seedream V4 — `$0.03 / image`
- Flux Kontext Pro — `$0.04 / image`
- Nanobanana — `$0.0398 / image`
- Qwen — `$0.02 / megapixel`

Compute:
- H100 — `$1.89/hr` / `$0.0005/s`
- H200 — `$2.10/hr` / `$0.0006/s`
- A100 — `$0.99/hr` / `$0.0003/s`

### Replicate pricing page
Verified directly from `replicate.com/pricing` using browser tools:

Public model examples:
- `black-forest-labs/flux-1.1-pro` — `$0.04 / output image`
- `black-forest-labs/flux-dev` — `$0.025 / output image`
- `black-forest-labs/flux-schnell` — `$3.00 / thousand output images`
- `ideogram-ai/ideogram-v3-quality` — `$0.09 / output image`
- `recraft-ai/recraft-v3` — `$0.04 / output image`
- `wavespeedai/wan-2.1-i2v-480p` — `$0.09 / second of output video`
- `wavespeedai/wan-2.1-i2v-720p` — `$0.25 / second of output video`

Hardware pricing:
- `gpu-a100-large` — `$0.001400/sec` / `$5.04/hr`
- `gpu-h100` — `$0.001525/sec` / `$5.49/hr`
- `gpu-l40s` — `$0.000975/sec` / `$3.51/hr`
- `gpu-t4` — `$0.000225/sec` / `$0.81/hr`
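
As a quick consistency check on hardware rows like these, the hourly figure should equal the per-second figure times 3600. The helper name below is hypothetical.

```python
def per_hour(per_second_usd):
    """Convert a per-second hardware rate to an hourly rate, rounded to cents."""
    return round(per_second_usd * 3600, 2)
```

For example, `per_hour(0.001525)` reproduces the listed `$5.49/hr` for `gpu-h100`; a mismatch between the two columns suggests a transcription error or a stale row.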

### Venice AI pricing/docs
Verified May 2026 via direct HTTP fetches of `docs.venice.ai/overview/pricing` and live `POST /api/v1/video/quote` calls.

Image examples:
- `qwen-image` — `$0.01 / image`
- `grok-imagine-image` — `$0.03 / image` (private)
- `flux-2-pro` — `$0.04 / image`
- `qwen-image-2` — `$0.05 / image`
- `seedream-v4` / `seedream-v5-lite` — `$0.05 / image`
- `flux-2-max` — `$0.09 / image`
- `qwen-image-2-pro` — `$0.10 / image`
- `nano-banana-2` — `$0.10–$0.19 / image` depending on resolution
- `nano-banana-pro` — `$0.18–$0.35 / image` depending on resolution

Video quote examples (normalize but also report source quote):
- `kling-2.5-turbo-pro-text-to-video` — `$0.39 / 5s` ≈ `$0.078/s`
- `kling-v3-pro-text-to-video` — `$0.49 / 4s` audio off ≈ `$0.1225/s`; `$0.74 / 4s` audio on ≈ `$0.185/s`
- `seedance-2-0-text-to-video` 720p — `$0.72 / 4s` ≈ `$0.18/s`
- `seedance-2-0-fast-text-to-video` 720p — `$0.58 / 4s` ≈ `$0.145/s`
- `veo3.1-fast-text-to-video` 720p/1080p — `$0.44 / 4s` audio off ≈ `$0.11/s`; `$0.66 / 4s` audio on ≈ `$0.165/s`
- `veo3.1-full-text-to-video` 720p/1080p — `$0.88 / 4s` audio off ≈ `$0.22/s`; `$1.76 / 4s` audio on ≈ `$0.44/s`
- `wan-2-7-text-to-video` — `$0.55 / 5s` at 720p ≈ `$0.11/s`; `$0.70 / 5s` at 1080p ≈ `$0.14/s`
- `ltx-2-v2-3-fast-text-to-video` 1080p — `$0.40 / 6s` ≈ `$0.0667/s`

### Live-verified June 2026 model notes

- fal video background removal / transparent WebM postprocess:
  - `bria/video/background-removal/v3`: required `video_url`; supports `background_color: "Transparent"`, `preserve_audio`, and `output_container_and_codec: "webm_vp9"`. Use this as the preferred postprocess when the production target is transparent WebM alpha from an opaque high-quality I2V MP4.
  - `bria/video/background-removal`: same basic shape as v3; verify current schema before use.
  - `veed/video-background-removal` and `/fast`: required `video_url`; supports `output_codec: "vp9"`, `subject_is_person`, and `refine_foreground_edges`.
  - App-wiring pitfall: these are video-to-video endpoints and expect `video_url`, not `image_url`.
  - Workflow pitfall: most high-quality I2V models output normal MP4 without alpha; generate/review the MP4 first, then run a real video background-removal model for transparent WebM. Do not present local color-keying as equivalent unless labeled fallback.
- fal Kling Video v3 Pro I2V:
  - `fal-ai/kling-video/v3/pro/image-to-video` requires the reference as `start_image_url`, not `image_url`; supports `duration`, `negative_prompt`, `generate_audio`, `shot_type`, and `cfg_scale`.
  - For preserving an existing transparent PNG reference, first composite the PNG onto a plain neutral/warm-white background if the model may hallucinate checkerboard/alpha; then remove that background after I2V.
- fal Grok Imagine Video 1.5:
  - `xai/grok-imagine-video/v1.5/image-to-video` requires `prompt` + `image_url`; supports integer `duration` 1..15 and `resolution` `480p`/`720p`.
  - Verified OpenAPI fields do **not** include `audio_url`, `audio_urls`, or `generate_audio`: Grok generates native audio, but does **not** accept user-provided audio file references on fal.
  - Pricing observed on fal page: `$0.08/s` at 480p, `$0.14/s` at 720p, plus `$0.01` per input image; generated audio included.
  - Practical use: cheap single-image talking/moving portrait where exact audio is not required. For user-provided audio that only needs to guide cinematic motion, consider Seedance 2.0 reference-to-video. When the uploaded recording must remain the actual speech performance, use a dedicated image+audio avatar or video+audio lip-sync endpoint from `references/audio-driven-talking-video.md`; do not treat Seedance reference audio as exact-audio preservation.
- fal Seedance 2.0 audio-reference distinction:
  - `bytedance/seedance-2.0/text-to-video` and `/image-to-video` expose native `generate_audio`, but no user audio-reference upload fields.
  - `bytedance/seedance-2.0/reference-to-video` and `/fast/reference-to-video` expose `audio_urls` alongside `image_urls` and `video_urls`: up to 3 MP3/WAV files, combined duration <= 15s, max 15 MB each; if audio is supplied, at least one image/video reference is required. References are addressed as `@Audio1`, `@Image1`, `@Video1` in the prompt.
  - Pricing observed: fast 720p about `$0.2419/s`; standard 720p about `$0.3034/s`; standard 1080p about `$0.682/s`.
- fal LTX 2.3 Quality endpoints (checked from fal recently-added + OpenAPI/llms.txt):
  - `fal-ai/ltx-2.3-quality/text-to-video`: prompt-only; `num_frames` 9..481, `frames_per_second` 1..60, `resolution` enum/custom, `generate_audio` default true; price `$0.0024075/MP` of generated video data. At 1280×720/24fps ≈ `$0.053/s`; at 1920×1080/24fps ≈ `$0.120/s`.
  - `fal-ai/ltx-2.3-quality/image-to-video`: requires `prompt` + `image_url`; same frame-count contract and price; supports `image_strength`, audio generation, prompt expansion, quality/write-mode controls. Important: verified OpenAPI fields do **not** include `end_image_url`, so this is **not** a direct first+last-frame swap for the 22B Distilled I2V profile in Video Story classic mode; it is a first-frame animation profile unless paired with another reference/video workflow.
  - `fal-ai/ltx-2.3-quality/audio-to-video`: requires `prompt` + `audio_url`, optional `image_url`; `match_audio_length` default true, otherwise `num_frames`; same `$0.0024075/MP`; strong candidate for dialogue/performance modes when local audio exists.
  - `fal-ai/ltx-2.3-quality/reference-video-to-video`: requires `prompt` + `video_url`; `num_frames`/`frames_per_second`, `video_strength`, optional generated audio; same `$0.0024075/MP`; more useful as repair/restyle/continuation than first-pass classic shots.
  - `fal-ai/ltx-2.3-quality/hdr` and `/hdr/lora`: video-to-HDR workflows from `video_url`; HDR LoRA price `$0.0027075/MP` (≈ `$0.060/s` at 720p/24fps, ≈ `$0.135/s` at 1080p/24fps); post/finishing candidate, not a primary story generator.
  - Quality `/lora` variants exist for text/image/audio/HDR; they require `loras` (max 3, up to 3GB each) and are useful for style/brand fine-tunes, not default app profiles until LoRA upload/selection UX exists.
- fal LTX 2.3 Fast endpoints:
  - `fal-ai/ltx-2.3/text-to-video/fast`: 1080p `$0.04/s`, 1440p `$0.08/s`, 2160p `$0.16/s`; duration enum `6,8,10,12,14,16,18,20`; fps enum `24,25,48,50`; `generate_audio` boolean.
  - `fal-ai/ltx-2.3/image-to-video/fast`: 1080p `$0.06/s`, 1440p `$0.12/s`, 2160p `$0.24/s`; requires `image_url` + `prompt`, optional `end_image_url`; same duration/fps enums; `generate_audio` boolean.
- fal LTX 2.3 22B Distilled:
  - `fal-ai/ltx-2.3-22b/distilled/text-to-video`: priced by generated video megapixels, `$0.001205/MP`; supports `num_frames` integer `9..481`, `fps`, `video_size`, audio, prompt expansion, scheduler/acceleration controls. Strong Video Story candidate for frame-precise duration because duration can be derived from `num_frames / fps` rather than integer-second enums.
  - `fal-ai/ltx-2.3-22b/reference-video-to-video`: `$0.001605/MP`; requires `video_url`; optional `audio_url`, `image_url`, `end_image_url`; supports `match_video_length`, `num_frames`, `match_input_fps`, `fps`, strength/guidance controls.
- fal Wan 2.7:
  - `fal-ai/wan/v2.7/text-to-video`: `$0.10/s` 720p, `$0.15/s` 1080p; duration enum integer `2..15`; supports optional `audio_url`; aspect ratios `16:9,9:16,1:1,4:3,3:4`.
  - `fal-ai/wan/v2.7/image-to-video`: same pricing and integer `2..15`; optional `image_url`, `end_image_url`, `video_url`, `audio_url`; can do first-frame, first+last-frame, video continuation, and audio-driven mode.
- Atlas Cloud Wan 2.7:
  - `alibaba/wan-2.7/text-to-video` and `alibaba/wan-2.7/image-to-video` use `POST https://api.atlascloud.ai/api/v1/model/generateVideo`; price shown as from `$0.10/s`; duration `2..15`; 720P/1080P; image-to-video uses `image`, optional `last_image`, optional `audio`; polling result URL can appear under `data.outputs[]`.
- Replicate LTX 2.3:
  - `lightricks/ltx-2.3-pro`: tasks `text_to_video`, `image_to_video`, `audio_to_video`, `retake`, `extend`; optional `last_frame_image`; duration enum `6,8,10`; fps enum `24,25,48,50`; 1080p `$0.08/s`, 2K `$0.16/s`, likely 4K `$0.32/s` from search snippet; native audio default true.
  - `lightricks/ltx-2.3-fast`: pricing snippet shows 1080p `$0.06/s`, 2K `$0.12/s`.
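
The per-megapixel LTX prices above convert to `$/second` as price × megapixels per frame × frames per second. The sketch below assumes 1 MP = 1,000,000 pixels and that every generated frame is billed at full resolution, which reproduces the approximate figures quoted above. Both function names are hypothetical, and whether an endpoint accepts every integer in `9..481` is not verified here.

```python
def per_mp_price_per_second(usd_per_megapixel, width, height, fps):
    """$/second of output video when billing is per generated megapixel."""
    megapixels_per_frame = width * height / 1_000_000
    return usd_per_megapixel * megapixels_per_frame * fps


def frames_for_duration(seconds, fps, min_frames=9, max_frames=481):
    """Nearest frame count for a target duration, clamped to the allowed range."""
    return max(min_frames, min(max_frames, round(seconds * fps)))
```

For a 2.5-second shot at 24 fps, `frames_for_duration(2.5, 24)` gives 60 frames, which is how `num_frames / fps` models can hit durations that integer-second enums cannot.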

## Historical/session-recalled model families to check

These came from prior sessions and are useful starting points, but should be re-verified before presenting as current pricing:

### fal.ai families
Image:
- FLUX: `flux-pro/v1.1`, `flux-pro/v1.1-ultra`, `flux-2-pro`, `flux-pro/kontext`, `flux-pro/kontext/max`, `flux/dev`, `flux/dev/image-to-image`, `flux/schnell`, `flux-realism`, `flux-lora`, `flux-pro/v1/fill`
- Qwen: `qwen-image-edit`, `qwen-image-2/edit`, and sometimes higher tiers/pro variants
- Nano Banana: `nano-banana`, `nano-banana-2`, `nano-banana-2/edit`, `nano-banana-pro`, `nano-banana-pro/edit`
- Ideogram, Recraft, Seedream, Reve, Imagen (availability varies)

Video / FLF candidates:
- Wan family
- Kling family
- Veo family
- MiniMax / Hailuo family
- Vidu family
- Ovi

### Venice AI families
Image:
- Qwen Image / Qwen Image 2 / Qwen Image 2 Pro
- FLUX.2 Pro / FLUX.2 Max
- Nano Banana 2 / Nano Banana Pro
- Seedream v4/v5, Recraft, Grok Imagine, GPT Image variants

Video:
- Kling 2.5 / Kling 3 / Kling O3
- Veo 3 / Veo 3.1 fast/full
- Seedance 2.0 / fast / reference-to-video
- Wan 2.5 / 2.6 / 2.7
- LTX 2.x, HappyHorse, PixVerse, Runway, Vidu, Ovi, Grok Imagine

### Replicate families
Image:
- FLUX 1.1 Pro / Dev / Schnell
- Recraft
- Ideogram
- Stable Diffusion 3.5 (availability can change)

Video:
- Wan family
- Kling family
- Veo family
- Luma / Ray family
- Sora family
- Hailuo / MiniMax family
- CogVideoX / LTX / Grok depending on current catalog

## Updating an app inventory (example: fal-studio)

When maintaining a local picker/config such as `src/models.js`:

1. Read the current registry file.
2. Group entries by family, type, price unit, and whether they support image inputs.
3. Compare live provider pages against the registry.
4. Mark entries as:
   - confirmed current
   - likely stale
   - missing from app but available upstream
5. Only change code after confirming endpoint names and parameter expectations.
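
Step 4 can be sketched as a set comparison between the registry's endpoint IDs and the IDs confirmed on live provider pages. `classify_entries` is a hypothetical helper, and it assumes both sides use the same ID format.

```python
def classify_entries(registry_ids, upstream_ids):
    """Compare the app registry against endpoint IDs confirmed on live provider pages."""
    registry, upstream = set(registry_ids), set(upstream_ids)
    return {
        "confirmed current": sorted(registry & upstream),
        "likely stale": sorted(registry - upstream),
        "missing from app but available upstream": sorted(upstream - registry),
    }
```

Because pricing pages often list only featured models, treat "likely stale" as a prompt to re-check the endpoint directly, not as proof that it was removed.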

For `fal-studio`, the model registry lives in:
- `src/models.js`

### Best source for exact fal.ai input options
For fal.ai model capabilities, prefer the endpoint OpenAPI spec and `llms.txt` over marketing copy.

Reliable pattern:
1. Open pricing/model page to discover likely endpoint IDs.
2. Pull OpenAPI JSON directly:
   - `https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=ENDPOINT_ID`
3. Inspect:
   - `paths[*].post.requestBody.content.application/json.schema`
   - `components.schemas.*Input`
   - `components.schemas.*Output`
4. Optionally read:
   - `https://fal.ai/models/ENDPOINT_ID/llms.txt`

Why this matters:
- the OpenAPI file gives the exact request fields, required params, enums, defaults, and output shape
- `llms.txt` often adds practical notes like prompt syntax (`@Image1`, `@Image2`) and pricing examples
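
A minimal sketch of the inspection step, assuming the spec follows the `components.schemas.*Input` layout described above. The function names are hypothetical, and `fetch_spec` makes a live network call, so it is shown but not exercised here.

```python
import json
import urllib.request
from urllib.parse import quote

OPENAPI_URL = "https://fal.ai/api/openapi/queue/openapi.json?endpoint_id={}"


def openapi_url(endpoint_id):
    """Build the OpenAPI URL for one endpoint ID, keeping slashes readable."""
    return OPENAPI_URL.format(quote(endpoint_id, safe="/"))


def fetch_spec(endpoint_id):
    """Download and parse the spec (network call)."""
    with urllib.request.urlopen(openapi_url(endpoint_id)) as resp:
        return json.load(resp)


def input_fields(spec):
    """Map each components.schemas.*Input schema to its required and optional fields."""
    result = {}
    for name, schema in spec.get("components", {}).get("schemas", {}).items():
        if not name.endswith("Input"):
            continue
        required = list(schema.get("required", []))
        optional = [key for key in schema.get("properties", {}) if key not in required]
        result[name] = {"required": required, "optional": optional}
    return result
```

Calling `input_fields(fetch_spec("fal-ai/wan-flf2v"))` should surface the required frame fields before any app wiring is changed.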

### Reusable fal video-model patterns discovered
These endpoint families were confirmed via OpenAPI + llms.txt and are useful for building model-specific UI:

- `fal-ai/wan-25-preview/text-to-video`
  - prompt-only video
  - supports: `resolution`, `duration`, `aspect_ratio`, `audio_url`, `negative_prompt`, `seed`, `enable_prompt_expansion`, `enable_safety_checker`

- `fal-ai/wan-25-preview/image-to-video`
  - single first-frame image-to-video
  - required: `prompt`, `image_url`
  - also supports `audio_url`, `resolution`, `duration`, `negative_prompt`, `seed`, `enable_prompt_expansion`, `enable_safety_checker`

- `fal-ai/wan-flf2v`
  - dedicated first/last frame video
  - required: `prompt`, `start_image_url`, `end_image_url`
  - supports `num_frames`, `frames_per_second`, `resolution`, `aspect_ratio`, `guide_scale`, `num_inference_steps`, `acceleration`, `shift`, `negative_prompt`, `seed`

- `fal-ai/kling-video/v2.5-turbo/pro/image-to-video`
  - first-frame image-to-video with optional tail frame
  - required: `prompt`, `image_url`
  - optional end/tail frame field is `tail_image_url`
  - supports `duration`, `cfg_scale`, `negative_prompt`

- `fal-ai/kling-video/o3/standard/image-to-video`
  - first-frame image-to-video with optional end frame and multi-shot support
  - required: `image_url`
  - supports either `prompt` or `multi_prompt` (not both), optional `end_image_url`, `duration`, `generate_audio`, `shot_type`
  - good candidate for a separate “multi-shot prompts” section in UI

- `fal-ai/kling-video/o1/standard/image-to-video`
  - first-frame with optional last frame
  - required: `prompt`, `start_image_url`
  - optional `end_image_url`
  - `llms.txt` explicitly says prompt can reference frames with `@Image1` and `@Image2`

- `fal-ai/veo3.1/image-to-video`
  - first-frame image-to-video
  - required: `prompt`, `image_url`
  - supports `resolution`, `duration`, `aspect_ratio`, `generate_audio`, `negative_prompt`, `auto_fix`, `safety_tolerance`, `seed`

- `fal-ai/veo3.1/first-last-frame-to-video`
  - dedicated first/last frame video
  - required: `prompt`, `first_frame_url`, `last_frame_url`
  - supports `resolution`, `duration`, `aspect_ratio`, `generate_audio`, `negative_prompt`, `auto_fix`, `safety_tolerance`, `seed`

- `fal-ai/veo3/image-to-video`
  - similar to Veo 3.1 image-to-video
  - required: `prompt`, `image_url`

- `fal-ai/minimax/hailuo-02/standard/image-to-video`
  - first-frame image-to-video with optional end frame
  - required: `prompt`, `image_url`
  - optional `end_image_url`
  - supports `duration`, `resolution`, `prompt_optimizer`

- `fal-ai/vidu/start-end-to-video`
  - dedicated first/last frame video
  - required: `prompt`, `start_image_url`, `end_image_url`
  - supports `movement_amplitude`, `seed`

- `fal-ai/vidu/reference-to-video`
  - multi-reference image-to-video
  - required: `prompt`, `reference_image_urls`
  - supports `aspect_ratio`, `movement_amplitude`, `seed`
  - this is the clearest confirmed example of a provider model that wants a general multi-image reference section rather than first/last frame fields

- `fal-ai/ovi/image-to-video`
  - single reference image-to-video
  - required: `prompt`, `image_url`
  - supports `num_inference_steps`, `negative_prompt`, `audio_negative_prompt`, `seed`

### UI design rule for fal-studio-like apps
Do not use one generic “reference image” uploader for every model.
Use separate capability-driven sections:
- General reference images
- First frame
- Last frame
- Optional special sections such as multi-shot prompts

Drive these sections from model metadata, not hardcoded string checks.

Recommended capability flags per model:
- `mediaKind`: `image` or `video`
- `supportsPrompt`
- `supportsGeneralImageRefs`
- `generalImageRefField`
- `minGeneralImageRefs`
- `maxGeneralImageRefs`
- `supportsFirstFrame`
- `firstFrameField`
- `supportsLastFrame`
- `lastFrameField`
- `supportsAudioReference`
- `audioField`
- `supportsMultiPrompt`
- `multiPromptField`
- `supportsTailFrame` if provider uses a nonstandard end-frame field like `tail_image_url`

Important implementation note:
- first/last frame sections should be distinct from general image reference sections
- if a model supports only first frame, show only that section
- if it supports general refs (like `reference_image_urls`), show a multi-upload reference section
- if the provider uses nonstandard field names (`tail_image_url`, `first_frame_url`, `start_image_url`), map them explicitly in metadata instead of branching on substring matches in backend code
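
The metadata-driven mapping can be sketched as follows. It is written in Python for illustration even though `fal-studio` keeps its registry in `src/models.js`; the flag names follow the list above, the two sample entries use field names from the endpoint notes in this skill, and `build_payload` is a hypothetical helper.

```python
MODELS = {
    "fal-ai/kling-video/v2.5-turbo/pro/image-to-video": {
        "mediaKind": "video", "supportsPrompt": True,
        "supportsFirstFrame": True, "firstFrameField": "image_url",
        "supportsLastFrame": True, "lastFrameField": "tail_image_url",
    },
    "fal-ai/vidu/reference-to-video": {
        "mediaKind": "video", "supportsPrompt": True,
        "supportsGeneralImageRefs": True, "generalImageRefField": "reference_image_urls",
    },
}


def build_payload(meta, prompt=None, first_frame=None, last_frame=None, general_refs=None):
    """Map UI inputs onto provider field names using metadata only."""
    payload = {}
    if meta.get("supportsPrompt") and prompt:
        payload["prompt"] = prompt
    if meta.get("supportsFirstFrame") and first_frame:
        payload[meta["firstFrameField"]] = first_frame
    if meta.get("supportsLastFrame") and last_frame:
        payload[meta["lastFrameField"]] = last_frame
    if meta.get("supportsGeneralImageRefs") and general_refs:
        payload[meta["generalImageRefField"]] = list(general_refs)
    return payload
```

This sketch silently drops inputs the model does not support; a real app would more likely hide those sections in the UI or reject the request before submission.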

## Output format recommendation

When reporting back to the user, use:

1. `Live-verified now`
2. `Normalized cost comparison` — same operation, duration, resolution, aspect ratio, and audio state
3. `Parameter-surface differences` — reference limits, frame/audio inputs, timing, quality/FPS/seed/safety controls, task modes, privacy, and delivery behavior
4. `Historical notes to verify`
5. `Recommended models by use case` — state whether the recommendation optimizes price, accepted-output cost, controllability, or privacy
6. `App inventory gaps / stale entries`

Do not call two deployments equivalent merely because they share a model-family name. A generic provider API may omit model-native controls available through another provider, while a private or structured-reference deployment may justify a higher price. Use `references/provider-parameter-surface-comparison.md` for the normalized comparison contract.

## Pitfalls

- Pricing pages often show only featured examples, not the full catalog.
- fal.ai mixes per-image, per-megapixel, per-second, and per-video billing.
- Replicate mixes output-based pricing with hardware-time pricing.
- Venice AI image pricing is visible in docs tables, but video pricing is often listed as `Variable`; use the Venice Video Quote API and normalize to `$/second` instead of guessing from the table.
- Venice `https://api.venice.ai/api/v1/models` may return only text models when fetched unauthenticated even though docs list image/video models; do not conclude image/video are unavailable from that endpoint alone.
- A session-recalled endpoint may have been renamed or removed.
- Do not assume `401 Authentication is required` means the user must manually activate a model. In fal-studio testing, the same 401 message appeared across many unrelated fal endpoints when the saved key was invalid/test-placeholder. First verify the saved key against multiple models before blaming model gating.
- Browser extraction is currently more reliable than `web_extract` here when Firecrawl credits are exhausted.
- fal billing endpoint paths may vary. The initially attempted `https://rest.alpha.fal.ai/account/billing?expand=credits` returned 404 in live testing here, so implement credit lookup with endpoint fallbacks rather than assuming a single fixed path.
- For fal.ai model capability discovery, the most reliable source is often the endpoint OpenAPI schema directly: `https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=...`. This exposes the exact request fields, required uploads, enums, and output types even when the marketing page is incomplete.
- `llms.txt` pages are useful to recover human-readable pricing notes and prompt conventions (for example Kling O1's `@Image1` / `@Image2` references), but the OpenAPI schema should be treated as the source of truth for app wiring.
- Some video endpoints accept optional alternate prompt structures (for example Kling O3 `multi_prompt`) and may not require the normal text prompt when the alternate field is present. UI validation should account for that.
- Some “special reference” workflows do not have a unique dedicated API field. Example: Vidu `reference-to-video` uses `reference_image_urls`; a 9-frame storyboard/grid image can still be surfaced as a separate UI section and merged into that array on submit.
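
The `prompt` versus `multi_prompt` pitfall can be handled with a small validation step. This sketch assumes the Kling O3 rule noted earlier (either field, not both) and does not assume anything about the shape of the `multi_prompt` items; `validate_prompt_fields` is a hypothetical helper.

```python
def validate_prompt_fields(payload, supports_multi_prompt):
    """Return a list of validation errors for prompt / multi_prompt usage."""
    has_prompt = bool(payload.get("prompt"))
    has_multi = bool(payload.get("multi_prompt"))
    if has_multi and not supports_multi_prompt:
        return ["multi_prompt is not supported by this model"]
    if has_prompt and has_multi:
        return ["send either prompt or multi_prompt, not both"]
    if not has_prompt and not has_multi:
        return ["a prompt is required"]
    return []
```

Pass `supports_multi_prompt` from the model's capability metadata so the rule stays data-driven.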

## Quick recommendations by use case

- Cheapest fast images: FLUX Schnell on fal or Replicate, where available
- Premium image editing on fal: Flux Kontext / Qwen / Nano Banana edit tiers
- Typography/design: Ideogram, Recraft, Qwen, Seedream
- Budget video: Wan family
- Premium video: Kling / Veo tiers
- First-last-frame workflows: Wan, Kling, Veo, Hailuo, Vidu families (verify exact endpoint syntax)

## Video Story-specific decision rules (live-verified April 2026)

Use these when evaluating models for Alex's `video-story` app.

### 1) Fractional-duration video is the gating constraint
Video Story derives clip lengths from audio timing and needs sub-second durations.

Live-verified fit:
- `replicate.com/lucataco/wan-2.2-first-last-frame` shows `duration_seconds` as a **number** with minimum **0.5** and maximum **10**.
- This makes the current Replicate Wan 2.2 first/last-frame model a strong fit for Video Story's timing model.

Live-verified fal limitations:
- `fal-ai/wan-25-preview/text-to-video` and `/image-to-video` expose duration enums of only `5` or `10`
- `fal-ai/kling-video/v2.5-turbo/pro/image-to-video` exposes only `5` or `10`
- `fal-ai/veo3.1/first-last-frame-to-video` exposes `4s`, `6s`, `8s`
- `fal-ai/minimax/hailuo-02/standard/image-to-video` exposes `6` or `10`
- `fal-ai/kling-video/o1/standard/image-to-video` and `o3/standard/image-to-video` allow more durations, but still only integer-second enums

Practical rule:
- If the app must preserve audio-aligned durations like `0.5s`, `1.5s`, `2.5s`, etc., most current fal video endpoints are a bad direct fit without padding, retiming, or export-time trimming hacks.
- For Video Story, treat **Replicate Wan 2.2 FLF** as the current benchmark unless a newer model is verified to accept numeric fractional durations.
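
The gating rule can be expressed as a check of a target duration against each model's duration contract. The values below are the ones listed above, stored as numeric seconds (Veo's `4s`/`6s`/`8s` enums are converted). The `replicate:` key prefix and both function names are hypothetical.

```python
DURATION_CONTRACTS = {
    "replicate:lucataco/wan-2.2-first-last-frame": {"min": 0.5, "max": 10},
    "fal-ai/wan-25-preview/image-to-video": {"enum": [5, 10]},
    "fal-ai/kling-video/v2.5-turbo/pro/image-to-video": {"enum": [5, 10]},
    "fal-ai/veo3.1/first-last-frame-to-video": {"enum": [4, 6, 8]},
    "fal-ai/minimax/hailuo-02/standard/image-to-video": {"enum": [6, 10]},
}


def fits_directly(contract, seconds):
    """True when the model can be asked for exactly this duration."""
    if "enum" in contract:
        return seconds in contract["enum"]
    return contract["min"] <= seconds <= contract["max"]


def direct_fits(seconds):
    """Models that accept the duration without padding, retiming, or trimming."""
    return sorted(name for name, contract in DURATION_CONTRACTS.items()
                  if fits_directly(contract, seconds))
```

With these contracts, `direct_fits(1.5)` returns only the Replicate Wan 2.2 FLF entry, which is the reasoning behind treating it as the benchmark.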

### 2) Trust OpenAPI over marketing copy for image-input requirements
Live-verified examples:
- `fal-ai/flux-pro/kontext` OpenAPI requires `image_url`
- `fal-ai/qwen-image` requires only `prompt`
- `fal-ai/qwen-image-2/edit` and `fal-ai/qwen-image-2/pro/edit` require `prompt` + `image_urls`
- `fal-ai/nano-banana/edit`, `nano-banana-2/edit`, and `nano-banana-pro/edit` require `prompt` + `image_urls`
- `fal-ai/flux-2-pro/edit` requires `prompt` + `image_urls`

Practical rule:
- Do not assume a model marketed as "can generate new images from text" is a drop-in text-only endpoint for production. Verify whether the schema actually requires an image field.
- For Video Story, `flux-pro/kontext` is best treated as an **edit model**, not a pure text-only reference generator.

### 3) Video Story reference-generation buckets
Separate decisions into two different jobs:

#### A. Text-only reference generation (characters / sets / props before refs exist)
Live-verified candidates:
- `fal-ai/qwen-image` — text-only, `$0.02 / megapixel`
- `fal-ai/bytedance/seedream/v4/text-to-image` — text-only, `$0.03 / image`
- `fal-ai/nano-banana` — text-only, about `$0.039 / image`
- `replicate.com/black-forest-labs/flux-1.1-pro` — text-only with optional composition guidance, `$0.04 / output image`
- `replicate.com/black-forest-labs/flux-2-pro` — text-only or multi-ref, pricing is mixed: `$0.015 / run` + `$0.015 / input MP` + `$0.015 / output MP`
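
Because these candidates use different billing units, a rough per-run estimate helps when comparing them. The sketch below assumes megapixels are billed fractionally with no rounding or minimums, which is not verified here; both function names are hypothetical.

```python
def flux_2_pro_cost(input_megapixels, output_megapixels,
                    per_run=0.015, per_input_mp=0.015, per_output_mp=0.015):
    """Estimated Replicate FLUX.2 Pro cost for one run, in USD."""
    return per_run + per_input_mp * input_megapixels + per_output_mp * output_megapixels


def per_megapixel_cost(megapixels, usd_per_mp=0.02):
    """Estimated cost under per-megapixel billing, such as fal Qwen Image."""
    return usd_per_mp * megapixels
```

Under these assumptions a text-only 1 MP FLUX.2 Pro image costs about `$0.03`, and adding two 1 MP reference images raises the run to about `$0.06`.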

#### B. Reference-guided frame generation / compositing
Live-verified candidates:
- `replicate.com/black-forest-labs/flux-2-pro` — up to **8** reference images on API
- `fal-ai/flux-2-pro/edit` — requires `image_urls`, priced from `$0.03` for first output MP plus extra MP charges
- `fal-ai/qwen-image-2/edit` — requires `image_urls` with **1-3** images, `$0.035 / image`
- `fal-ai/qwen-image-2/pro/edit` — requires `image_urls` with **1-3** images, `$0.075 / image`
- `fal-ai/nano-banana/edit` — requires `image_urls`, about `$0.039 / image`
- `fal-ai/nano-banana-2/edit` — requires `image_urls`, `$0.08 / image` base with resolution/thinking surcharges
- `fal-ai/nano-banana-pro/edit` — requires `image_urls`, `$0.15 / image` base
- `fal-ai/flux-pro/kontext` / `kontext/max` — single-image edit flow, not multi-ref compositing

### 4) Video Story app-audit note: current "qwen" path is a hybrid
The current app implementation is not just "Qwen" in one uniform sense:
- text-only references go through `fal-ai/qwen-image`
- reference-guided shots go through `fal-ai/qwen-image-2/pro/edit`

This matters because:
- pricing is different between the text-only and edit legs
- ref-count limits are different from some local registry assumptions
- quality/capability discussions should distinguish `qwen-image` from `qwen-image-2/pro/edit`

### 5) Video Story model-selection guidance
If the user asks which models are best **for Video Story specifically**, use this default reasoning:
- **Best current video fit:** Replicate Wan 2.2 FLF, because fractional duration support beats today's fal video enums
- **Best budget text-only references:** fal Qwen Image
- **Best likely upgrade candidate for pure text-only references:** fal Seedream V4
- **Best current multi-reference frame candidate on paper:** Replicate FLUX.2 Pro (8 refs) or fal Nano Banana 2 / Pro when very high reference count matters
- **Best single-reference edit model:** fal Flux Kontext / Kontext Max
- **Do not recommend** a fal video model as the primary Video Story generator unless its duration API is verified to support numeric fractional timing

## Maintenance rule

If you use this skill and discover renamed endpoints, broken pricing assumptions, or new provider pages, patch this skill immediately and update `references/inventory.md`.