---
name: fal-replicate-model-inventory
description: Track and compare image/video generation models across fal.ai, Replicate, and Venice AI, with a grounded workflow for live pricing checks, endpoint discovery, and app-inventory maintenance.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [fal.ai, Replicate, image models, video models, pricing, model inventory, FLUX, Wan, Kling, Qwen]
    related_skills: [vps-app-deployment, ai-video-story-pipeline, multi-provider-api-resilience]
---

# fal.ai + Replicate + Venice AI Model Inventory

Maintain a reusable inventory of image and video generation models across fal.ai, Replicate, and Venice AI.

Use this skill when:
- the user asks what models are available on fal.ai, Replicate, or Venice AI
- the user wants current API pricing for image/video models
- the user is choosing between providers for a new app
- the user wants to update a model picker/config like `fal-studio`
- you need a grounded comparison of image vs video models, edit models, FLF/first-last-frame models, or provider-specific endpoints

## Grounding policy

Always separate findings into two buckets:
1. Live-verified now
2. Historical/session-recalled notes

Do not present session-recalled prices as freshly verified. Label them clearly.

## Preferred live-check workflow

### 1) If web_search/web_extract are available and funded
Use them first for quick retrieval.

### 2) If web_search/web_extract fail due to credits or scraping limits
Use browser tools instead.

Recommended live sources:
- `https://fal.ai/pricing`
- `https://replicate.com/pricing`
- `https://docs.venice.ai/overview/pricing`
- `https://api.venice.ai/api/v1/models` (currently text-only in the public unauthenticated response; use docs pricing for image/video tables)
- provider model pages linked from those pricing pages

For Venice AI API pricing:
1. Fetch `https://docs.venice.ai/overview/pricing` and extract HTML tables.
   The image tables include fixed prices; the video table often says `Variable`.
2. Use `POST https://api.venice.ai/api/v1/video/quote` for live video pricing. It can return quotes without auth for many public pricing inputs; if auth becomes required, fall back to docs/table extraction and label the result as docs-derived.
3. Quote body pattern:
   ```json
   {"model":"seedance-2-0-text-to-video","duration":"5s","aspect_ratio":"16:9","resolution":"720p","audio":true}
   ```
4. Normalize video prices to `$/second` and also report the actual quoted duration, because Venice durations are model-specific enums.

Reference note: see `references/venice-ai-api-pricing-2026.md` for a live May 2026 comparison snapshot and reusable quote commands.

Reference note: see `references/venice-reference-capabilities-2026.md` for a live May 2026 snapshot comparing Venice, fal, and Replicate on image-edit references, video R2V references, private/uncensored model posture, and accepted clip-duration enums.

For browser extraction:
1. `browser_navigate(url)`
2. `browser_snapshot(full=true)`
3. `browser_console(expression='document.body.innerText.slice(0,12000)')`
4. extract only the visible, grounded rows and links

## Current live-verified pricing anchors (captured April 2026 unless noted otherwise)

### fal.ai pricing page
Verified directly from `fal.ai/pricing` using browser tools:

Video:
- Wan 2.5: `$0.05 / second`
- Kling 2.5 Turbo Pro: `$0.07 / second`
- Veo 3: `$0.4 / second`
- Ovi: `$0.2 / video`

Image:
- Seedream V4: `$0.03 / image`
- Flux Kontext Pro: `$0.04 / image`
- Nanobanana: `$0.0398 / image`
- Qwen: `$0.02 / megapixel`

Compute:
- H100: `$1.89/hr` / `$0.0005/s`
- H200: `$2.10/hr` / `$0.0006/s`
- A100: `$0.99/hr` / `$0.0003/s`

### Replicate pricing page
Verified directly from `replicate.com/pricing` using browser tools:

Public model examples:
- `black-forest-labs/flux-1.1-pro`: `$0.04 / output image`
- `black-forest-labs/flux-dev`: `$0.025 / output image`
- `black-forest-labs/flux-schnell`: `$3.00 / thousand output images`
- `ideogram-ai/ideogram-v3-quality`: `$0.09 / output image`
- `recraft-ai/recraft-v3`: `$0.04 / output image`
- `wavespeedai/wan-2.1-i2v-480p`: `$0.09 / second of output video`
- `wavespeedai/wan-2.1-i2v-720p`: `$0.25 / second of output video`

Hardware pricing:
- `gpu-a100-large`: `$0.001400/sec` / `$5.04/hr`
- `gpu-h100`: `$0.001525/sec` / `$5.49/hr`
- `gpu-l40s`: `$0.000975/sec` / `$3.51/hr`
- `gpu-t4`: `$0.000225/sec` / `$0.81/hr`

### Venice AI pricing/docs
Verified May 2026 via direct HTTP fetches of `docs.venice.ai/overview/pricing` and live `POST /api/v1/video/quote` calls.
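The quote-and-normalize rule from the live-check workflow can be sketched in a few lines. This is a minimal illustration, not Venice's client code: `build_quote_body`, `duration_seconds`, and `normalize_quote` are hypothetical helper names, the request body simply mirrors the quote pattern shown in this skill, and the quoted price is passed in by hand because the response field names are not documented here.

```python
import re


def build_quote_body(model, duration, aspect_ratio="16:9", resolution="720p", audio=True):
    """Assemble a body shaped like the quote pattern in this skill (assumed shape)."""
    return {
        "model": model,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "audio": audio,
    }


def duration_seconds(duration):
    """Parse a duration enum such as "5s" into a number of seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognized duration enum: {duration!r}")
    return float(match.group(1))


def normalize_quote(model, duration, quoted_usd):
    """Keep the source quote and add a $/second figure next to it."""
    seconds = duration_seconds(duration)
    return {
        "model": model,
        "quoted_usd": quoted_usd,
        "quoted_duration": duration,
        "usd_per_second": round(quoted_usd / seconds, 4),
    }


body = build_quote_body("kling-2.5-turbo-pro-text-to-video", "5s")
row = normalize_quote(body["model"], body["duration"], 0.39)
print(row)
```

Keeping `quoted_usd` and `quoted_duration` in the row preserves the source quote, so a reader can see that a per-second figure came from a 4s, 5s, or 6s enum rather than from a true per-second tariff.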

Image examples:
- `qwen-image`: `$0.01 / image`
- `grok-imagine-image`: `$0.03 / image` (private)
- `flux-2-pro`: `$0.04 / image`
- `qwen-image-2`: `$0.05 / image`
- `seedream-v4` / `seedream-v5-lite`: `$0.05 / image`
- `flux-2-max`: `$0.09 / image`
- `qwen-image-2-pro`: `$0.10 / image`
- `nano-banana-2`: `$0.10–$0.19 / image` depending on resolution
- `nano-banana-pro`: `$0.18–$0.35 / image` depending on resolution

Video quote examples (normalized, with the source quote reported alongside):
- `kling-2.5-turbo-pro-text-to-video`: `$0.39 / 5s` ≈ `$0.078/s`
- `kling-v3-pro-text-to-video`: `$0.49 / 4s` audio off ≈ `$0.1225/s`; `$0.74 / 4s` audio on ≈ `$0.185/s`
- `seedance-2-0-text-to-video` 720p: `$0.72 / 4s` ≈ `$0.18/s`
- `seedance-2-0-fast-text-to-video` 720p: `$0.58 / 4s` ≈ `$0.145/s`
- `veo3.1-fast-text-to-video` 720p/1080p: `$0.44 / 4s` audio off ≈ `$0.11/s`; `$0.66 / 4s` audio on ≈ `$0.165/s`
- `veo3.1-full-text-to-video` 720p/1080p: `$0.88 / 4s` audio off ≈ `$0.22/s`; `$1.76 / 4s` audio on ≈ `$0.44/s`
- `wan-2-7-text-to-video`: `$0.55 / 5s` at 720p ≈ `$0.11/s`; `$0.70 / 5s` at 1080p ≈ `$0.14/s`
- `ltx-2-v2-3-fast-text-to-video` 1080p: `$0.40 / 6s` ≈ `$0.0667/s`

## Historical/session-recalled model families to check

These came from prior sessions and are useful starting points, but should be re-verified before presenting as current pricing:

### fal.ai families

Image:
- FLUX: `flux-pro/v1.1`, `flux-pro/v1.1-ultra`, `flux-2-pro`, `flux-pro/kontext`, `flux-pro/kontext/max`, `flux/dev`, `flux/dev/image-to-image`, `flux/schnell`, `flux-realism`, `flux-lora`, `flux-pro/v1/fill`
- Qwen: `qwen-image-edit`, `qwen-image-2/edit`, and sometimes higher tiers/pro variants
- Nano Banana: `nano-banana`, `nano-banana-2`, `nano-banana-2/edit`, `nano-banana-pro`, `nano-banana-pro/edit`
- Ideogram, Recraft, Seedream, Reve, Imagen (availability varies)

Video / FLF candidates:
- Wan family
- Kling family
- Veo family
- MiniMax / Hailuo family
- Vidu family
- Ovi

### Venice AI families

Image:
- Qwen Image / Qwen Image 2 / Qwen Image 2 Pro
- FLUX.2 Pro / FLUX.2 Max
- Nano Banana 2 / Nano Banana Pro
- Seedream v4/v5, Recraft, Grok Imagine, GPT Image variants

Video:
- Kling 2.5 / Kling 3 / Kling O3
- Veo 3 / Veo 3.1 fast/full
- Seedance 2.0 / fast / reference-to-video
- Wan 2.5 / 2.6 / 2.7
- LTX 2.x, HappyHorse, PixVerse, Runway, Vidu, Ovi, Grok Imagine

### Replicate families

Image:
- FLUX 1.1 Pro / Dev / Schnell
- Recraft
- Ideogram
- Stable Diffusion 3.5 (availability can change)

Video:
- Wan family
- Kling family
- Veo family
- Luma / Ray family
- Sora family
- Hailuo / MiniMax family
- CogVideoX / LTX / Grok depending on current catalog

## Updating an app inventory (example: fal-studio)

When maintaining a local picker/config such as `src/models.js`:
1. Read the current registry file.
2. Group entries by family, type, price unit, and whether they support image inputs.
3. Compare live provider pages against the registry.
4. Mark entries as:
   - confirmed current
   - likely stale
   - missing from app but available upstream
5. Only change code after confirming endpoint names and parameter expectations.

For `fal-studio`, the model registry lives in:
- `src/models.js`

### Best source for exact fal.ai input options

For fal.ai model capabilities, prefer the endpoint OpenAPI spec and `llms.txt` over marketing copy.

Reliable pattern:
1. Open pricing/model page to discover likely endpoint IDs.
2. Pull OpenAPI JSON directly:
   - `https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=ENDPOINT_ID`
3. Inspect:
   - `paths[*].post.requestBody.content.application/json.schema`
   - `components.schemas.*Input`
   - `components.schemas.*Output`
4. Optionally read:
   - `https://fal.ai/models/ENDPOINT_ID/llms.txt`

Why this matters:
- the OpenAPI file gives the exact request fields, required params, enums, defaults, and output shape
- `llms.txt` often adds practical notes like prompt syntax (`@Image1`, `@Image2`) and pricing examples

### Reusable fal video-model patterns discovered

These endpoint families were confirmed via OpenAPI + llms.txt and are useful for building model-specific UI:

- `fal-ai/wan-25-preview/text-to-video`
  - prompt-only video
  - supports: `resolution`, `duration`, `aspect_ratio`, `audio_url`, `negative_prompt`, `seed`, `enable_prompt_expansion`, `enable_safety_checker`
- `fal-ai/wan-25-preview/image-to-video`
  - single first-frame image-to-video
  - required: `prompt`, `image_url`
  - also supports `audio_url`, `resolution`, `duration`, `negative_prompt`, `seed`, `enable_prompt_expansion`, `enable_safety_checker`
- `fal-ai/wan-flf2v`
  - dedicated first/last frame video
  - required: `prompt`, `start_image_url`, `end_image_url`
  - supports `num_frames`, `frames_per_second`, `resolution`, `aspect_ratio`, `guide_scale`, `num_inference_steps`, `acceleration`, `shift`, `negative_prompt`, `seed`
- `fal-ai/kling-video/v2.5-turbo/pro/image-to-video`
  - first-frame image-to-video with optional tail frame
  - required: `prompt`, `image_url`
  - optional end/tail frame field is `tail_image_url`
  - supports `duration`, `cfg_scale`, `negative_prompt`
- `fal-ai/kling-video/o3/standard/image-to-video`
  - first-frame image-to-video with optional end frame and multi-shot support
  - required: `image_url`
  - supports either `prompt` or `multi_prompt` (not both), optional `end_image_url`, `duration`, `generate_audio`, `shot_type`
  - good candidate for a separate “multi-shot prompts” section in UI
- `fal-ai/kling-video/o1/standard/image-to-video`
  - first-frame with optional last frame
  - required: `prompt`, `start_image_url`
  - optional `end_image_url`
  - `llms.txt` explicitly says prompt can reference frames with `@Image1` and `@Image2`
- `fal-ai/veo3.1/image-to-video`
  - first-frame image-to-video
  - required: `prompt`, `image_url`
  - supports `resolution`, `duration`, `aspect_ratio`, `generate_audio`, `negative_prompt`, `auto_fix`, `safety_tolerance`, `seed`
- `fal-ai/veo3.1/first-last-frame-to-video`
  - dedicated first/last frame video
  - required: `prompt`, `first_frame_url`, `last_frame_url`
  - supports `resolution`, `duration`, `aspect_ratio`, `generate_audio`, `negative_prompt`, `auto_fix`, `safety_tolerance`, `seed`
- `fal-ai/veo3/image-to-video`
  - similar to Veo 3.1 image-to-video
  - required: `prompt`, `image_url`
- `fal-ai/minimax/hailuo-02/standard/image-to-video`
  - first-frame image-to-video with optional end frame
  - required: `prompt`, `image_url`
  - optional `end_image_url`
  - supports `duration`, `resolution`, `prompt_optimizer`
- `fal-ai/vidu/start-end-to-video`
  - dedicated first/last frame video
  - required: `prompt`, `start_image_url`, `end_image_url`
  - supports `movement_amplitude`, `seed`
- `fal-ai/vidu/reference-to-video`
  - multi-reference image-to-video
  - required: `prompt`, `reference_image_urls`
  - supports `aspect_ratio`, `movement_amplitude`, `seed`
  - this is the clearest confirmed example of a provider model that wants a general multi-image reference section rather than first/last frame fields
- `fal-ai/ovi/image-to-video`
  - single reference image-to-video
  - required: `prompt`, `image_url`
  - supports `num_inference_steps`, `negative_prompt`, `audio_negative_prompt`, `seed`

### UI design rule for fal-studio-like apps

Do not use one generic “reference image” uploader for every model.

Use separate capability-driven sections:
- General reference images
- First frame
- Last frame
- Optional special sections such as multi-shot prompts

And drive them from model metadata, not hardcoded string checks.
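
Since model metadata is only as good as the schema it was copied from, a small helper that summarizes an endpoint's `*Input` schema can make the OpenAPI inspection step repeatable. The sketch below is an assumption-laden illustration: `summarize_input_schemas` is a hypothetical name, it reads only top-level `properties`, `required`, and `enum` keys (real specs may nest these behind `$ref` or `anyOf`), and `SAMPLE_SPEC` is a hand-written stand-in modeled on the Kling 2.5 Turbo Pro notes in this skill, not a downloaded spec.

```python
def summarize_input_schemas(spec):
    """Summarize every components.schemas entry whose name ends in "Input".

    Returns {schema_name: {"required": [...], "optional": [...], "enums": {...}}}.
    Only top-level keys are read; nested $ref/anyOf structures are not resolved.
    """
    schemas = spec.get("components", {}).get("schemas", {})
    summary = {}
    for name, schema in schemas.items():
        if not name.endswith("Input"):
            continue
        properties = schema.get("properties", {})
        required = list(schema.get("required", []))
        optional = [field for field in properties if field not in required]
        enums = {
            field: list(prop["enum"])
            for field, prop in properties.items()
            if "enum" in prop
        }
        summary[name] = {"required": required, "optional": optional, "enums": enums}
    return summary


# Hypothetical, hand-written spec fragment used only to exercise the helper.
SAMPLE_SPEC = {
    "components": {
        "schemas": {
            "ExampleImageToVideoInput": {
                "type": "object",
                "required": ["prompt", "image_url"],
                "properties": {
                    "prompt": {"type": "string"},
                    "image_url": {"type": "string"},
                    "tail_image_url": {"type": "string"},
                    "duration": {"type": "string", "enum": ["5", "10"]},
                    "cfg_scale": {"type": "number"},
                    "negative_prompt": {"type": "string"},
                },
            },
            "ExampleImageToVideoOutput": {"type": "object", "properties": {}},
        }
    }
}

summary = summarize_input_schemas(SAMPLE_SPEC)
print(summary)
```

The summary is meant as a draft for a human to review before editing the registry; field names still need to be mapped to UI sections explicitly in metadata.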

Recommended capability flags per model:
- `mediaKind`: `image` or `video`
- `supportsPrompt`
- `supportsGeneralImageRefs`
- `generalImageRefField`
- `minGeneralImageRefs`
- `maxGeneralImageRefs`
- `supportsFirstFrame`
- `firstFrameField`
- `supportsLastFrame`
- `lastFrameField`
- `supportsAudioReference`
- `audioField`
- `supportsMultiPrompt`
- `multiPromptField`
- `supportsTailFrame` if provider uses a nonstandard end-frame field like `tail_image_url`

Important implementation note:
- first/last frame sections should be distinct from general image reference sections
- if a model supports only first frame, show only that section
- if it supports general refs (like `reference_image_urls`), show a multi-upload reference section
- if the provider uses nonstandard field names (`tail_image_url`, `first_frame_url`, `start_image_url`), map them explicitly in metadata instead of branching on substring matches in backend code

## Output format recommendation

When reporting back to the user, use:
1. `Live-verified now`
2. `Historical notes to verify`
3. `Recommended models by use case`
4. `App inventory gaps / stale entries`

## Pitfalls

- Pricing pages often show only featured examples, not the full catalog.
- fal.ai mixes per-image, per-megapixel, per-second, and per-video billing.
- Replicate mixes output-based pricing with hardware-time pricing.
- Venice AI image pricing is visible in docs tables, but video pricing is often listed as `Variable`; use the Venice Video Quote API and normalize to `$/second` instead of guessing from the table.
- Venice `https://api.venice.ai/api/v1/models` may return only text models when fetched unauthenticated even though docs list image/video models; do not conclude image/video are unavailable from that endpoint alone.
- A session-recalled endpoint may have been renamed or removed.
- Do not assume `401 Authentication is required` means the user must manually activate a model. In fal-studio testing, the same 401 message appeared across many unrelated fal endpoints when the saved key was invalid or a test placeholder. First verify the saved key against multiple models before blaming model gating.
- Browser extraction is currently more reliable than `web_extract` here when Firecrawl credits are exhausted.
- fal billing endpoint paths may vary. The initially attempted `https://rest.alpha.fal.ai/account/billing?expand=credits` returned 404 in live testing here, so implement credit lookup with endpoint fallbacks rather than assuming a single fixed path.
- For fal.ai model capability discovery, the most reliable source is often the endpoint OpenAPI schema directly: `https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=...`. This exposes the exact request fields, required uploads, enums, and output types even when the marketing page is incomplete.
- `llms.txt` pages are useful to recover human-readable pricing notes and prompt conventions (for example Kling O1's `@Image1` / `@Image2` references), but the OpenAPI schema should be treated as the source of truth for app wiring.
- Some video endpoints accept optional alternate prompt structures (for example Kling O3 `multi_prompt`) and may not require the normal text prompt when the alternate field is present. UI validation should account for that.
- Some “special reference” workflows do not have a unique dedicated API field. Example: Vidu `reference-to-video` uses `reference_image_urls`; a 9-frame storyboard/grid image can still be surfaced as a separate UI section and merged into that array on submit.
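
Several of these pitfalls (nonstandard frame field names, `prompt` versus `multi_prompt`, and the storyboard image merged into `reference_image_urls`) come together when a request payload is built from model metadata. The following is a minimal sketch under stated assumptions: `build_payload` and the two metadata dictionaries are hypothetical, the flag names follow the capability flags recommended in this skill, the provider field names are the ones recorded for those endpoints here, and `minGeneralImageRefs: 1` for Vidu is an assumption rather than a verified limit.

```python
def build_payload(meta, prompt=None, multi_prompt=None, first_frame=None,
                  last_frame=None, general_refs=(), storyboard=None):
    """Map UI inputs to provider field names using explicit model metadata."""
    payload = {}

    # Exactly one prompt style may be sent; multi-shot needs explicit support.
    if (prompt is None) == (multi_prompt is None):
        raise ValueError("provide exactly one of prompt or multi_prompt")
    if multi_prompt is not None:
        if not meta.get("supportsMultiPrompt"):
            raise ValueError("model does not accept multi-shot prompts")
        payload[meta["multiPromptField"]] = list(multi_prompt)
    else:
        payload["prompt"] = prompt

    # Frame uploads go to whatever field name the metadata declares.
    for value, flag, field_key in (
        (first_frame, "supportsFirstFrame", "firstFrameField"),
        (last_frame, "supportsLastFrame", "lastFrameField"),
    ):
        if value is None:
            continue
        if not meta.get(flag):
            raise ValueError(f"model metadata lacks {flag}")
        payload[meta[field_key]] = value

    # A storyboard image is a separate UI section but shares the refs array.
    refs = list(general_refs)
    if storyboard is not None:
        refs.append(storyboard)
    if meta.get("supportsGeneralImageRefs"):
        low = meta.get("minGeneralImageRefs", 0)
        high = meta.get("maxGeneralImageRefs")
        if len(refs) < low or (high is not None and len(refs) > high):
            raise ValueError("reference image count outside the allowed range")
        if refs:
            payload[meta["generalImageRefField"]] = refs
    elif refs:
        raise ValueError("model does not accept general reference images")
    return payload


# Hypothetical registry entries; field names come from the endpoint notes in this skill.
KLING_25_TURBO_PRO_I2V = {
    "mediaKind": "video",
    "supportsPrompt": True,
    "supportsFirstFrame": True,
    "firstFrameField": "image_url",
    "supportsLastFrame": True,
    "lastFrameField": "tail_image_url",
}
VIDU_REFERENCE_TO_VIDEO = {
    "mediaKind": "video",
    "supportsPrompt": True,
    "supportsGeneralImageRefs": True,
    "generalImageRefField": "reference_image_urls",
    "minGeneralImageRefs": 1,  # assumption, not verified against the schema
}

kling_payload = build_payload(
    KLING_25_TURBO_PRO_I2V,
    prompt="slow pan left",
    first_frame="https://example.com/first.png",
    last_frame="https://example.com/last.png",
)
vidu_payload = build_payload(
    VIDU_REFERENCE_TO_VIDEO,
    prompt="character walks through the set",
    general_refs=["https://example.com/ref1.png"],
    storyboard="https://example.com/grid.png",
)
```

Because every provider field name lives in the metadata, supporting a new endpoint should mostly mean adding a registry entry rather than another branch in backend code.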

## Quick recommendations by use case

- Cheapest fast images: FLUX Schnell on fal.ai or Replicate, where available
- Premium image editing on fal: Flux Kontext / Qwen / Nano Banana edit tiers
- Typography/design: Ideogram, Recraft, Qwen, Seedream
- Budget video: Wan family
- Premium video: Kling / Veo tiers
- First-last-frame workflows: Wan, Kling, Veo, Hailuo, Vidu families (verify exact endpoint syntax)

## Video Story-specific decision rules (live-verified April 2026)

Use these when evaluating models for Alex's `video-story` app.

### 1) Fractional-duration video is the gating constraint

Video Story derives clip lengths from audio timing and needs sub-second durations.

Live-verified fit:
- `replicate.com/lucataco/wan-2.2-first-last-frame` shows `duration_seconds` as a **number** with minimum **0.5** and maximum **10**.
- This makes the current Replicate Wan 2.2 first/last-frame model a strong fit for Video Story's timing model.

Live-verified fal limitations:
- `fal-ai/wan-25-preview/text-to-video` and `/image-to-video` expose duration enums of only `5` or `10`
- `fal-ai/kling-video/v2.5-turbo/pro/image-to-video` exposes only `5` or `10`
- `fal-ai/veo3.1/first-last-frame-to-video` exposes `4s`, `6s`, `8s`
- `fal-ai/minimax/hailuo-02/standard/image-to-video` exposes `6` or `10`
- `fal-ai/kling-video/o1/standard/image-to-video` and `o3/standard/image-to-video` allow more durations, but still only integer-second enums

Practical rule:
- If the app must preserve audio-aligned durations like `0.5s`, `1.5s`, `2.5s`, etc., most current fal video endpoints are a poor direct fit unless the app pads, retimes, or trims clips at export time.
- For Video Story, treat **Replicate Wan 2.2 FLF** as the current benchmark unless a newer model is verified to accept numeric fractional durations.
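
The duration constraint above can be checked mechanically. This is a rough sketch, assuming a hypothetical `DURATION_RULES` table keyed by made-up registry IDs (the `replicate:` prefix is illustrative) and filled in from the limits recorded in this section; Veo's `4s`/`6s`/`8s` enums are stored as plain seconds for comparison. `plan_clip` reports whether a model can take the audio-derived length directly or needs a longer clip trimmed afterward.

```python
# Hypothetical table built from the duration limits recorded in this skill.
DURATION_RULES = {
    "replicate:lucataco/wan-2.2-first-last-frame": {"kind": "range", "min": 0.5, "max": 10.0},
    "fal-ai/wan-25-preview/image-to-video": {"kind": "enum", "values": [5, 10]},
    "fal-ai/kling-video/v2.5-turbo/pro/image-to-video": {"kind": "enum", "values": [5, 10]},
    "fal-ai/veo3.1/first-last-frame-to-video": {"kind": "enum", "values": [4, 6, 8]},
    "fal-ai/minimax/hailuo-02/standard/image-to-video": {"kind": "enum", "values": [6, 10]},
}


def plan_clip(model_id, wanted_seconds):
    """Return a request plan for one clip, or None if the model cannot cover it."""
    rule = DURATION_RULES[model_id]
    if rule["kind"] == "range":
        if rule["min"] <= wanted_seconds <= rule["max"]:
            return {"request_seconds": wanted_seconds, "trim_seconds": 0.0, "direct_fit": True}
        return None
    # Enum-only models: request the shortest allowed clip that still covers the target.
    candidates = [value for value in rule["values"] if value >= wanted_seconds]
    if not candidates:
        return None
    chosen = min(candidates)
    return {
        "request_seconds": chosen,
        "trim_seconds": round(chosen - wanted_seconds, 3),
        "direct_fit": chosen == wanted_seconds,
    }


for model_id in DURATION_RULES:
    print(model_id, plan_clip(model_id, 1.5))
```

For a 1.5-second target, only the range-based entry fits directly; every enum-based entry requests a longer clip and leaves the difference to be trimmed, which is the overhead the practical rule warns about.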

### 2) Trust OpenAPI over marketing copy for image-input requirements

Live-verified examples:
- `fal-ai/flux-pro/kontext` OpenAPI requires `image_url`
- `fal-ai/qwen-image` requires only `prompt`
- `fal-ai/qwen-image-2/edit` and `fal-ai/qwen-image-2/pro/edit` require `prompt` + `image_urls`
- `fal-ai/nano-banana/edit`, `nano-banana-2/edit`, and `nano-banana-pro/edit` require `prompt` + `image_urls`
- `fal-ai/flux-2-pro/edit` requires `prompt` + `image_urls`

Practical rule:
- Do not assume a model marketed as "can generate new images from text" is a drop-in text-only endpoint for production. Verify whether the schema actually requires an image field.
- For Video Story, `flux-pro/kontext` is best treated as an **edit model**, not a pure text-only reference generator.

### 3) Video Story reference-generation buckets

Separate decisions into two different jobs:

#### A. Text-only reference generation (characters / sets / props before refs exist)

Live-verified candidates:
- `fal-ai/qwen-image`: text-only, `$0.02 / megapixel`
- `fal-ai/bytedance/seedream/v4/text-to-image`: text-only, `$0.03 / image`
- `fal-ai/nano-banana`: text-only, about `$0.039 / image`
- `replicate.com/black-forest-labs/flux-1.1-pro`: text-only with optional composition guidance, `$0.04 / output image`
- `replicate.com/black-forest-labs/flux-2-pro`: text-only or multi-ref, pricing is mixed: `$0.015 / run` + `$0.015 / input MP` + `$0.015 / output MP`

#### B. Reference-guided frame generation / compositing

Live-verified candidates:
- `replicate.com/black-forest-labs/flux-2-pro`: up to **8** reference images on API
- `fal-ai/flux-2-pro/edit`: requires `image_urls`, priced from `$0.03` for first output MP plus extra MP charges
- `fal-ai/qwen-image-2/edit`: requires `image_urls` with **1-3** images, `$0.035 / image`
- `fal-ai/qwen-image-2/pro/edit`: requires `image_urls` with **1-3** images, `$0.075 / image`
- `fal-ai/nano-banana/edit`: requires `image_urls`, about `$0.039 / image`
- `fal-ai/nano-banana-2/edit`: requires `image_urls`, `$0.08 / image` base with resolution/thinking surcharges
- `fal-ai/nano-banana-pro/edit`: requires `image_urls`, `$0.15 / image` base
- `fal-ai/flux-pro/kontext` / `kontext/max`: single-image edit flow, not multi-ref compositing

### 4) Video Story app-audit note: current "qwen" path is a hybrid

The current app implementation is not just "Qwen" in one uniform sense:
- text-only references go through `fal-ai/qwen-image`
- reference-guided shots go through `fal-ai/qwen-image-2/pro/edit`

This matters because:
- pricing is different between the text-only and edit legs
- ref-count limits are different from some local registry assumptions
- quality/capability discussions should distinguish `qwen-image` from `qwen-image-2/pro/edit`

### 5) Video Story model-selection guidance

If the user asks which models are best **for Video Story specifically**, use this default reasoning:
- **Best current video fit:** Replicate Wan 2.2 FLF, because fractional duration support beats today's fal video enums
- **Best budget text-only references:** fal Qwen Image
- **Best likely upgrade candidate for pure text-only references:** fal Seedream V4
- **Best current multi-reference frame candidate on paper:** Replicate FLUX.2 Pro (8 refs) or fal Nano Banana 2 / Pro when very high reference count matters
- **Best single-reference edit model:** fal Flux Kontext / Kontext Max
- **Do not recommend** a fal video model as the primary Video Story generator unless its duration API is verified to support numeric fractional timing

## Maintenance rule

If you use this skill and discover renamed endpoints, broken pricing assumptions, or new provider pages, patch this skill immediately and update `references/inventory.md`.