fal-replicate-model-inventory

fal.ai + Replicate + Venice AI Model Inventory

Maintain a reusable inventory of image and video generation models across fal.ai, Replicate, and Venice AI.

Use this skill when:
- the user asks what models are available on fal.ai, Replicate, or Venice AI
- the user wants current API pricing for image/video models
- the user is choosing between providers for a new app
- the user wants to update a model picker/config like fal-studio
- you need a grounded comparison of image vs video models, edit models, FLF/first-last-frame models, or provider-specific endpoints

Grounding policy

Always separate findings into two buckets:

  1. Live-verified now
  2. Historical/session-recalled notes

Do not present session-recalled prices as freshly verified. Label them clearly.
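One way to keep the two buckets from blurring is to tag every price with its provenance at capture time. The sketch below is a minimal illustration; the `Finding` class, its field names, and the `render` helper are hypothetical and not part of any provider API.

```python
from dataclasses import dataclass

LIVE = "Live-verified now"
RECALLED = "Historical/session-recalled notes"


@dataclass(frozen=True)
class Finding:
    """One price observation plus where it came from (hypothetical shape)."""
    model: str
    price: str
    source: str   # page or note the value was read from
    bucket: str   # LIVE or RECALLED


def render(findings):
    """Group findings under the two bucket headings, live-verified first."""
    lines = []
    for bucket in (LIVE, RECALLED):
        rows = [f for f in findings if f.bucket == bucket]
        if not rows:
            continue
        lines.append(bucket)
        for f in rows:
            lines.append(f"- {f.model}: {f.price} (source: {f.source})")
    return "\n".join(lines)


findings = [
    Finding("Seedream V4", "$0.03 / image", "fal.ai/pricing", LIVE),
    Finding("flux-realism", "price not re-checked", "prior session", RECALLED),
]
report = render(findings)
print(report)
```

Because the bucket is stored on each record, a recalled price cannot be printed under the live heading by accident.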

Preferred live-check workflow

1) If web_search/web_extract are available and funded

Use them first for quick retrieval.

2) If web_search/web_extract fail due to credits or scraping limits

Use browser tools instead.

Recommended live sources:
- https://fal.ai/pricing
- https://replicate.com/pricing
- https://docs.venice.ai/overview/pricing
- https://api.venice.ai/api/v1/models (currently text-only in the public unauthenticated response; use docs pricing for image/video tables)
- provider model pages linked from those pricing pages

For Venice AI API pricing:

  1. Fetch https://docs.venice.ai/overview/pricing and extract HTML tables. The image tables include fixed prices; the video table often says Variable.
  2. Use POST https://api.venice.ai/api/v1/video/quote for live video pricing. It can return quotes without auth for many public pricing inputs; if auth becomes required, use docs/table extraction and label it.
  3. Quote body pattern (JSON):
     {"model":"seedance-2-0-text-to-video","duration":"5s","aspect_ratio":"16:9","resolution":"720p","audio":true}
  4. Normalize video prices to $/second and also report the actual quoted duration, because Venice durations are model-specific enums.
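The normalization in step 4 can be sketched as follows. This assumes the quoted total price has already been read out of the quote response by hand (the response field names are not documented here, so the sketch does not parse them), and that durations are strings of whole seconds such as `"5s"`. `build_quote_body` and `price_per_second` are hypothetical helper names.

```python
import json


def build_quote_body(model, duration, aspect_ratio="16:9",
                     resolution="720p", audio=True):
    """Build the JSON body for the quote request, following the pattern in step 3."""
    return json.dumps({
        "model": model,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "audio": audio,
    })


def price_per_second(quoted_usd, duration):
    """Normalize a quoted clip price to $/second; duration looks like '5s'."""
    if not duration.endswith("s"):
        raise ValueError(f"unexpected duration format: {duration!r}")
    seconds = int(duration[:-1])
    return round(quoted_usd / seconds, 4)


body = build_quote_body("seedance-2-0-text-to-video", "5s")
# Illustrative figure: a $0.39 quote for a 5s clip.
rate = price_per_second(0.39, "5s")
print(body)
print(f"${rate}/s (quoted for 5s)")
```

Printing the quoted duration next to the rate keeps the report honest when a model only offers, say, 4s or 6s clips.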

Reference note: see references/venice-ai-api-pricing-2026.md for a live May 2026 comparison snapshot and reusable quote commands.

Reference note: see references/venice-reference-capabilities-2026.md for a live May 2026 snapshot comparing Venice, fal, and Replicate on image-edit references, video R2V references, private/uncensored model posture, and accepted clip-duration enums.

For browser extraction:

  1. browser_navigate(url)
  2. browser_snapshot(full=true)
  3. browser_console(expression='document.body.innerText.slice(0,12000)')
  4. extract only the visible, grounded rows and links

Current live-verified pricing anchors (captured April 2026)

fal.ai pricing page

Verified directly from fal.ai/pricing using browser tools:

Video:
- Wan 2.5: $0.05 / second
- Kling 2.5 Turbo Pro: $0.07 / second
- Veo 3: $0.40 / second
- Ovi: $0.20 / video

Image:
- Seedream V4: $0.03 / image
- Flux Kontext Pro: $0.04 / image
- Nanobanana: $0.0398 / image
- Qwen: $0.02 / megapixel

Compute:
- H100: $1.89/hr / $0.0005/s
- H200: $2.10/hr / $0.0006/s
- A100: $0.99/hr / $0.0003/s

Replicate pricing page

Verified directly from replicate.com/pricing using browser tools:

Public model examples:
- black-forest-labs/flux-1.1-pro: $0.04 / output image
- black-forest-labs/flux-dev: $0.025 / output image
- black-forest-labs/flux-schnell: $3.00 / thousand output images
- ideogram-ai/ideogram-v3-quality: $0.09 / output image
- recraft-ai/recraft-v3: $0.04 / output image
- wavespeedai/wan-2.1-i2v-480p: $0.09 / second of output video
- wavespeedai/wan-2.1-i2v-720p: $0.25 / second of output video

Hardware pricing:
- gpu-a100-large: $0.001400/sec / $5.04/hr
- gpu-h100: $0.001525/sec / $5.49/hr
- gpu-l40s: $0.000975/sec / $3.51/hr
- gpu-t4: $0.000225/sec / $0.81/hr
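The pricing pages mix units (per image, per thousand images, per second, per hour), so it helps to convert to a common unit before comparing. A small sketch using figures from the Replicate lists above; the helper names are illustrative.

```python
def per_image_from_per_thousand(usd_per_thousand):
    """Convert a '$ / thousand output images' price to $ / image."""
    return usd_per_thousand / 1000


def per_hour_from_per_second(usd_per_second):
    """Convert a '$ / sec' hardware price to $ / hr, rounded to cents."""
    return round(usd_per_second * 3600, 2)


# flux-schnell is listed at $3.00 per thousand output images.
schnell = per_image_from_per_thousand(3.00)
# gpu-h100 on Replicate is listed at $0.001525/sec.
h100_hourly = per_hour_from_per_second(0.001525)
print(schnell, h100_hourly)
```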

Venice AI pricing/docs

Verified May 2026 via direct HTTP fetches of docs.venice.ai/overview/pricing and live POST /api/v1/video/quote calls.

Image examples:
- qwen-image: $0.01 / image
- grok-imagine-image: $0.03 / image (private)
- flux-2-pro: $0.04 / image
- qwen-image-2: $0.05 / image
- seedream-v4 / seedream-v5-lite: $0.05 / image
- flux-2-max: $0.09 / image
- qwen-image-2-pro: $0.10 / image
- nano-banana-2: $0.10–$0.19 / image depending on resolution
- nano-banana-pro: $0.18–$0.35 / image depending on resolution

Video quote examples (normalize but also report source quote):
- kling-2.5-turbo-pro-text-to-video: $0.39 / 5s = $0.078/s
- kling-v3-pro-text-to-video: $0.49 / 4s audio off = $0.1225/s; $0.74 / 4s audio on = $0.185/s
- seedance-2-0-text-to-video 720p: $0.72 / 4s = $0.18/s
- seedance-2-0-fast-text-to-video 720p: $0.58 / 4s = $0.145/s
- veo3.1-fast-text-to-video 720p/1080p: $0.44 / 4s audio off = $0.11/s; $0.66 / 4s audio on = $0.165/s
- veo3.1-full-text-to-video 720p/1080p: $0.88 / 4s audio off = $0.22/s; $1.76 / 4s audio on = $0.44/s
- wan-2-7-text-to-video: $0.55 / 5s at 720p = $0.11/s; $0.70 / 5s at 1080p = $0.14/s
- ltx-2-v2-3-fast-text-to-video 1080p: $0.40 / 6s ≈ $0.0667/s

Historical/session-recalled model families to check

These came from prior sessions and are useful starting points, but re-verify them before presenting any as current pricing or availability:

fal.ai families

Image:
- FLUX: flux-pro/v1.1, flux-pro/v1.1-ultra, flux-2-pro, flux-pro/kontext, flux-pro/kontext/max, flux/dev, flux/dev/image-to-image, flux/schnell, flux-realism, flux-lora, flux-pro/v1/fill
- Qwen: qwen-image-edit, qwen-image-2/edit, and sometimes higher tiers/pro variants
- Nano Banana: nano-banana, nano-banana-2, nano-banana-2/edit, nano-banana-pro, nano-banana-pro/edit
- Ideogram, Recraft, Seedream, Reve, Imagen (availability varies)

Video / FLF candidates:
- Wan family
- Kling family
- Veo family
- MiniMax / Hailuo family
- Vidu family
- Ovi

Venice AI families

Image:
- Qwen Image / Qwen Image 2 / Qwen Image 2 Pro
- FLUX.2 Pro / FLUX.2 Max
- Nano Banana 2 / Nano Banana Pro
- Seedream v4/v5, Recraft, Grok Imagine, GPT Image variants

Video:
- Kling 2.5 / Kling 3 / Kling O3
- Veo 3 / Veo 3.1 fast/full
- Seedance 2.0 / fast / reference-to-video
- Wan 2.5 / 2.6 / 2.7
- LTX 2.x, HappyHorse, PixVerse, Runway, Vidu, Ovi, Grok Imagine

Replicate families

Image:
- FLUX 1.1 Pro / Dev / Schnell
- Recraft
- Ideogram
- Stable Diffusion 3.5 (availability can change)

Video:
- Wan family
- Kling family
- Veo family
- Luma / Ray family
- Sora family
- Hailuo / MiniMax family
- CogVideoX / LTX / Grok depending on current catalog

Updating an app inventory (example: fal-studio)

When maintaining a local picker/config such as src/models.js:

  1. Read the current registry file.
  2. Group entries by family, type, price unit, and whether they support image inputs.
  3. Compare live provider pages against the registry.
  4. Mark entries as:
     - confirmed current
     - likely stale
     - missing from app but available upstream
  5. Only change code after confirming endpoint names and parameter expectations.
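The comparison in steps 3 and 4 amounts to a set difference between registry IDs and upstream IDs. A minimal sketch, assuming both sides have already been reduced to sets of endpoint IDs; the `classify` function and the sample IDs (including the made-up `fal-ai/old-model`) are illustrative. An ID being present upstream says nothing about whether its price or parameters still match, so "confirmed current" here only means the endpoint still exists.

```python
def classify(registry_ids, upstream_ids):
    """Split endpoint IDs into the three audit labels."""
    registry_ids, upstream_ids = set(registry_ids), set(upstream_ids)
    return {
        "confirmed current": sorted(registry_ids & upstream_ids),
        "likely stale": sorted(registry_ids - upstream_ids),
        "missing from app but available upstream":
            sorted(upstream_ids - registry_ids),
    }


# Hypothetical snapshot of a local registry versus a live check.
registry = {"fal-ai/qwen-image", "fal-ai/old-model"}
upstream = {"fal-ai/qwen-image", "fal-ai/nano-banana"}
audit = classify(registry, upstream)
for label, ids in audit.items():
    print(f"{label}: {ids}")
```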

For fal-studio, the model registry lives in src/models.js.

Best source for exact fal.ai input options

For fal.ai model capabilities, prefer the endpoint OpenAPI spec and llms.txt over marketing copy.

Reliable pattern:

  1. Open pricing/model page to discover likely endpoint IDs.
  2. Pull OpenAPI JSON directly:
     - https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=ENDPOINT_ID
  3. Inspect:
     - paths[*].post.requestBody.content.application/json.schema
     - components.schemas.*Input
     - components.schemas.*Output
  4. Optionally read:
     - https://fal.ai/models/ENDPOINT_ID/llms.txt

Why this matters:
- the OpenAPI file gives the exact request fields, required params, enums, defaults, and output shape
- llms.txt often adds practical notes like prompt syntax (@Image1, @Image2) and pricing examples
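A sketch of steps 2 and 3: build the spec URL, then pull the `*Input` schemas out of the parsed JSON. The URL pattern is the one given above; the `sample_spec` dict is a hand-written stand-in for a real response, so its schema names and fields are illustrative only. Fetching is left out so the example runs offline.

```python
from urllib.parse import quote

SPEC_URL = "https://fal.ai/api/openapi/queue/openapi.json?endpoint_id={}"


def spec_url(endpoint_id):
    """Return the OpenAPI URL for one endpoint ID (slashes kept as-is)."""
    return SPEC_URL.format(quote(endpoint_id, safe="/"))


def input_schemas(spec):
    """Map each components.schemas.*Input name to (required, property names)."""
    schemas = spec.get("components", {}).get("schemas", {})
    return {
        name: (schema.get("required", []),
               sorted(schema.get("properties", {})))
        for name, schema in schemas.items()
        if name.endswith("Input")
    }


# Stand-in for json.loads(response_body); not a real fal.ai response.
sample_spec = {"components": {"schemas": {
    "ExampleInput": {"required": ["prompt"],
                     "properties": {"prompt": {}, "duration": {}}},
    "ExampleOutput": {"properties": {"video": {}}},
}}}

print(spec_url("fal-ai/qwen-image"))
print(input_schemas(sample_spec))
```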

Reusable fal video-model patterns discovered

These endpoint families were confirmed via OpenAPI + llms.txt and are useful for building model-specific UI:

UI design rule for fal-studio-like apps

Do not use one generic "reference image" uploader for every model. Use separate capability-driven sections:
- General reference images
- First frame
- Last frame
- Optional special sections such as multi-shot prompts

Drive these sections from model metadata, not hardcoded string checks.

Recommended capability flags per model:
- mediaKind: image or video
- supportsPrompt
- supportsGeneralImageRefs
- generalImageRefField
- minGeneralImageRefs
- maxGeneralImageRefs
- supportsFirstFrame
- firstFrameField
- supportsLastFrame
- lastFrameField
- supportsAudioReference
- audioField
- supportsMultiPrompt
- multiPromptField
- supportsTailFrame, if the provider uses a nonstandard end-frame field like tail_image_url

Important implementation note:
- first/last frame sections should be distinct from general image reference sections
- if a model supports only first frame, show only that section
- if it supports general refs (like reference_image_urls), show a multi-upload reference section
- if the provider uses nonstandard field names (tail_image_url, first_frame_url, start_image_url), map them explicitly in metadata instead of branching on substring matches in backend code
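A minimal sketch of metadata-driven sections, using the flag names recommended above. The two registry entries, their limits, and the `ui_sections` helper are hypothetical; real field names must come from each endpoint's schema.

```python
# Hypothetical registry entries; values are examples, not verified schemas.
MODELS = {
    "example/first-last-frame": {
        "mediaKind": "video",
        "supportsPrompt": True,
        "supportsFirstFrame": True,
        "firstFrameField": "start_image_url",
        "supportsLastFrame": True,
        "lastFrameField": "tail_image_url",
    },
    "example/multi-ref": {
        "mediaKind": "video",
        "supportsPrompt": True,
        "supportsGeneralImageRefs": True,
        "generalImageRefField": "reference_image_urls",
        "minGeneralImageRefs": 1,
        "maxGeneralImageRefs": 4,
    },
}

# (capability flag, UI section label, key holding the provider's field name)
SECTIONS = [
    ("supportsGeneralImageRefs", "General reference images", "generalImageRefField"),
    ("supportsFirstFrame", "First frame", "firstFrameField"),
    ("supportsLastFrame", "Last frame", "lastFrameField"),
]


def ui_sections(model_id):
    """Return (label, provider field) pairs for the sections to render."""
    meta = MODELS[model_id]
    return [(label, meta[field_key])
            for flag, label, field_key in SECTIONS
            if meta.get(flag)]


print(ui_sections("example/first-last-frame"))
print(ui_sections("example/multi-ref"))
```

The provider-specific field name travels with the metadata, so the backend never needs to inspect the model ID string to decide where an uploaded image goes.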

Output format recommendation

When reporting back to the user, use:

  1. Live-verified now
  2. Historical notes to verify
  3. Recommended models by use case
  4. App inventory gaps / stale entries

Pitfalls

Quick recommendations by use case

Video Story-specific decision rules (live-verified April 2026)

Use these when evaluating models for Alex's video-story app.

1) Fractional-duration video is the gating constraint

Video Story derives clip lengths from audio timing and needs sub-second durations.

Live-verified fit:
- replicate.com/lucataco/wan-2.2-first-last-frame shows duration_seconds as a number with minimum 0.5 and maximum 10.
- This makes the current Replicate Wan 2.2 first/last-frame model a strong fit for Video Story's timing model.

Live-verified fal limitations:
- fal-ai/wan-25-preview/text-to-video and /image-to-video expose duration enums of only 5 or 10
- fal-ai/kling-video/v2.5-turbo/pro/image-to-video exposes only 5 or 10
- fal-ai/veo3.1/first-last-frame-to-video exposes 4s, 6s, 8s
- fal-ai/minimax/hailuo-02/standard/image-to-video exposes 6 or 10
- fal-ai/kling-video/o1/standard/image-to-video and o3/standard/image-to-video allow more durations, but still only integer-second enums

Practical rule:
- If the app must preserve audio-aligned durations like 0.5s, 1.5s, 2.5s, etc., most current fal video endpoints are a bad direct fit without padding, retiming, or export-time trimming hacks.
- For Video Story, treat Replicate Wan 2.2 FLF as the current benchmark unless a newer model is verified to accept numeric fractional durations.
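The duration check can be expressed as a small predicate over each model's accepted durations. The sketch below encodes some of the constraints listed above as either an enum of seconds or a numeric range; the `DURATION_SPECS` table and the helper names are hypothetical, and the range case assumes any number between the minimum and maximum is accepted.

```python
# Constraints as listed above: an enum of seconds or a (min, max) range.
DURATION_SPECS = {
    "lucataco/wan-2.2-first-last-frame": {"range": (0.5, 10)},
    "fal-ai/wan-25-preview/text-to-video": {"enum": [5, 10]},
    "fal-ai/veo3.1/first-last-frame-to-video": {"enum": [4, 6, 8]},
    "fal-ai/minimax/hailuo-02/standard/image-to-video": {"enum": [6, 10]},
}


def accepts(model_id, seconds):
    """True if the model can generate exactly this duration without trimming."""
    spec = DURATION_SPECS[model_id]
    if "range" in spec:
        low, high = spec["range"]
        return low <= seconds <= high
    return seconds in spec["enum"]


def direct_fits(seconds):
    """List models that accept the requested clip length as-is."""
    return [m for m in DURATION_SPECS if accepts(m, seconds)]


print(direct_fits(1.5))
print(direct_fits(6))
```

For an audio-derived length such as 1.5s, only the range-based model passes; the enum-based endpoints would need padding or trimming.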

2) Trust OpenAPI over marketing copy for image-input requirements

Live-verified examples:
- fal-ai/flux-pro/kontext OpenAPI requires image_url
- fal-ai/qwen-image requires only prompt
- fal-ai/qwen-image-2/edit and fal-ai/qwen-image-2/pro/edit require prompt + image_urls
- fal-ai/nano-banana/edit, nano-banana-2/edit, and nano-banana-pro/edit require prompt + image_urls
- fal-ai/flux-2-pro/edit requires prompt + image_urls

Practical rule:
- Do not assume a model marketed as "can generate new images from text" is a drop-in text-only endpoint for production. Verify whether the schema actually requires an image field.
- For Video Story, flux-pro/kontext is best treated as an edit model, not a pure text-only reference generator.
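The rule can be checked mechanically from the `required` list in each input schema. A minimal sketch: the `REQUIRED` table hand-copies some of the requirements listed above rather than fetching them, and `IMAGE_FIELDS` and `is_text_only` are hypothetical names. The set of image field names is explicit so that new provider fields have to be added deliberately.

```python
# Known image-input field names; extend explicitly when a schema adds another.
IMAGE_FIELDS = {"image_url", "image_urls"}

# Required fields as listed above (hand-copied, not fetched).
REQUIRED = {
    "fal-ai/flux-pro/kontext": ["image_url"],
    "fal-ai/qwen-image": ["prompt"],
    "fal-ai/qwen-image-2/edit": ["prompt", "image_urls"],
    "fal-ai/flux-2-pro/edit": ["prompt", "image_urls"],
}


def is_text_only(required_fields):
    """True when no required field is a known image input."""
    return not IMAGE_FIELDS.intersection(required_fields)


text_only = [m for m, req in REQUIRED.items() if is_text_only(req)]
needs_image = [m for m, req in REQUIRED.items() if not is_text_only(req)]
print(text_only)
print(needs_image)
```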

3) Video Story reference-generation buckets

Separate decisions into two different jobs:

A. Text-only reference generation (characters / sets / props before refs exist)

Live-verified candidates:
- fal-ai/qwen-image: text-only, $0.02 / megapixel
- fal-ai/bytedance/seedream/v4/text-to-image: text-only, $0.03 / image
- fal-ai/nano-banana: text-only, about $0.039 / image
- replicate.com/black-forest-labs/flux-1.1-pro: text-only with optional composition guidance, $0.04 / output image
- replicate.com/black-forest-labs/flux-2-pro: text-only or multi-ref, with mixed pricing of $0.015 / run + $0.015 / input MP + $0.015 / output MP
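Per-megapixel and mixed pricing are hard to compare with per-image prices by eye, so a worked estimate helps. The sketch assumes a megapixel is width × height / 1,000,000 and that providers bill fractional megapixels without rounding; neither assumption is confirmed here, so treat the results as estimates. The function names are illustrative.

```python
def megapixels(width, height):
    """Megapixels, assuming 1 MP = 1,000,000 pixels (assumption)."""
    return width * height / 1_000_000


def qwen_image_cost(width, height, usd_per_mp=0.02):
    """Estimate for fal-ai/qwen-image at $0.02 / megapixel."""
    return round(megapixels(width, height) * usd_per_mp, 4)


def flux_2_pro_cost(output_mp, input_mp=0.0, unit=0.015):
    """Estimate for Replicate flux-2-pro: per run + per input MP + per output MP."""
    return round(unit + unit * input_mp + unit * output_mp, 4)


print(qwen_image_cost(1024, 1024))                   # one 1024x1024 image
print(flux_2_pro_cost(output_mp=1.0))                # text-only, 1 MP out
print(flux_2_pro_cost(output_mp=1.0, input_mp=2.0))  # 2 MP of references
```

Under these assumptions a text-only 1 MP flux-2-pro run lands near the flat per-image prices in the list, while adding reference images raises the cost with each input megapixel.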

B. Reference-guided frame generation / compositing

Live-verified candidates:
- replicate.com/black-forest-labs/flux-2-pro: up to 8 reference images on API
- fal-ai/flux-2-pro/edit: requires image_urls, priced from $0.03 for first output MP plus extra MP charges
- fal-ai/qwen-image-2/edit: requires image_urls with 1-3 images, $0.035 / image
- fal-ai/qwen-image-2/pro/edit: requires image_urls with 1-3 images, $0.075 / image
- fal-ai/nano-banana/edit: requires image_urls, about $0.039 / image
- fal-ai/nano-banana-2/edit: requires image_urls, $0.08 / image base with resolution/thinking surcharges
- fal-ai/nano-banana-pro/edit: requires image_urls, $0.15 / image base
- fal-ai/flux-pro/kontext and kontext/max: single-image edit flow, not multi-ref compositing

4) Video Story app-audit note: current "qwen" path is a hybrid

The current app implementation is not just "Qwen" in one uniform sense:
- text-only references go through fal-ai/qwen-image
- reference-guided shots go through fal-ai/qwen-image-2/pro/edit

This matters because:
- pricing is different between the text-only and edit legs
- ref-count limits are different from some local registry assumptions
- quality/capability discussions should distinguish qwen-image from qwen-image-2/pro/edit

5) Video Story model-selection guidance

If the user asks which models are best for Video Story specifically, use this default reasoning:
- Best current video fit: Replicate Wan 2.2 FLF, because fractional duration support beats today's fal video enums
- Best budget text-only references: fal Qwen Image
- Best likely upgrade candidate for pure text-only references: fal Seedream V4
- Best current multi-reference frame candidate on paper: Replicate FLUX.2 Pro (8 refs) or fal Nano Banana 2 / Pro when very high reference count matters
- Best single-reference edit model: fal Flux Kontext / Kontext Max
- Do not recommend a fal video model as the primary Video Story generator unless its duration API is verified to support numeric fractional timing

Maintenance rule

If you use this skill and discover renamed endpoints, broken pricing assumptions, or new provider pages, patch this skill immediately and update references/inventory.md.