fal-studio Dynamic Media Inputs

Use this when updating fal-studio, or any similar generation app, to support:
- text-to-video models
- image-to-video models
- first/last-frame video models
- multi-reference image or video models
- special storyboard/grid reference sections
- model-specific settings panels derived from real endpoint fields

Core approach

Do NOT hardcode assumptions from memory about a model's inputs.

Instead:
1. Find the endpoint IDs you want to support.
2. Pull each endpoint's OpenAPI schema directly from fal.ai.
3. Use the schema as the source of truth for:
   - required fields
   - enum choices
   - default values
   - whether a prompt is required
   - whether first-frame, last-frame, or general reference uploads are needed
   - whether the output is an image or a video
4. Use llms.txt only as a supplement for pricing notes, prompt conventions, and human-facing descriptions.
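The schema-driven part of step 3 can be sketched as a helper that turns a request schema's `properties` and `required` array into settings-form fields. This is a minimal sketch that assumes plain JSON Schema keywords (`type`, `enum`, `default`, `description`). The helper names are hypothetical. Skipping `prompt` and any `*_url` / `*_urls` field, because those belong to the prompt box and the upload sections, is also an assumption. Real fal schemas may use `$ref` or `allOf`, and this sketch does not resolve either.

```javascript
// Hypothetical heuristic: media inputs end in _url or _urls and are
// rendered as upload sections rather than settings-form fields.
function isMediaField(name) {
  return name.endsWith('_url') || name.endsWith('_urls');
}

// Build settings-form field descriptors from a JSON Schema properties object.
function deriveParams(properties, required = []) {
  const params = {};
  for (const [name, prop] of Object.entries(properties)) {
    if (name === 'prompt' || isMediaField(name)) continue; // handled elsewhere
    params[name] = {
      type: prop.type ?? 'string', // assumed fallback when type is absent
      choices: Array.isArray(prop.enum) ? [...prop.enum] : null, // dropdown when set
      default: prop.default, // undefined when the schema gives none
      required: required.includes(name),
      description: prop.description ?? '',
    };
  }
  return params;
}

// promptRequired comes straight from the schema's required array.
function schemaRequiresPrompt(required = []) {
  return required.includes('prompt');
}
```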

Best discovery workflow

For each fal endpoint, fetch its OpenAPI schema from:

https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=<ENDPOINT_ID>

Then extract:
- the request body schema title
- the required array
- the properties object
- the output schema (video, image, output, etc.)
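Here is a dependency-free sketch of the fetch-and-extract step. It assumes the fetched document is standard OpenAPI 3 JSON in which a `post` operation's request body references `#/components/schemas/...`. The function names are hypothetical. The output schema sits on a different operation and is not located here.

```javascript
// Build the queue OpenAPI URL for one endpoint (endpoint_id gets URL-encoded).
function openapiUrl(endpointId) {
  const url = new URL('https://fal.ai/api/openapi/queue/openapi.json');
  url.searchParams.set('endpoint_id', endpointId);
  return url.toString();
}

// Follow a local "#/components/schemas/Name" style $ref inside the document.
function resolveRef(doc, schema) {
  if (!schema || typeof schema.$ref !== 'string') return schema;
  const keys = schema.$ref.replace(/^#\//, '').split('/');
  return keys.reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Return the first POST JSON request body schema that has properties.
function extractRequestSchema(doc) {
  for (const [path, item] of Object.entries(doc.paths ?? {})) {
    const raw = item?.post?.requestBody?.content?.['application/json']?.schema;
    const schema = resolveRef(doc, raw);
    if (schema && schema.properties) {
      return {
        path,
        title: schema.title ?? null,
        required: schema.required ?? [],
        properties: schema.properties,
      };
    }
  }
  return null; // no JSON request body found
}
```

On Node 18+ the document can be loaded with the global `fetch`, for example `const doc = await (await fetch(openapiUrl(id))).json();`.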

Useful companion page:

https://fal.ai/models/<ENDPOINT_ID>/llms.txt

Use llms.txt for:
- human-readable price text
- prompt notes
- prompt-syntax examples such as Kling O1's @Image1 / @Image2
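The Kling O1 `@Image1` / `@Image2` convention makes a cheap pre-flight check possible. This is a minimal sketch that assumes `@ImageN` is 1-based and refers to the Nth uploaded reference; confirm that against the model's llms.txt. The helper names are hypothetical.

```javascript
// Collect the distinct N values from @ImageN mentions in a prompt.
function imageMentions(prompt) {
  const found = new Set();
  for (const match of prompt.matchAll(/@Image(\d+)/g)) {
    found.add(Number(match[1]));
  }
  return [...found].sort((a, b) => a - b);
}

// Return mentions that point outside the uploaded references (assumed 1-based).
function danglingImageMentions(prompt, referenceCount) {
  return imageMentions(prompt).filter((n) => n < 1 || n > referenceCount);
}
```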

Reusable UI model design

Represent each model with:
- id
- name
- family
- type
- mediaType (image or video)
- price / priceUnit
- promptRequired
- an optional alternatePromptParam
- a params object for the settings form
- an uploads object for upload sections

Recommended upload section keys:
- generalReferences
- firstFrame
- lastFrame
- storyboardReference

Each upload section should include:
- label
- required
- maxFiles
- apiParam
- help
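Here is one way to write such a descriptor, with a small shape check. The endpoint id and the `start_image_url` / `end_image_url` apiParams come from the mappings in this document. Everything else is a placeholder to confirm against the schema and llms.txt: the names `wanFlf2v` and `validateDescriptor`, the labels, the `required` flags, and the null price fields.

```javascript
const UPLOAD_SECTION_KEYS = ['generalReferences', 'firstFrame', 'lastFrame', 'storyboardReference'];

// Hypothetical descriptor; price fields deliberately left null.
const wanFlf2v = {
  id: 'fal-ai/wan-flf2v',
  name: 'Wan FLF2V',
  family: 'wan',
  type: 'first-last-frame-to-video',
  mediaType: 'video',
  price: null,
  priceUnit: null,
  promptRequired: true, // placeholder: confirm against the schema's required array
  params: {}, // e.g. filled from the schema's non-media properties
  uploads: {
    firstFrame: { label: 'First frame', required: true, maxFiles: 1, apiParam: 'start_image_url', help: 'Opening frame of the clip' },
    lastFrame: { label: 'Last frame', required: true, maxFiles: 1, apiParam: 'end_image_url', help: 'Closing frame of the clip' },
  },
};

// Collect human-readable problems instead of throwing on the first one.
function validateDescriptor(model) {
  const problems = [];
  if (!['image', 'video'].includes(model.mediaType)) {
    problems.push(`${model.id}: mediaType must be "image" or "video"`);
  }
  for (const [key, section] of Object.entries(model.uploads ?? {})) {
    if (!UPLOAD_SECTION_KEYS.includes(key)) problems.push(`${model.id}: unknown upload section "${key}"`);
    for (const field of ['label', 'apiParam', 'help']) {
      if (typeof section[field] !== 'string' || section[field] === '') {
        problems.push(`${model.id}.${key}: missing ${field}`);
      }
    }
    if (typeof section.required !== 'boolean') problems.push(`${model.id}.${key}: required must be boolean`);
    if (!Number.isInteger(section.maxFiles) || section.maxFiles < 1) {
      problems.push(`${model.id}.${key}: maxFiles must be a positive integer`);
    }
  }
  return problems;
}
```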

Important validation rule

The generate button should NOT only check for a non-empty text prompt.

Use logic like:
- if promptRequired !== false, require a text prompt
- if promptRequired === false and the model has an alternatePromptParam, accept either the prompt OR that alternate field
- in all cases, also require every upload section marked required

This matters for models like Kling O3, where multi_prompt can substitute for prompt.
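The rule above fits in a small pure function, which makes it easy to test without the UI. `missingRequirements` and `canGenerate` are hypothetical names. The function treats a non-empty array or non-blank string in the alternate field as "filled in"; that is an assumption about field shapes like Kling O3's multi_prompt.

```javascript
// True when a form value counts as provided.
function isFilled(value) {
  if (Array.isArray(value)) return value.length > 0;
  if (typeof value === 'string') return value.trim() !== '';
  return value != null;
}

// Return the reasons the Generate button should stay disabled.
function missingRequirements(model, { prompt = '', fields = {}, uploads = {} } = {}) {
  const missing = [];
  const hasPrompt = prompt.trim() !== '';
  if (model.promptRequired !== false) {
    if (!hasPrompt) missing.push('prompt');
  } else if (model.alternatePromptParam) {
    if (!hasPrompt && !isFilled(fields[model.alternatePromptParam])) {
      missing.push(`prompt or ${model.alternatePromptParam}`);
    }
  }
  // Every upload section marked required needs at least one file.
  for (const [key, section] of Object.entries(model.uploads ?? {})) {
    if (section.required && !(uploads[key]?.length > 0)) missing.push(key);
  }
  return missing;
}

const canGenerate = (model, form) => missingRequirements(model, form).length === 0;
```

Returning a list instead of a boolean lets the UI explain why the button is disabled.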

Upload handling pattern

Frontend:
- store uploads by section key, not in a single imageFile
- keep separate previews for each section
- render each section independently
- show the max file count for each section
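Here is a minimal sketch of section-keyed upload state as a plain reducer, usable with React's `useReducer`. The action names and `sectionCountLabel` are hypothetical. When a section overflows, the reducer keeps the newest files, so a single-slot section simply swaps its file. That is a design choice made here, not something the app is known to do.

```javascript
// State shape: { [sectionKey]: File[] }, e.g. { firstFrame: [file] }.
function uploadsReducer(state, action) {
  switch (action.type) {
    case 'add': {
      const combined = [...(state[action.section] ?? []), ...action.files];
      // Keep only the newest maxFiles entries for this section.
      const start = Math.max(0, combined.length - action.maxFiles);
      return { ...state, [action.section]: combined.slice(start) };
    }
    case 'remove': {
      const current = state[action.section] ?? [];
      return { ...state, [action.section]: current.filter((_, i) => i !== action.index) };
    }
    case 'reset': // e.g. when the selected model changes
      return {};
    default:
      return state;
  }
}

// Section heading that shows the per-section file limit.
function sectionCountLabel(section, files = []) {
  return `${section.label} (${files.length}/${section.maxFiles})`;
}
```

In the browser, each section's previews can come from `URL.createObjectURL(file)`, with `URL.revokeObjectURL` called when a file is removed.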

Backend:
- use multer's any() (e.g. upload.any()) instead of upload.single('image')
- group files by field name
- upload each file to the fal CDN
- map the grouped uploads into the correct request fields
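Here is a sketch of the backend grouping and upload steps. It assumes the frontend appends each file to FormData under its section key, so multer's `file.fieldname` is `firstFrame`, `generalReferences`, and so on. `uploadFn` is a hypothetical injected function that uploads one multer file and resolves to its CDN URL, for example a wrapper around the fal client's storage upload. Check the exact call for your client version.

```javascript
// Group multer any() results (req.files) by their form field name.
function groupFilesByField(files = []) {
  const groups = {};
  for (const file of files) {
    if (!groups[file.fieldname]) groups[file.fieldname] = [];
    groups[file.fieldname].push(file);
  }
  return groups;
}

// Upload every file and return { [fieldname]: url[] }, preserving file order.
async function uploadGroups(groups, uploadFn) {
  const urls = {};
  for (const [field, files] of Object.entries(groups)) {
    urls[field] = await Promise.all(files.map((file) => uploadFn(file)));
  }
  return urls;
}
```

In an Express route behind `upload.any()`, this would run as `const urls = await uploadGroups(groupFilesByField(req.files), uploadFn);`.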

Known upload-to-API mappings discovered

Single general image reference

Map to image_url.

Used by examples like:
- Flux Kontext
- Flux Dev Img2Img
- Qwen Image Edit

Multi general references

Map to image_urls or reference_image_urls, depending on the endpoint.

Used by examples like:
- Qwen Image 2 Edit
- Nano Banana edit models
- Vidu reference-to-video
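The choice between the single and multi shapes can key off the apiParam name. This sketch assumes the convention visible in these mappings holds: `*_urls` takes an array and `*_url` takes one string. `mapReferenceUrls` is a hypothetical name.

```javascript
// Map uploaded CDN URLs onto a single- or multi-image request field.
function mapReferenceUrls(apiParam, urls) {
  if (urls.length === 0) return {};
  if (apiParam.endsWith('_urls')) return { [apiParam]: [...urls] };
  if (urls.length > 1) {
    throw new Error(`${apiParam} takes one image but ${urls.length} were uploaded`);
  }
  return { [apiParam]: urls[0] };
}
```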

First-frame only video

Map to one of:
- image_url
- start_image_url (some endpoints)
- first_frame_url (some endpoints)

Examples:
- Wan 2.5 image-to-video -> image_url
- Veo 3 / Veo 3.1 image-to-video -> image_url
- Ovi image-to-video -> image_url

First + last frame video

Map to one of:
- start_image_url + end_image_url
- first_frame_url + last_frame_url
- image_url + end_image_url
- image_url + tail_image_url

Examples:
- Wan FLF2V -> start_image_url, end_image_url
- Veo 3.1 FLF -> first_frame_url, last_frame_url
- Kling O1 -> start_image_url, optional end_image_url
- Kling O3 -> image_url, optional end_image_url
- Kling 2.5 Turbo Pro -> image_url, optional tail_image_url
- Hailuo 02 -> image_url, optional end_image_url
- Vidu start-end -> start_image_url, end_image_url
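These first/last-frame mappings fit in a lookup table. In this sketch, the apiParam names come from the list above and the endpoint ids from the candidate list at the end of this note; pairing the two is an inference. `lastRequired` is inferred from whether the list says "optional". `FRAME_PARAMS` and `mapFrames` are hypothetical names, and each entry should be re-checked against the endpoint schema.

```javascript
// Assumed per-endpoint frame parameters; verify against each OpenAPI schema.
const FRAME_PARAMS = {
  'fal-ai/wan-flf2v': { first: 'start_image_url', last: 'end_image_url', lastRequired: true },
  'fal-ai/veo3.1/first-last-frame-to-video': { first: 'first_frame_url', last: 'last_frame_url', lastRequired: true },
  'fal-ai/kling-video/o1/standard/image-to-video': { first: 'start_image_url', last: 'end_image_url', lastRequired: false },
  'fal-ai/kling-video/o3/standard/image-to-video': { first: 'image_url', last: 'end_image_url', lastRequired: false },
  'fal-ai/kling-video/v2.5-turbo/pro/image-to-video': { first: 'image_url', last: 'tail_image_url', lastRequired: false },
  'fal-ai/minimax/hailuo-02/standard/image-to-video': { first: 'image_url', last: 'end_image_url', lastRequired: false },
  'fal-ai/vidu/start-end-to-video': { first: 'start_image_url', last: 'end_image_url', lastRequired: true },
};

// Build the frame portion of a request body for one endpoint.
function mapFrames(endpointId, firstUrl, lastUrl) {
  const spec = FRAME_PARAMS[endpointId];
  if (!spec) throw new Error(`No frame mapping for ${endpointId}`);
  if (!firstUrl) throw new Error('A first frame is required');
  if (spec.lastRequired && !lastUrl) throw new Error(`${endpointId} also needs a last frame`);
  const body = { [spec.first]: firstUrl };
  if (lastUrl) body[spec.last] = lastUrl; // optional end/tail frame
  return body;
}
```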

Special storyboard/grid reference section

If the model doesn't expose a dedicated field, you can still provide a separate UI section and merge that file into the model's normal reference array.

Concrete example discovered:
- Vidu reference-to-video uses reference_image_urls
- a dedicated storyboardReference section can still be offered in the UI
- the backend merges generalReferences + storyboardReference into reference_image_urls
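Here is a sketch of that merge for a generic target field. It assumes general references come first and the storyboard image is appended after them. `mergeSectionUrls` and its `maxFiles` limit are hypothetical; the real cap should come from the endpoint schema. For Vidu this would be called as `mergeSectionUrls(urls, ['generalReferences', 'storyboardReference'], 'reference_image_urls', limit)`.

```javascript
// Concatenate URLs from several UI sections into one request array field.
function mergeSectionUrls(urlsBySection, sectionKeys, apiParam, maxFiles) {
  const merged = sectionKeys.flatMap((key) => urlsBySection[key] ?? []);
  if (merged.length > maxFiles) {
    throw new Error(`${apiParam} accepts at most ${maxFiles} images, got ${merged.length}`);
  }
  return merged.length > 0 ? { [apiParam]: merged } : {};
}
```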

Result handling pattern

Do not assume image-only responses.

Extract both image and video outputs:
- image candidates: images[0].url, image.url, or output.url when its content type is an image
- video candidates: video.url, videos[0].url, or output.url when its content type is a video
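Here is a sketch of that extraction over the endpoint's output object; with the fal JS client this is typically the `data` field of the result. It assumes fal file objects carry `url` and `content_type`. Checking video fields before image fields is a choice made here, and `extractResult` is a hypothetical name.

```javascript
// Classify a MIME type such as "video/mp4" or "image/png".
function kindFromContentType(contentType) {
  if (typeof contentType !== 'string') return null;
  if (contentType.startsWith('video/')) return 'video';
  if (contentType.startsWith('image/')) return 'image';
  return null;
}

// Return { resultKind, url } or null when no media URL is found.
function extractResult(data) {
  const videoUrl = data?.video?.url ?? data?.videos?.[0]?.url;
  if (videoUrl) return { resultKind: 'video', url: videoUrl };
  const imageUrl = data?.images?.[0]?.url ?? data?.image?.url;
  if (imageUrl) return { resultKind: 'image', url: imageUrl };
  // Generic "output" field: only trust it when the content type says what it is.
  const kind = kindFromContentType(data?.output?.content_type);
  if (kind && data.output.url) return { resultKind: kind, url: data.output.url };
  return null;
}
```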

Frontend should:
- render <img> for image results
- render <video controls> for video results
- save both kinds to the gallery with a resultKind

Gallery should:
- support image and video previews
- preserve resultKind
- use a video file extension when saving videos locally
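For the local-save step, here is a sketch that keeps a recognizable extension from the result URL and otherwise falls back by resultKind. The fallback extensions (`mp4`, `png`), the allow-lists, and the function names are assumptions.

```javascript
const VIDEO_EXTS = ['mp4', 'webm', 'mov'];
const IMAGE_EXTS = ['png', 'jpg', 'jpeg', 'webp', 'gif'];

// Pick a file extension for a saved gallery item.
function extensionFor(resultKind, url) {
  let pathname;
  try {
    pathname = new URL(url).pathname;
  } catch {
    pathname = String(url); // not an absolute URL; inspect the raw string
  }
  const match = /\.([a-z0-9]+)$/i.exec(pathname);
  const ext = match ? match[1].toLowerCase() : null;
  if (resultKind === 'video') return VIDEO_EXTS.includes(ext) ? ext : 'mp4';
  return IMAGE_EXTS.includes(ext) ? ext : 'png';
}

// File name for a gallery entry, e.g. "<id>.mp4" for a video result.
function galleryFileName(id, resultKind, url) {
  return `${id}.${extensionFor(resultKind, url)}`;
}
```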

Local verification workflow that worked

1. Add a tiny utility module for the form logic.
2. Add a lightweight Node test file using node:test.
3. Verify the validation logic before wiring up the UI.
4. Run:
   - node --test model-utils.test.mjs
   - npm run build
5. Verify the API status locally:
   - http://127.0.0.1:4016/api/status
6. Verify the live API status after the PM2 restart:
   - https://fal-studio.apps.poofc.com/api/status

Deployment notes specific to fal-studio

This app is a Vite + Express single-port app on port 4016.
PM2 process name: fal-studio
Restart with:

pm2 restart fal-studio

Always verify dist timestamps after build.
Always commit and push after changes.
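The dist-timestamp check can be scripted. This minimal sketch assumes Vite's default `dist/` output containing an `index.html`, plus a build start time captured before `npm run build`. `distIsFresh` and its tolerance parameter are hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// True when dist/index.html exists and was written at or after the build start.
// The tolerance covers filesystems with coarse mtime resolution.
function distIsFresh(distDir, buildStartedAtMs, toleranceMs = 2000) {
  const indexPath = path.join(distDir, 'index.html');
  if (!fs.existsSync(indexPath)) return false;
  return fs.statSync(indexPath).mtimeMs >= buildStartedAtMs - toleranceMs;
}
```

For example, record `const started = Date.now()` before building, then check `distIsFresh('dist', started)` before `pm2 restart fal-studio`.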

Pitfalls

- A browser session authenticated with HTTP Basic auth credentials embedded in the page URL (user:pass@host) can still misbehave in browser automation: browsers refuse fetch() requests whose URL contains credentials, and relative URLs resolved against such a page inherit them. If the browser page disagrees with the live /api/status endpoint and PM2, trust /api/status and PM2.
- Some fal models return 401 because the user's fal account has not activated that model yet. Have the backend turn that into a friendly activation message.
- upload.single('image') is too limiting for multi-reference and first/last-frame workflows. Use multer's any() instead.
- If React keeps only one imageFile state, it cannot correctly support model-specific upload sections. Replace it with upload state keyed by section.
- Result extraction must handle both images and videos, or video models will falsely appear broken.
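Here is a sketch of the friendly 401 message on the backend. It assumes the caught error exposes a numeric `status`; the fal JS client's API errors are expected to, but verify this for your version. `toClientError` and the message wording are hypothetical.

```javascript
// Convert a fal call failure into { status, message } for the JSON response.
function toClientError(err, endpointId) {
  const status = Number.isInteger(err?.status) ? err.status : 500;
  if (status === 401) {
    return {
      status,
      message: `fal rejected ${endpointId} (401). The model may not be activated on this fal account yet; activate it on fal.ai and try again.`,
    };
  }
  return { status, message: err?.message || 'Generation failed' };
}
```

In an Express catch block this could be used as `const { status, message } = toClientError(err, model.id); res.status(status).json({ error: message });`.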

Good candidate endpoints to support

Grounded examples that worked well for this pattern:
- fal-ai/wan-25-preview/text-to-video
- fal-ai/wan-25-preview/image-to-video
- fal-ai/wan-flf2v
- fal-ai/kling-video/v2.5-turbo/pro/image-to-video
- fal-ai/kling-video/o3/standard/image-to-video
- fal-ai/kling-video/o1/standard/image-to-video
- fal-ai/veo3.1/first-last-frame-to-video
- fal-ai/veo3.1/image-to-video
- fal-ai/veo3/image-to-video
- fal-ai/minimax/hailuo-02/standard/image-to-video
- fal-ai/vidu/start-end-to-video
- fal-ai/vidu/reference-to-video
- fal-ai/ovi/image-to-video