fal-studio-dynamic-media-inputs

fal-studio Dynamic Media Inputs

Use this when updating fal-studio or any similar generation app to support:
- text-to-video models
- image-to-video models
- first/last-frame video models
- multi-reference image or video models
- special storyboard/grid reference sections
- model-specific settings panels derived from real endpoint fields

Core approach

Do NOT hardcode assumptions from memory about a model's inputs.

Instead:

  1. Find the endpoint IDs you want to support.
  2. Pull the endpoint OpenAPI schema directly from fal.ai.
  3. Use the schema as the source of truth for:
     - required fields
     - enum choices
     - default values
     - whether prompt is required
     - whether first frame / last frame / general refs are needed
     - whether output is image or video
  4. Use llms.txt only as a supplement for pricing notes, prompt conventions, and human-facing descriptions.

Best discovery workflow

For each fal endpoint:

https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=<ENDPOINT_ID>

Then extract:
- request body schema title
- required array
- properties object
- output schema (video, image, output, etc.)
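The extraction step above can be sketched roughly as follows. This is a minimal sketch, not fal's documented contract: it assumes the returned document follows the usual OpenAPI 3 layout (a POST operation whose `application/json` request body points at a schema, possibly through a local `$ref`). The helper names `schemaUrl`, `resolveRef`, and `summarizeInput` are hypothetical.

```javascript
// Build the schema URL for one endpoint (URL pattern taken from this document).
function schemaUrl(endpointId) {
  const url = new URL('https://fal.ai/api/openapi/queue/openapi.json');
  url.searchParams.set('endpoint_id', endpointId);
  return url.toString();
}

// Resolve a local reference such as "#/components/schemas/Foo".
// Nodes without a $ref are returned unchanged.
function resolveRef(doc, node) {
  if (!node || typeof node.$ref !== 'string') return node;
  return node.$ref
    .replace(/^#\//, '')
    .split('/')
    .reduce((acc, key) => (acc ? acc[key] : undefined), doc);
}

// Find the first POST request body schema that has properties and summarize it.
// ASSUMPTION: standard OpenAPI 3 structure; verify against a real response.
function summarizeInput(doc) {
  for (const pathItem of Object.values(doc.paths ?? {})) {
    const body = pathItem.post?.requestBody;
    const schema = resolveRef(doc, body?.content?.['application/json']?.schema);
    if (!schema?.properties) continue;
    const fields = {};
    for (const [name, raw] of Object.entries(schema.properties)) {
      const prop = resolveRef(doc, raw) ?? {};
      fields[name] = { type: prop.type, enum: prop.enum, default: prop.default };
    }
    return { title: schema.title, required: schema.required ?? [], fields };
  }
  return null;
}
```

With Node 18 or later, something like `fetch(schemaUrl(id)).then((r) => r.json()).then(summarizeInput)` would feed a live schema into the summary. Check the result by eye for the first few endpoints before relying on it.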

Useful companion page:

https://fal.ai/models/<ENDPOINT_ID>/llms.txt

Use llms.txt for:
- human-readable price text
- prompt notes
- examples like Kling O1's @Image1 / @Image2

Reusable UI model design

Represent each model with:
- id
- name
- family
- type
- mediaType (image or video)
- price / priceUnit
- promptRequired
- optional alternatePromptParam
- params object for the settings form
- uploads object for upload sections

Recommended upload section keys:
- generalReferences
- firstFrame
- lastFrame
- storyboardReference

Each upload section should include:
- label
- required
- maxFiles
- apiParam
- help
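As an illustration of the shape described above, one model entry might look like the sketch below. The field names come from the lists above and the endpoint ID and API parameter names come from elsewhere in this document. The display name, labels, and help text are made up for illustration, the price is deliberately left empty, and whether the first frame is truly required should be confirmed against the endpoint schema. `requiredUploadKeys` is a hypothetical helper.

```javascript
// Illustrative model definition; confirm every field against the live schema.
const klingO3ImageToVideo = {
  id: 'fal-ai/kling-video/o3/standard/image-to-video',
  name: 'Kling O3 Standard (image-to-video)', // display name is illustrative
  family: 'kling',
  type: 'image-to-video',
  mediaType: 'video',
  price: null, // take from llms.txt; intentionally not invented here
  priceUnit: null,
  promptRequired: false,
  alternatePromptParam: 'multi_prompt',
  params: {}, // fill from the schema's properties (enums, defaults)
  uploads: {
    firstFrame: {
      label: 'First frame',
      required: true, // ASSUMPTION: verify in the schema's required array
      maxFiles: 1,
      apiParam: 'image_url',
      help: 'Image the video starts from.',
    },
    lastFrame: {
      label: 'Last frame (optional)',
      required: false,
      maxFiles: 1,
      apiParam: 'end_image_url',
      help: 'Image the video should end on.',
    },
  },
};

// List the upload section keys that must be filled before generating.
function requiredUploadKeys(model) {
  return Object.entries(model.uploads ?? {})
    .filter(([, section]) => section.required)
    .map(([key]) => key);
}
```

Keeping `apiParam` on each upload section lets the backend stay generic: it never needs to know which model it is serving, only which section maps to which request field.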

Important validation rule

The generate button must check more than whether the text prompt is non-empty.

Use logic like:
- if promptRequired !== false, require text prompt
- if promptRequired === false and model has alternatePromptParam, allow either prompt OR that alternate field
- also require all upload sections marked required

This matters for models like Kling O3, where multi_prompt can substitute for prompt.
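The validation rule above could be written as a small pure function, which also makes it easy to unit test. This is a sketch under assumptions: the form state shape (`prompt`, `params`, `uploads` keyed by section) and the names `canGenerate` and `hasValue` are hypothetical, and a model with `promptRequired === false` and no alternate field is treated as needing no prompt at all.

```javascript
// True for a non-blank string or a non-empty array (multi_prompt may be either).
function hasValue(value) {
  if (typeof value === 'string') return value.trim().length > 0;
  if (Array.isArray(value)) return value.length > 0;
  return false;
}

// Decide whether the generate button should be enabled.
// ASSUMPTION: form = { prompt, params: {...}, uploads: { sectionKey: File[] } }.
function canGenerate(model, form) {
  let promptOk;
  if (model.promptRequired !== false) {
    promptOk = hasValue(form.prompt);
  } else if (model.alternatePromptParam) {
    // Either the text prompt or the alternate field is enough.
    promptOk = hasValue(form.prompt) || hasValue(form.params?.[model.alternatePromptParam]);
  } else {
    promptOk = true;
  }

  // Every upload section marked required needs at least one file.
  const uploadsOk = Object.entries(model.uploads ?? {}).every(
    ([key, section]) => !section.required || (form.uploads?.[key]?.length ?? 0) > 0,
  );

  return promptOk && uploadsOk;
}
```

For example, a model with `promptRequired: false` and `alternatePromptParam: 'multi_prompt'` passes with an empty `prompt` as long as `params.multi_prompt` is filled and its required uploads are present.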

Upload handling pattern

Frontend:
- store uploads by section key, not by a single imageFile
- keep separate previews for each section
- render each section independently
- show max file count per section

Backend:
- use upload.any() (on the multer instance) instead of upload.single('image')
- group files by field name
- upload each file to fal CDN
- map grouped uploads into the correct request fields
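The backend grouping step above might look like the following sketch. It assumes the frontend uses the section key (for example `firstFrame`) as the multipart field name, so that each entry in multer's `req.files` array carries it as `fieldname`. The actual CDN upload is passed in as a function because the exact client call depends on your fal SDK version. `groupFilesByField` and `uploadGrouped` are hypothetical names.

```javascript
// Group multer file objects by the form field they arrived under.
function groupFilesByField(files) {
  const grouped = {};
  for (const file of files ?? []) {
    (grouped[file.fieldname] ??= []).push(file);
  }
  return grouped;
}

// Upload every file and return { sectionKey: [url, ...] }.
// `uploadFile` is injected: it takes one file object and resolves to its URL.
async function uploadGrouped(grouped, uploadFile) {
  const urlsBySection = {};
  for (const [section, files] of Object.entries(grouped)) {
    urlsBySection[section] = await Promise.all(files.map((file) => uploadFile(file)));
  }
  return urlsBySection;
}
```

In an Express route this would be called as `uploadGrouped(groupFilesByField(req.files), yourUploader)`, after which the URLs still need to be mapped to the model's request fields.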

Known upload-to-API mappings discovered

Single general image reference

Map to:
- image_url

Used by examples like:
- Flux Kontext
- Flux Dev Img2Img
- Qwen Image Edit

Multi general references

Map to:
- image_urls
- or reference_image_urls

Used by examples like:
- Qwen Image 2 Edit
- Nano Banana edit models
- Vidu reference-to-video

First-frame only video

Map to one of:
- image_url
- sometimes start_image_url
- sometimes first_frame_url

Examples:
- Wan 2.5 image-to-video -> image_url
- Veo 3 / Veo 3.1 image-to-video -> image_url
- Ovi image-to-video -> image_url

First + last frame video

Map to one of:
- start_image_url + end_image_url
- first_frame_url + last_frame_url
- image_url + end_image_url
- image_url + tail_image_url

Examples:
- Wan FLF2V -> start_image_url, end_image_url
- Veo 3.1 FLF -> first_frame_url, last_frame_url
- Kling O1 -> start_image_url, optional end_image_url
- Kling O3 -> image_url, optional end_image_url
- Kling 2.5 Turbo Pro -> image_url, optional tail_image_url
- Hailuo 02 -> image_url, optional end_image_url
- Vidu start-end -> start_image_url, end_image_url

Special storyboard/grid reference section

If the model doesn't expose a dedicated field, you can still provide a separate UI section and merge that file into the model's normal reference array.

Concrete example discovered:
- Vidu reference-to-video uses reference_image_urls
- a dedicated storyboardReference section can still be offered in the UI
- backend merges generalReferences + storyboardReference into reference_image_urls
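The mappings and the merge described above can be driven entirely by each section's `apiParam`. The sketch below uses a naming heuristic that is an assumption, not something the schemas guarantee: parameters ending in `_urls` are sent as arrays and merged across sections, and anything else is sent as a single URL. `buildMediaFields` is a hypothetical name.

```javascript
// Turn { sectionKey: [url, ...] } into request fields using each section's apiParam.
// ASSUMPTION: "*_urls" params take arrays; all other params take one URL string.
function buildMediaFields(model, urlsBySection) {
  const fields = {};
  for (const [key, section] of Object.entries(model.uploads ?? {})) {
    const urls = (urlsBySection[key] ?? []).slice(0, section.maxFiles ?? Infinity);
    if (urls.length === 0) continue;
    if (section.apiParam.endsWith('_urls')) {
      // Sections that share an array param are concatenated in declaration order.
      fields[section.apiParam] = [...(fields[section.apiParam] ?? []), ...urls];
    } else {
      fields[section.apiParam] = urls[0];
    }
  }
  return fields;
}
```

For the Vidu case, declaring both `generalReferences` and `storyboardReference` with `apiParam: 'reference_image_urls'` yields one merged array, with the general references first.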

Result handling pattern

Do not assume image-only responses.

Extract both image and video outputs:
- image candidates: images[0].url, image.url, output.url when content type is image
- video candidates: video.url, videos[0].url, output.url when content type is video

Frontend should:
- render <img> for image results
- render <video controls> for video results
- save both to gallery with a resultKind

Gallery should:
- support image and video previews
- preserve resultKind
- use video file extension when saving videos locally
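The output extraction described above could be sketched as a single function that returns both the URL and a `resultKind` for the frontend and gallery. It assumes the generic `output` object exposes its MIME type as `content_type`; check that field name against real responses. Video is checked first here purely as an arbitrary tie-break. `extractResult` is a hypothetical name.

```javascript
// Pick the first usable media URL out of a fal result payload.
// ASSUMPTION: `output.content_type` holds a MIME type such as "video/mp4".
function extractResult(result) {
  const output = result?.output;
  const outputType = typeof output?.content_type === 'string' ? output.content_type : '';

  const videoUrl =
    result?.video?.url ??
    result?.videos?.[0]?.url ??
    (outputType.startsWith('video/') ? output.url : undefined);
  if (videoUrl) return { resultKind: 'video', url: videoUrl };

  const imageUrl =
    result?.images?.[0]?.url ??
    result?.image?.url ??
    (outputType.startsWith('image/') ? output.url : undefined);
  if (imageUrl) return { resultKind: 'image', url: imageUrl };

  return null; // nothing recognizable; surface an error to the user
}
```

The frontend can then switch on `resultKind` to choose between `<img>` and `<video controls>`, and the gallery can use it to pick a file extension.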

Local verification workflow that worked

  1. Add a tiny utility module for form logic.
  2. Add a lightweight Node test file using node:test.
  3. Verify validation logic before wiring the UI.
  4. Run:
     - node --test model-utils.test.mjs
     - npm run build
  5. Verify API status locally:
     - http://127.0.0.1:4016/api/status
  6. Verify live API status after PM2 restart:
     - https://fal-studio.apps.poofc.com/api/status

Deployment notes specific to fal-studio

pm2 restart fal-studio

Pitfalls

Good candidate endpoints to support

Grounded examples that worked well for this pattern:
- fal-ai/wan-25-preview/text-to-video
- fal-ai/wan-25-preview/image-to-video
- fal-ai/wan-flf2v
- fal-ai/kling-video/v2.5-turbo/pro/image-to-video
- fal-ai/kling-video/o3/standard/image-to-video
- fal-ai/kling-video/o1/standard/image-to-video
- fal-ai/veo3.1/first-last-frame-to-video
- fal-ai/veo3.1/image-to-video
- fal-ai/veo3/image-to-video
- fal-ai/minimax/hailuo-02/standard/image-to-video
- fal-ai/vidu/start-end-to-video
- fal-ai/vidu/reference-to-video
- fal-ai/ovi/image-to-video