---
name: screenshot-knowledge-base
description: Download screenshots from external sources (MEGA, URLs), analyze every one with vision, and compile into a structured knowledge base document for an app. Covers the full pipeline from acquisition to organized KB.
version: 1.0.0
tags: [screenshots, knowledge-base, vision, analysis, documentation, mega]
---

# Screenshot Knowledge Base Builder

Build a structured knowledge base from UI screenshots: download, analyze, rename, document.

## When to Use

- User shares screenshots of an app (existing or reference) that need to be cataloged
- Building design reference documentation from UI screenshots
- Documenting an existing app's screens for redesign/rebuild
- User provides a MEGA link, Google Drive link, or set of images to analyze

## Critical: Know Which App You're Working On

Before starting, confirm EXACTLY which app/project the screenshots belong to. If the user has multiple apps (e.g., admin dashboard vs mobile client), get this locked in first. The screenshots go in THAT app's repo, not a different one.

## Step 1: Download Screenshots

### From MEGA folders

`megadl` is installed on the VPS and works great:

```bash
mkdir -p /path/to/app/public/kb/screenshots/
megadl --path /path/to/app/public/kb/screenshots/ "https://mega.nz/folder/XXXXX#YYYYYY"
```

**DO NOT use the browser tool to download from MEGA.** Browserbase downloads go to its cloud, not the VPS filesystem. Always use `megadl` CLI.
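
Because a download can silently end up somewhere other than the VPS filesystem, it may help to confirm the files are actually on disk before moving on. The following is a minimal sketch, not part of the original workflow: the helper `list_screenshots` and the `IMAGE_SUFFIXES` set are hypothetical names introduced for illustration, and the directory is the same placeholder path used above.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to what the source actually contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def list_screenshots(dir_path):
    """Return the image files directly inside dir_path, sorted by name.

    Returns an empty list if the directory does not exist.
    """
    root = Path(dir_path)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Placeholder path from the megadl example above.
found = list_screenshots("/path/to/app/public/kb/screenshots/")
print(f"{len(found)} screenshots on disk")
for p in found:
    print(f"  {p.name} ({p.stat().st_size} bytes)")
```

A count of zero right after a download suggests the files went elsewhere or the path is wrong.
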

### From URLs

```bash
for url in URL1 URL2 URL3; do
  # Quote the output path so URLs whose basename contains spaces still work
  curl -L -o "/path/to/screenshots/$(basename "$url")" "$url"
done
```

## Step 2: Rename Files (CRITICAL)

**Files with spaces in names BREAK vision_analyze in subagents.** Always rename immediately after download:

```python
from hermes_tools import terminal

dir_path = "/path/to/screenshots"

# List the directory and keep only PNG files
result = terminal(f'ls -1 "{dir_path}"')
files = [f for f in result["output"].strip().split("\n") if f.endswith(".png")]

# Rename in sorted order to zero-padded, space-free names
for i, old_name in enumerate(sorted(files), 1):
    new_name = f"screen-{i:02d}.png"
    terminal(f'mv "{dir_path}/{old_name}" "{dir_path}/{new_name}"')
```

Use a descriptive prefix matching the context (e.g., `admin-`, `mobile-`, `pos-`, `checkin-`).

## Step 3: Analyze with Vision (Parallel Batches)

Use `delegate_task` with 3 parallel subagents, splitting screenshots evenly (~7 per batch). Each subagent:

- Calls `vision_analyze` on each image individually
- Documents: screen purpose, every UI element, exact text/labels, navigation, data fields, buttons, tabs
- Writes findings to a temp file (`/tmp/kb-batch-N.md`)

**Template for subagent goal:**

```
Analyze these N screenshots one by one using vision_analyze. For EACH image provide:
1) Screen name/purpose
2) Every UI element visible (sidebar, header, tabs, buttons, tables, forms, data fields)
3) Exact text/labels visible
4) How a user/staff member would use this screen

Write complete findings to /tmp/kb-batch-N.md

Files:
1. /path/to/screen-01.png
2. /path/to/screen-02.png
...
```

**Context field must include:** What app this is, who uses it, what screens are expected.
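
The even split and the per-batch goal text can be prepared in plain Python before delegating. This is a sketch under stated assumptions: `split_evenly` and `build_goal` are hypothetical helpers, the goal wording is abbreviated from the template above, and the call to `delegate_task` is left out because its signature is not shown here.

```python
def split_evenly(items, n_batches=3):
    """Split items into up to n_batches contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)
        if size:  # skip empty batches when there are fewer items than batches
            batches.append(items[start:start + size])
        start += size
    return batches


def build_goal(batch_no, paths):
    """Fill in the subagent goal template for one batch (abbreviated wording)."""
    file_lines = "\n".join(f"{i}. {p}" for i, p in enumerate(paths, 1))
    return (
        f"Analyze these {len(paths)} screenshots one by one using vision_analyze. "
        "For EACH image provide: screen name/purpose, every UI element visible, "
        "exact text/labels visible, and how a user/staff member would use this screen.\n\n"
        f"Write complete findings to /tmp/kb-batch-{batch_no}.md\n\n"
        f"Files:\n{file_lines}"
    )


# Example with 20 placeholder paths
paths = [f"/path/to/screen-{i:02d}.png" for i in range(1, 21)]
batches = split_evenly(paths, n_batches=3)
print([len(b) for b in batches])  # [7, 7, 6]
goals = [build_goal(n, batch) for n, batch in enumerate(batches, 1)]
```

Contiguous chunks keep each batch file in screen order, so the compiled document reads sequentially when the batches are concatenated.
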

## Step 4: Compile Knowledge Base

Read all batch files and combine into a single structured KB document:

```python
from datetime import date
from hermes_tools import read_file, write_file

# Placeholders: set n_screens to the number of screenshots analyzed;
# analyzed_on assumes the analysis happened today.
n_screens = 0
analyzed_on = date.today().isoformat()

b1 = read_file("/tmp/kb-batch-1.md", limit=2000)["content"]
b2 = read_file("/tmp/kb-batch-2.md", limit=2000)["content"]
b3 = read_file("/tmp/kb-batch-3.md", limit=2000)["content"]

write_file("/path/to/app/public/kb/KNOWLEDGE-BASE.md", f"""# APP NAME - Knowledge Base
## Source: {n_screens} screenshots analyzed {analyzed_on}

---

{b1}

{b2}

{b3}

---

## QUICK REFERENCE - FILE TO SCREEN MAPPING

| File | Screen | Key Purpose |
|------|--------|-------------|
| screen-01.png | ... | ... |
...
""")
```

## Step 5: Commit & Push

```bash
cd /path/to/app
git add public/kb/
git commit -m "Add N screenshots + knowledge base analysis"
git push
```

## Pitfalls

1. **Spaces in filenames**: `vision_analyze` in subagents (delegate_task) fails with "Invalid image source" for paths containing spaces. ALWAYS rename files first (Step 2). This cost a full wasted subagent cycle when discovered.
2. **MEGA browser download doesn't work**: Browserbase downloads go to its own cloud filesystem, not the VPS. Use `megadl` CLI instead. It handles the decryption and downloads directly to the specified path.
3. **Subagent vision_analyze path encoding**: Even URL-encoding spaces (`%20`), `file://` protocol, and quoting don't fix the spaces issue in subagents. The only fix is renaming the files.
4. **Don't confuse apps (this is the #1 mistake)**: If the user has multiple apps (admin vs mobile, dashboard vs client), do NOT assume. The user will be frustrated if you mix them up. Read existing KB/screenshot locations in ALL their repos before asking. If the user says "the app" without specifying, check which app already has a KB, which one the screenshots are about (look at the content: admin screens vs client screens are visually different), and confirm.
   In this project, Alex has jungle-studio-dashboard (ADMIN, for staff) and jungle-studio-mobile (CLIENT, for customers). The 50 existing admin KB screenshots are in the dashboard repo at `public/kb/`, while the 16 client reference screenshots are in the mobile repo at `docs/reference-screenshots/`. Mixing these up wastes an entire analysis cycle and annoys the user.
5. **Large batches**: More than ~7-8 images per subagent risks hitting timeout or token limits. Split into batches of 6-7.
6. **Existing KB**: Check if a knowledge base document already exists before creating a new one. May need to append/update rather than overwrite. Use `patch` for targeted updates to existing KB files.
7. **Git large files**: 20 PNG screenshots (~500KB each, ~10MB total) is fine for git. If >50MB total, consider git-lfs or storing images separately.
8. **PWA cache prevents updates**: If the app is a PWA with a service worker, deploying new code won't show up for users until the SW cache is busted. When updating a KB that drives app UI: bump the SW cache name (e.g., `jungle-v1` -> `jungle-v2`), switch from cache-first to network-first strategy, and serve sw.js with `Cache-Control: no-cache` headers. Otherwise users will keep seeing the old app no matter how many times they refresh.
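
Pitfalls 1 and 7 can be checked mechanically before delegating analysis or committing. The sketch below is illustrative only: `preflight` is a hypothetical helper, and the 50 MB default simply mirrors the threshold mentioned in pitfall 7.

```python
from pathlib import Path


def preflight(dir_path, max_total_mb=50):
    """Report filenames containing spaces and the total size of files in dir_path."""
    root = Path(dir_path)
    files = [p for p in root.iterdir() if p.is_file()] if root.is_dir() else []
    total_bytes = sum(p.stat().st_size for p in files)
    return {
        # Names that would trip vision_analyze in subagents (pitfall 1)
        "names_with_spaces": sorted(p.name for p in files if " " in p.name),
        # Total size, for the git-lfs decision (pitfall 7)
        "total_mb": total_bytes / (1024 * 1024),
        "over_limit": total_bytes > max_total_mb * 1024 * 1024,
    }


# Placeholder path; a missing directory yields an empty report.
report = preflight("/path/to/app/public/kb/screenshots/")
if report["names_with_spaces"]:
    print("Rename before analysis:", report["names_with_spaces"])
if report["over_limit"]:
    print(f"{report['total_mb']:.1f} MB total: consider git-lfs")
```
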