Screenshot Knowledge Base Builder

Build a structured knowledge base from UI screenshots — download, analyze, rename, document.

When to Use

Critical: Know Which App You're Working On

Before starting, confirm EXACTLY which app/project the screenshots belong to. If the user has multiple apps (e.g., admin dashboard vs mobile client), get this locked in first. The screenshots go in THAT app's repo, not a different one.

Step 1: Download Screenshots

From MEGA folders

megadl is installed on the VPS and works great:

mkdir -p /path/to/app/public/kb/screenshots/
megadl --path /path/to/app/public/kb/screenshots/ "https://mega.nz/folder/XXXXX#YYYYYY"

DO NOT use the browser tool to download from MEGA — Browserbase downloads go to its cloud, not the VPS filesystem. Always use megadl CLI.

From URLs

for url in URL1 URL2 URL3; do
  curl -L -o "/path/to/screenshots/$(basename "$url")" "$url"
done
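
One caveat with basename "$url": it keeps any query string, and percent-escapes such as %20 become spaces once decoded, which Step 2 then has to clean up. A minimal Python sketch of deriving a safe local filename from a URL follows. The helper name safe_filename, its default value, and the example URL are all hypothetical and not part of hermes_tools.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def safe_filename(url: str, default: str = "download.png") -> str:
    """Derive a filesystem-friendly name from a URL (hypothetical helper)."""
    # Drop the query string and fragment, then decode %20-style escapes
    path = unquote(urlsplit(url).path)
    # Fall back to a default when the URL path has no final component
    name = PurePosixPath(path).name or default
    # Collapse whitespace runs into a hyphen so later steps never see spaces
    return re.sub(r"\s+", "-", name.strip())


# Made-up URL for illustration only
print(safe_filename("https://example.com/shots/Home%20Screen.png?token=abc"))
# prints: Home-Screen.png
```

The result can be used as the output name for curl, so files arrive without spaces or query-string debris.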

Step 2: Rename Files (CRITICAL)

Files with spaces in names BREAK vision_analyze in subagents. Always rename immediately after download:

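import shlex

from hermes_tools import terminal

dir_path = "/path/to/screenshots"
# shlex.quote keeps the shell command safe for names with spaces, quotes, or $
result = terminal(f"ls -1 {shlex.quote(dir_path)}")
# Keep only PNGs; match the extension case-insensitively
files = [f for f in result["output"].strip().split("\n") if f.lower().endswith(".png")]

# Number the files in sorted order: screen-01.png, screen-02.png, ...
for i, old_name in enumerate(sorted(files), 1):
    new_name = f"screen-{i:02d}.png"
    src = shlex.quote(f"{dir_path}/{old_name}")
    dst = shlex.quote(f"{dir_path}/{new_name}")
    terminal(f"mv {src} {dst}")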

Replace the generic screen- prefix with one that matches the context (e.g., admin-, mobile-, pos-, checkin-).
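
It can help to compute the rename mapping before touching the filesystem, so the context prefix is applied consistently and an mv never overwrites a file that has not been renamed yet. The sketch below is one way to do that in plain Python. The function names plan_renames and collisions are hypothetical, and the collision check is deliberately conservative: it may flag a rename that would in fact be safe.

```python
def plan_renames(names: list[str], prefix: str = "screen") -> list[tuple[str, str]]:
    """Return (old, new) pairs that number the PNGs in sorted order."""
    pngs = sorted(n for n in names if n.lower().endswith(".png"))
    # Two digits cover up to 99 files; widen automatically beyond that
    width = max(2, len(str(len(pngs))))
    return [(old, f"{prefix}-{i:0{width}d}.png") for i, old in enumerate(pngs, 1)]


def collisions(plan: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return pairs whose target is another file's current name (conservative)."""
    olds = {old for old, _ in plan}
    return [(old, new) for old, new in plan if new != old and new in olds]


plan = plan_renames(["Screen Shot 2.png", "Screen Shot 1.png", "notes.txt"], prefix="admin")
for old, new in plan:
    print(f"{old!r} -> {new!r}")
print("collisions:", collisions(plan))
```

If collisions returns anything, renaming through a temporary prefix first (or into a fresh directory) avoids overwriting files.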

Step 3: Analyze with Vision (Parallel Batches)

Use delegate_task with 3 parallel subagents, splitting screenshots evenly (~7 per batch). Each subagent:

- Calls vision_analyze on each image individually
- Documents: screen purpose, every UI element, exact text/labels, navigation, data fields, buttons, tabs
- Writes findings to a temp file (/tmp/kb-batch-N.md)

Template for subagent goal:

Analyze these N screenshots one by one using vision_analyze. For EACH image provide:
1) Screen name/purpose
2) Every UI element visible (sidebar, header, tabs, buttons, tables, forms, data fields)
3) Exact text/labels visible
4) How a user/staff member would use this screen
Write complete findings to /tmp/kb-batch-N.md

Files:
1. /path/to/screen-01.png
2. /path/to/screen-02.png
...

Context field must include: What app this is, who uses it, what screens are expected.
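
The batching and goal text above can be generated rather than written by hand. The following is a rough sketch under a few assumptions: split_batches and build_goal are hypothetical helper names, the batch count is derived from a per-batch cap of 7 rather than fixed at 3, and passing the resulting string to delegate_task is left out because its exact signature is not shown here.

```python
def split_batches(files: list[str], max_per_batch: int = 7) -> list[list[str]]:
    """Split files into the fewest near-equal batches of at most max_per_batch."""
    if not files:
        return []
    n_batches = -(-len(files) // max_per_batch)  # ceiling division
    base, extra = divmod(len(files), n_batches)
    batches, start = [], 0
    for b in range(n_batches):
        # The first `extra` batches take one additional file
        size = base + (1 if b < extra else 0)
        batches.append(files[start:start + size])
        start += size
    return batches


def build_goal(batch: list[str], batch_no: int) -> str:
    """Fill the subagent goal template for one batch."""
    listing = "\n".join(f"{i}. {path}" for i, path in enumerate(batch, 1))
    return (
        f"Analyze these {len(batch)} screenshots one by one using vision_analyze. "
        "For EACH image provide:\n"
        "1) Screen name/purpose\n"
        "2) Every UI element visible (sidebar, header, tabs, buttons, tables, forms, data fields)\n"
        "3) Exact text/labels visible\n"
        "4) How a user/staff member would use this screen\n"
        f"Write complete findings to /tmp/kb-batch-{batch_no}.md\n\n"
        f"Files:\n{listing}"
    )


files = [f"/path/to/screen-{i:02d}.png" for i in range(1, 21)]
for n, batch in enumerate(split_batches(files), 1):
    print(n, len(batch))
```

With 20 files and a cap of 7 this yields three batches of 7, 7, and 6 files, which matches the three-subagent layout described above.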

Step 4: Compile Knowledge Base

Read all batch files and combine into a single structured KB document:

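from datetime import datetime

from hermes_tools import read_file, write_file

# Header values used in the f-string below; set N to the actual screenshot count
N = 21  # e.g., 3 batches of ~7
date = datetime.now().strftime("%Y-%m-%d")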

b1 = read_file("/tmp/kb-batch-1.md", limit=2000)["content"]
b2 = read_file("/tmp/kb-batch-2.md", limit=2000)["content"]
b3 = read_file("/tmp/kb-batch-3.md", limit=2000)["content"]

write_file("/path/to/app/public/kb/KNOWLEDGE-BASE.md", f"""# APP NAME — Knowledge Base
## Source: {N} screenshots analyzed {date}
---

{b1}

{b2}

{b3}

---

## QUICK REFERENCE — FILE TO SCREEN MAPPING

| File | Screen | Key Purpose |
|------|--------|-------------|
| screen-01.png | ... | ... |
...
""")

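The quick-reference table at the end of the KB can be built from a list of rows instead of being typed by hand. This is a minimal sketch: quick_reference is a hypothetical helper, and the example rows are invented placeholders, not real analysis output.

```python
def quick_reference(rows: list[tuple[str, str, str]]) -> str:
    """Render (file, screen, purpose) rows as a pipe table."""

    def cell(text: str) -> str:
        # Escape pipes and flatten newlines so one row stays on one line
        return text.replace("|", "\\|").replace("\n", " ").strip()

    header = "| File | Screen | Key Purpose |\n|------|--------|-------------|"
    lines = [f"| {cell(f)} | {cell(s)} | {cell(p)} |" for f, s, p in rows]
    return "\n".join([header, *lines])


# Placeholder rows for illustration only
print(quick_reference([
    ("screen-01.png", "Login", "Staff sign-in"),
    ("screen-02.png", "Dashboard", "Daily overview"),
]))
```

The returned string can be interpolated into the write_file template in place of the hand-written table rows.
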
Step 5: Commit & Push

cd /path/to/app
git add public/kb/
git commit -m "Add N screenshots + knowledge base analysis"
git push

Pitfalls

  1. Spaces in filenames — vision_analyze in subagents (delegate_task) fails with "Invalid image source" for paths containing spaces. ALWAYS rename files first (Step 2). This cost a full wasted subagent cycle when discovered.

  2. MEGA browser download doesn't work — Browserbase downloads go to its own cloud filesystem, not the VPS. Use megadl CLI instead. It handles the decryption and downloads directly to the specified path.

  3. Subagent vision_analyze path encoding — Even URL-encoding spaces (%20), file:// protocol, and quoting don't fix the spaces issue in subagents. The only fix is renaming the files.

  4. Don't confuse apps (the #1 mistake) — If the user has multiple apps (admin vs mobile, dashboard vs client), do NOT assume which one they mean. Check the existing KB and screenshot locations in ALL their repos before asking. If the user says "the app" without specifying, check which app already has a KB and which one the screenshots show (admin screens and client screens look visibly different), then confirm. In this project, Alex has jungle-studio-dashboard (ADMIN, for staff) and jungle-studio-mobile (CLIENT, for customers). The 50 existing admin KB screenshots are in the dashboard repo at public/kb/, while the 16 client reference screenshots are in the mobile repo at docs/reference-screenshots/. Mixing these up wastes an entire analysis cycle and annoys the user.

  5. Large batches — More than ~7-8 images per subagent risks hitting timeout or token limits. Split into batches of 6-7.

  6. Existing KB — Check whether a knowledge base document already exists before creating a new one. You may need to append to or update it rather than overwrite it. Use patch for targeted updates to existing KB files.

  7. Git large files — 20 PNG screenshots (~500KB each, ~10MB total) are fine for git. If the total exceeds 50MB, consider git-lfs or storing images separately.

  8. PWA cache prevents updates — If the app is a PWA with a service worker, deploying new code won't show up for users until the SW cache is busted. When updating a KB that drives app UI: bump the SW cache name (e.g., jungle-v1 -> jungle-v2), switch from cache-first to network-first strategy, and serve sw.js with Cache-Control: no-cache headers. Otherwise users will keep seeing the old app no matter how many times they refresh.
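
The size guideline in pitfall 7 is easy to check before committing. The sketch below sums the PNG sizes under a directory and compares the total to the 50MB figure. The names total_png_bytes, needs_lfs, and LFS_THRESHOLD_BYTES are hypothetical, and the glob matches lowercase .png only on case-sensitive filesystems.

```python
from pathlib import Path

# The ~50MB guideline from pitfall 7, expressed in bytes
LFS_THRESHOLD_BYTES = 50 * 1024 * 1024


def total_png_bytes(directory: str) -> int:
    """Sum the sizes of all .png files under directory, recursively."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*.png") if p.is_file())


def needs_lfs(directory: str, threshold: int = LFS_THRESHOLD_BYTES) -> bool:
    """True when the PNG total exceeds the threshold."""
    return total_png_bytes(directory) > threshold


if __name__ == "__main__":
    # Placeholder path, as used elsewhere in this document
    kb_dir = "/path/to/app/public/kb/screenshots"
    if Path(kb_dir).is_dir():
        size_mb = total_png_bytes(kb_dir) / (1024 * 1024)
        print(f"{size_mb:.1f} MB of PNGs; consider git-lfs: {needs_lfs(kb_dir)}")
```

Running this before git add gives a quick signal on whether plain git is still appropriate for the screenshot set.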