screenshot-knowledge-base

Screenshot Knowledge Base Builder

Build a structured knowledge base from UI screenshots: download, rename, analyze, document.

When to Use

User shares screenshots of an app (existing or reference) that need to be cataloged
Building design reference documentation from UI screenshots
Documenting an existing app's screens for redesign/rebuild
User provides a MEGA link, Google Drive link, or set of images to analyze

Critical: Know Which App You're Working On

Before starting, confirm EXACTLY which app/project the screenshots belong to. If the user has multiple apps (e.g., admin dashboard vs mobile client), get this locked in first. The screenshots go in THAT app's repo, not a different one.

Step 1: Download Screenshots

From MEGA folders

megadl is installed on the VPS and downloads a MEGA folder link directly to a local path:

mkdir -p /path/to/app/public/kb/screenshots/
megadl --path /path/to/app/public/kb/screenshots/ "https://mega.nz/folder/XXXXX#YYYYYY"

DO NOT use the browser tool to download from MEGA — Browserbase downloads go to its cloud, not the VPS filesystem. Always use megadl CLI.

From URLs

for url in URL1 URL2 URL3; do
  # -f fails on HTTP errors instead of saving an error page; the output path is quoted
  curl -fL -o "/path/to/screenshots/$(basename "$url")" "$url"
done
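The loop above names each file after the last URL path segment, which can carry a query string or encoded spaces. A minimal Python sketch of deriving a space-free local filename before downloading; `safe_filename` and its `fallback` default are hypothetical names used for illustration, not part of hermes_tools:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def safe_filename(url: str, fallback: str = "download.png") -> str:
    """Derive a local filename from a URL (hypothetical helper).

    Drops the query string and fragment, decodes %-escapes, and replaces
    whitespace with hyphens so later steps never see a space in the name.
    """
    path = unquote(urlsplit(url).path)
    name = PurePosixPath(path).name
    # Collapse any run of whitespace into a single hyphen.
    name = "-".join(name.split())
    return name or fallback


print(safe_filename("https://example.com/shots/Home%20Screen.png?dl=1"))
```

The result can be used as the `-o` target for curl. Files still get their final names in Step 2.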

Step 2: Rename Files (CRITICAL)

Files with spaces in names BREAK vision_analyze in subagents. Always rename immediately after download:

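import shlex

from hermes_tools import terminal

dir_path = "/path/to/screenshots"
# List the directory and keep only PNG files (case-insensitive extension match).
result = terminal(f"ls -1 {shlex.quote(dir_path)}")
files = [f for f in result["output"].strip().split("\n") if f.lower().endswith(".png")]

# Rename in sorted order to space-free, zero-padded names.
for i, old_name in enumerate(sorted(files), 1):
    new_name = f"screen-{i:02d}.png"
    # shlex.quote keeps names with spaces or quotes intact inside the shell command.
    src = shlex.quote(f"{dir_path}/{old_name}")
    dst = shlex.quote(f"{dir_path}/{new_name}")
    terminal(f"mv -- {src} {dst}")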

Replace the generic screen- prefix with a descriptive one that matches the context (e.g., admin-, mobile-, pos-, checkin-).
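The rename can also be planned as a pure function before any file is touched, which makes the prefix and numbering easy to check. A minimal sketch, assuming a hypothetical helper `plan_renames` (not part of hermes_tools); it only computes the old-to-new mapping and moves nothing:

```python
def plan_renames(names: list[str], prefix: str = "screen") -> list[tuple[str, str]]:
    """Map each PNG name to '<prefix>-NN.png' in sorted order (hypothetical helper)."""
    pngs = sorted(n for n in names if n.lower().endswith(".png"))
    # Pad to at least two digits, and to more if there are 100+ files.
    width = max(2, len(str(len(pngs))))
    return [(old, f"{prefix}-{i:0{width}d}.png") for i, old in enumerate(pngs, 1)]


for old, new in plan_renames(["Login Page.png", "Dashboard (1).png", "notes.txt"], "admin"):
    print(f"{old!r} -> {new!r}")
```

Each pair can then be passed to the mv call shown in this step. Non-PNG files are left out of the plan.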

Step 3: Analyze with Vision (Parallel Batches)

Use delegate_task with 3 parallel subagents, splitting screenshots evenly (~7 per batch). Each subagent:

Calls vision_analyze on each image individually
Documents: screen purpose, every UI element, exact text/labels, navigation, data fields, buttons, tabs
Writes findings to a temp file (/tmp/kb-batch-N.md)

Template for subagent goal:

Analyze these N screenshots one by one using vision_analyze. For EACH image provide:
1) Screen name/purpose
2) Every UI element visible (sidebar, header, tabs, buttons, tables, forms, data fields)
3) Exact text/labels visible
4) How a user/staff member would use this screen
Write complete findings to /tmp/kb-batch-N.md

Files:
1. /path/to/screen-01.png
2. /path/to/screen-02.png
...

Context field must include: What app this is, who uses it, what screens are expected.
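Splitting the files and filling in the goal template can be done mechanically. The sketch below assumes two hypothetical helpers, `split_batches` and `build_goal`. It only builds the goal strings; how they are passed to delegate_task is left out because the call signature is not shown in this document.

```python
def split_batches(files: list[str], n_batches: int = 3) -> list[list[str]]:
    """Split files into contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(len(files), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)
        batches.append(files[start:start + size])
        start += size
    # Drop empty groups when there are fewer files than batches.
    return [b for b in batches if b]


def build_goal(batch: list[str], batch_no: int) -> str:
    """Fill in the subagent goal template for one batch."""
    listing = "\n".join(f"{i}. {path}" for i, path in enumerate(batch, 1))
    return (
        f"Analyze these {len(batch)} screenshots one by one using vision_analyze. "
        "For EACH image provide:\n"
        "1) Screen name/purpose\n"
        "2) Every UI element visible (sidebar, header, tabs, buttons, tables, forms, data fields)\n"
        "3) Exact text/labels visible\n"
        "4) How a user/staff member would use this screen\n"
        f"Write complete findings to /tmp/kb-batch-{batch_no}.md\n\n"
        f"Files:\n{listing}"
    )


files = [f"/path/to/screen-{i:02d}.png" for i in range(1, 21)]
batches = split_batches(files)
print([len(b) for b in batches])
print(build_goal(batches[0], 1))
```

With 20 files and 3 batches, the groups come out as 7, 7, and 6 images, which stays within the per-subagent limit noted under Pitfalls.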

Step 4: Compile Knowledge Base

Read all batch files and combine into a single structured KB document:

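import datetime

from hermes_tools import read_file, write_file

# Values interpolated into the header below. N is a placeholder: set it to the
# number of screenshots actually analyzed.
N = 21
date = datetime.date.today().isoformat()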

b1 = read_file("/tmp/kb-batch-1.md", limit=2000)["content"]
b2 = read_file("/tmp/kb-batch-2.md", limit=2000)["content"]
b3 = read_file("/tmp/kb-batch-3.md", limit=2000)["content"]

write_file("/path/to/app/public/kb/KNOWLEDGE-BASE.md", f"""# APP NAME — Knowledge Base
## Source: {N} screenshots analyzed {date}
---

{b1}

{b2}

{b3}

---

## QUICK REFERENCE — FILE TO SCREEN MAPPING

| File | Screen | Key Purpose |
|------|--------|-------------|
| screen-01.png | ... | ... |
...
""")
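The quick-reference table at the end of the document can be generated from the analysis results instead of typed by hand. A minimal sketch, assuming a hypothetical helper `mapping_table` that takes (file, screen, purpose) tuples; the example rows are illustrative and not taken from a real analysis:

```python
def mapping_table(rows: list[tuple[str, str, str]]) -> str:
    """Render (file, screen, purpose) rows as the quick-reference table."""

    def cell(text: str) -> str:
        # Escape pipes and flatten newlines so a value cannot break the table.
        return text.replace("|", "\\|").replace("\n", " ").strip()

    lines = [
        "| File | Screen | Key Purpose |",
        "|------|--------|-------------|",
    ]
    lines += [f"| {cell(f)} | {cell(s)} | {cell(p)} |" for f, s, p in rows]
    return "\n".join(lines)


print(mapping_table([
    ("screen-01.png", "Login", "Staff sign-in"),
    ("screen-02.png", "Dashboard", "Daily overview | KPIs"),
]))
```

The returned string can be interpolated into the f-string above in place of the hand-written table rows.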

Step 5: Commit & Push

cd /path/to/app
git add public/kb/
git commit -m "Add N screenshots + knowledge base analysis"
git push
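Before committing, it can help to check the total screenshot size against the ~50MB guideline from the Pitfalls list. A minimal sketch, assuming hypothetical helpers `screenshots_size` and `needs_lfs`; the threshold constant simply encodes that guideline:

```python
from pathlib import Path

LFS_THRESHOLD_BYTES = 50 * 1024 * 1024  # the ~50MB guideline from the Pitfalls list


def screenshots_size(directory: str) -> int:
    """Return the total size in bytes of all PNG files directly inside directory."""
    return sum(p.stat().st_size for p in Path(directory).glob("*.png") if p.is_file())


def needs_lfs(total_bytes: int, threshold: int = LFS_THRESHOLD_BYTES) -> bool:
    """True when the screenshots exceed the threshold and git-lfs is worth considering."""
    return total_bytes > threshold


# 20 screenshots at ~500KB each, as in the Pitfalls estimate.
print(needs_lfs(20 * 500 * 1024))
```

In practice, `needs_lfs(screenshots_size("/path/to/app/public/kb/screenshots"))` would run before `git add`.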

Pitfalls

Spaces in filenames: vision_analyze in subagents (delegate_task) fails with "Invalid image source" for paths containing spaces. ALWAYS rename files first (Step 2). Discovering this cost a full wasted subagent cycle.
MEGA browser download doesn't work: Browserbase downloads go to its own cloud filesystem, not the VPS. Use the megadl CLI instead; it handles decryption and downloads directly to the specified path.
Subagent vision_analyze path encoding: URL-encoding spaces (%20), the file:// protocol, and quoting all fail to fix the spaces issue in subagents. The only fix is renaming the files.
Don't confuse apps (the #1 mistake): If the user has multiple apps (admin vs mobile, dashboard vs client), do NOT assume. Check the existing KB/screenshot locations in ALL their repos before asking. If the user says "the app" without specifying, check which app already has a KB and which one the screenshots show (admin screens and client screens look visibly different), then confirm. In this project, Alex has jungle-studio-dashboard (ADMIN, for staff) and jungle-studio-mobile (CLIENT, for customers). The 50 existing admin KB screenshots are in the dashboard repo at public/kb/, while the 16 client reference screenshots are in the mobile repo at docs/reference-screenshots/. Mixing these up wastes an entire analysis cycle and frustrates the user.
Large batches: More than ~7-8 images per subagent risks hitting timeout or token limits. Split into batches of 6-7.
Existing KB: Check whether a knowledge base document already exists before creating a new one. You may need to append or update rather than overwrite. Use patch for targeted updates to existing KB files.
Git large files: 20 PNG screenshots (~500KB each, ~10MB total) are fine for git. If the total exceeds 50MB, consider git-lfs or storing images separately.
PWA cache prevents updates: If the app is a PWA with a service worker, newly deployed code won't reach users until the SW cache is busted. When updating a KB that drives app UI: bump the SW cache name (e.g., jungle-v1 -> jungle-v2), switch from a cache-first to a network-first strategy, and serve sw.js with a Cache-Control: no-cache header. Otherwise users keep seeing the old app no matter how many times they refresh.
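The spaces pitfall is cheap to guard against with a preflight check before any subagent is dispatched. A minimal sketch, assuming a hypothetical helper `unsafe_paths`; it flags any path containing whitespace so the files can be renamed first:

```python
import re


def unsafe_paths(paths: list[str]) -> list[str]:
    """Return the paths that contain whitespace and should be renamed first."""
    return [p for p in paths if re.search(r"\s", p)]


paths = ["/path/to/screen-01.png", "/path/to/Screen Shot 2.png"]
bad = unsafe_paths(paths)
if bad:
    print("Rename before delegating:", bad)
```

An empty result means the batch is safe to hand to vision_analyze as far as this particular failure is concerned.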