---
name: llm-wiki
description: "Karpathy's LLM Wiki: build/query interlinked markdown KB."
version: 2.1.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [wiki, knowledge-base, research, notes, markdown, rag-alternative]
    category: research
    related_skills: [obsidian, arxiv]
---

# Karpathy's LLM Wiki

Build and maintain a persistent, compounding knowledge base as interlinked markdown files.
Based on [Andrej Karpathy's LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki compiles knowledge once and keeps it current. Cross-references are already there. Contradictions have already been flagged. Synthesis reflects everything ingested.

**Division of labor:** The human curates sources and directs analysis. The agent summarizes, cross-references, files, and maintains consistency.

## When This Skill Activates

Use this skill when the user:
- Asks to create, build, or start a wiki or knowledge base
- Asks to ingest, add, or process a source into their wiki
- Asks a question and an existing wiki is present at the configured path
- Asks to lint, audit, or health-check their wiki
- References their wiki, knowledge base, or "notes" in a research context

## Wiki Location

**Location:** Set via `WIKI_PATH` environment variable (e.g. in `~/.hermes/.env`).

If unset, defaults to `~/wiki`.

```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
```

The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or any editor. No database, no special tooling required.
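
For helper scripts that run outside the shell, the same resolution rule can be sketched in Python. This is a minimal sketch and not part of the skill itself: the function name `resolve_wiki_path` is hypothetical, an empty `WIKI_PATH` is treated like an unset one (mirroring the `:-` form in the shell snippet), and expanding a leading `~` in the configured value is an added convenience the shell form does not perform.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_wiki_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the wiki directory: WIKI_PATH if set and non-empty, else ~/wiki.

    `resolve_wiki_path` is a hypothetical helper used only for illustration.
    """
    env = os.environ if env is None else env
    configured = env.get("WIKI_PATH", "")
    # Like "${WIKI_PATH:-$HOME/wiki}": fall back when the variable is unset or empty.
    if configured:
        return Path(configured).expanduser()
    return Path.home() / "wiki"
```

Calling `resolve_wiki_path()` with no argument reads the real process environment; passing a plain dict keeps the function easy to test.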

## Architecture: Three Layers

```
wiki/
├── SCHEMA.md           # Conventions, structure rules, domain config
├── index.md            # Sectioned content catalog with one-line summaries
├── log.md              # Chronological action log (append-only, rotated yearly)
├── raw/                # Layer 1: Immutable source material
│   ├── articles/       # Web articles, clippings
│   ├── papers/         # PDFs, arxiv papers
│   ├── transcripts/    # Meeting notes, interviews
│   └── assets/         # Images, diagrams referenced by sources
├── entities/           # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/           # Layer 2: Concept/topic pages
├── comparisons/        # Layer 2: Side-by-side analyses
└── queries/            # Layer 2: Filed query results worth keeping
```

**Layer 1 — Raw Sources:** Immutable. The agent reads but never modifies these. For large assets, use a hybrid pattern: keep small raw markdown/JSON/manifests local, but store large PDFs, rendered page images, audio, and video in S3-compatible object storage with local `.s3stub` pointer files and a `manifest.json`.

**Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and cross-referenced by the agent.

**Layer 3 — The Schema:** `SCHEMA.md` defines structure, conventions, and tag taxonomy.

## Git-Tracked Obsidian Vault Pattern

A strong adaptation of Brad Bonanno's self-updating wiki pattern is to treat the wiki as:
- a normal markdown vault
- tracked in git
- optionally opened in Obsidian
- updated by scheduled Hermes jobs that act like "context farmers"

Recommended additions for a production wiki repo:

```
wiki/
├── context/
│   ├── watchlists.md   # sources, channels, feeds, competitors, query seeds
│   └── farmers/
│       └── .state/     # last-run markers per farmer/job
├── raw/
├── concepts/
├── entities/
└── ...
```

Use this pattern when the user wants the wiki to keep growing without manual ingestion every day.

### Why this works

- The **human chooses sources** once.
- Hermes handles the repetitive fetching, normalization, and filing.
- Git becomes the audit log for every wiki update.
- Obsidian provides graph/backlinks/frontmatter UX on top of plain markdown.

### Recommended git workflow

- Keep the wiki in a **private** repo.
- Pull before automated writes when multiple machines or jobs may touch it.
- Commit every ingestion batch with a descriptive message.
- If the user also opens the vault locally in Obsidian, sync with git or Obsidian Sync.

For Hermes, scheduled updates map naturally to `cronjob`, not Claude's cloud scheduler.

## Context Farming with Hermes

A "context farmer" is just a recurring Hermes workflow that:
1. reads a watchlist or source config
2. fetches only new material since the last run
3. writes raw source files into `raw/`
4. updates entity/concept pages
5. records the run in `log.md`
6. updates a last-run state file

### Good farmer sources

- YouTube channels or playlists
- blog/RSS feeds
- competitor websites
- meeting transcript exports
- research feeds and paper alerts
- internal documents the user regularly drops into a folder

### Hermes-native equivalents

- Claude scheduled agents → `cronjob`
- Claude subagents / farmers → Hermes `cronjob` + optional `delegate_task`
- MCP source connectors → Hermes tools (`web_extract`, `web_search`, `terminal`, provider CLIs, APIs)

### Minimal farmer state pattern

Store last-run timestamps under a repo-local state path such as:

```
context/farmers/.state/youtube-last-run.txt
context/farmers/.state/research-last-run.txt
```

These can be gitignored if they are purely operational.

### Suggested watchlist file

`context/watchlists.md` should capture what matters, not implementation details.

Example sections:
- tracked YouTube channels
- tracked blogs or RSS feeds
- tracked companies / competitors
- key topics / keywords
- exclusions / noise filters

### Scheduling pattern

For recurring wiki updates, create one cron job per source class or one orchestrator job that fans out.

Examples:
- every morning: ingest tracked YouTube channels
- every 2 hours: ingest monitored RSS/blog feeds
- every evening: synthesize new raw files into entity/concept pages

Use separate jobs when source classes have different failure modes or cadences.

## Resuming an Existing Wiki (CRITICAL — do this every session)

When the user has an existing wiki, **always orient yourself before doing anything**:

① **Read `SCHEMA.md`** — understand the domain, conventions, and tag taxonomy.

② **Read `index.md`** — learn what pages exist and their summaries.

③ **Scan recent `log.md`** — read the last 20-30 entries to understand recent activity.

```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=
```

Only after orientation should you ingest, query, or lint. This prevents:
- Creating duplicate pages for entities that already exist
- Missing cross-references to existing content
- Contradicting the schema's conventions
- Repeating work already logged

For large wikis (100+ pages), also run a quick `search_files` for the topic at hand before creating anything new.

## Initializing a New Wiki

When the user asks to create or start a wiki:

1. Determine the wiki path (from `$WIKI_PATH` env var, or ask the user; default `~/wiki`)
2. Create the directory structure above
3. Ask the user what domain the wiki covers — be specific
4. Write `SCHEMA.md` customized to the domain (see template below)
5. Write initial `index.md` with sectioned header
6. Write initial `log.md` with creation entry
7. Confirm the wiki is ready and suggest first sources to ingest

### SCHEMA.md Template

Adapt to the user's domain.
The schema constrains agent behavior and ensures consistency:

````markdown
# Wiki Schema

## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]

## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`
- **Provenance markers:** On pages that synthesize 3+ sources, append `^[raw/articles/source-file.md]`
  at the end of paragraphs whose claims come from a specific source. This lets a reader trace each
  claim back without re-reading the whole raw file. Optional on single-source pages where the
  `sources:` frontmatter is enough.

## Frontmatter

```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
# Optional quality signals:
confidence: high | medium | low        # how well-supported the claims are
contested: true                        # set when the page has unresolved contradictions
contradictions: [other-page-slug]      # pages this one conflicts with
---
```

`confidence` and `contested` are optional but recommended for opinion-heavy or fast-moving
topics. Lint surfaces `contested: true` and `confidence: low` pages for review so weak claims
don't silently harden into accepted wiki fact.

### raw/ Frontmatter

Raw sources ALSO get a small frontmatter block so re-ingests can detect drift:

```yaml
---
source_url: https://example.com/article   # original URL, if applicable
ingested: YYYY-MM-DD
sha256:
---
```

The `sha256:` lets a future re-ingest of the same URL skip processing when content is unchanged,
and flag drift when it has changed. Compute over the body only (everything after the closing
`---`), not the frontmatter itself.

## Tag Taxonomy
[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]

Example for AI/ML:
- Models: model, architecture, benchmark, training
- People/Orgs: person, company, lab, open-source
- Techniques: optimization, fine-tuning, inference, alignment, data
- Meta: comparison, timeline, controversy, prediction

Rule: every tag on a page must appear in this taxonomy. If a new tag is needed,
add it here first, then use it. This prevents tag sprawl.

## Page Thresholds
- **Create a page** when an entity/concept appears in 2+ sources OR is central to one source
- **Add to existing page** when a source mentions something already covered
- **DON'T create a page** for passing mentions, minor details, or things outside the domain
- **Split a page** when it exceeds ~200 lines — break into sub-topics with cross-links
- **Archive a page** when its content is fully superseded — move to `_archive/`, remove from index

## Entity Pages
One page per notable entity. Include:
- Overview / what it is
- Key facts and dates
- Relationships to other entities ([[wikilinks]])
- Source references

## Concept Pages
One page per concept or topic. Include:
- Definition / explanation
- Current state of knowledge
- Open questions or debates
- Related concepts ([[wikilinks]])

## Comparison Pages
Side-by-side analyses. Include:
- What is being compared and why
- Dimensions of comparison (table format preferred)
- Verdict or synthesis
- Sources

## Update Policy
When new information conflicts with existing content:
1. Check the dates — newer sources generally supersede older ones
2. If genuinely contradictory, note both positions with dates and sources
3. Mark the contradiction in frontmatter: `contradictions: [page-name]`
4. Flag for user review in the lint report
````

### index.md Template

The index is sectioned by type. Each entry is one line: wikilink + summary.

```markdown
# Wiki Index

> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N

## Entities

## Concepts

## Comparisons

## Queries
```

**Scaling rule:** When any section exceeds 50 entries, split it into sub-sections by first letter or sub-domain. When the index exceeds 200 entries total, create a `_meta/topic-map.md` that groups pages by theme for faster navigation.

### log.md Template

```markdown
# Wiki Log

> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.

## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
```

## Core Operations

### 1. Ingest

When the user provides a source (URL, file, paste), integrate it into the wiki:

① **Capture the raw source:**
   - URL → use `web_extract` to get markdown, save to `raw/articles/`
   - PDF → use `web_extract` (handles PDFs), save to `raw/papers/`
   - Pasted text → save to appropriate `raw/` subdirectory
   - Name the file descriptively: `raw/articles/karpathy-llm-wiki-2026.md`
   - **Add raw frontmatter** (`source_url`, `ingested`, `sha256` of the body).
     On re-ingest of the same URL: recompute the sha256, compare to the stored value —
     skip if identical, flag drift and update if different. This is cheap enough to
     do on every re-ingest and catches silent source changes.

② **Discuss takeaways** with the user — what's interesting, what matters for the domain. (Skip this in automated/cron contexts — proceed directly.)

③ **Check what already exists** — search index.md and use `search_files` to find existing pages for mentioned entities/concepts. This is the difference between a growing wiki and a pile of duplicates.

④ **Write or update wiki pages:**
   - **New entities/concepts:** Create pages only if they meet the Page Thresholds in SCHEMA.md (2+ source mentions, or central to one source)
   - **Existing pages:** Add new information, update facts, bump `updated` date. When new info contradicts existing content, follow the Update Policy.
   - **Cross-reference:** Every new or updated page must link to at least 2 other pages via `[[wikilinks]]`. Check that existing pages link back.
   - **Tags:** Only use tags from the taxonomy in SCHEMA.md
   - **Provenance:** On pages synthesizing 3+ sources, append `^[raw/articles/source.md]` markers to paragraphs whose claims trace to a specific source.
   - **Confidence:** For opinion-heavy, fast-moving, or single-source claims, set `confidence: medium` or `low` in frontmatter. Don't mark `high` unless the claim is well-supported across multiple sources.

⑤ **Update navigation:**
   - Add new pages to `index.md` under the correct section, alphabetically
   - Update the "Total pages" count and "Last updated" date in index header
   - Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Title`
   - List every file created or updated in the log entry

⑥ **Report what changed** — list every file created or updated to the user.

A single source can trigger updates across 5-15 wiki pages. This is normal and desired — it's the compounding effect.

### 2. Query

When the user asks a question about the wiki's domain:

① **Read `index.md`** to identify relevant pages.

② **For wikis with 100+ pages**, also `search_files` across all `.md` files for key terms — the index alone may miss relevant content.

③ **Read the relevant pages** using `read_file`.

④ **Synthesize an answer** from the compiled knowledge. Cite the wiki pages you drew from: "Based on [[page-a]] and [[page-b]]..."

⑤ **File valuable answers back** — if the answer is a substantial comparison, deep dive, or novel synthesis, create a page in `queries/` or `comparisons/`.
Don't file trivial lookups — only answers that would be painful to re-derive.

⑥ **Update log.md** with the query and whether it was filed.

### 3. Lint

When the user asks to lint, health-check, audit, or prepare for a large/source-sensitive ingestion, run or create a **repo-local QA gate** before bulk processing. For high-value corpora (lectures, books, PDFs, domain archives), do not treat lint as optional cleanup after the fact: add `scripts/wiki_qa.py` or equivalent first, document it in `docs/QA.md`, run it after every ingestion phase, and commit only when it passes. See `references/source-grounded-qa-gates.md` for the Human Design/large-media pattern.

① **Orphan pages:** Find pages with no inbound `[[wikilinks]]` from other pages.

```python
# Use execute_code for this — programmatic scan across all wiki pages
import os, re
from collections import defaultdict
from pathlib import Path
wiki = Path(os.environ.get("WIKI_PATH", "~/wiki")).expanduser()
# Scan all .md files in entities/, concepts/, comparisons/, queries/
pages = [p for d in ("entities", "concepts", "comparisons", "queries") for p in (wiki / d).rglob("*.md")]
# Extract all [[wikilinks]] — build inbound link map (assumes links use the bare file slug)
inbound = defaultdict(set)
for page in pages:
    for target in re.findall(r"\[\[([^\]|#]+)", page.read_text(encoding="utf-8")):
        inbound[target.strip()].add(page.stem)
# Pages with zero inbound links from other pages are orphans
orphans = sorted(p for p in pages if not inbound[p.stem] - {p.stem})
```

② **Broken wikilinks:** Find `[[links]]` that point to pages that don't exist.

③ **Index completeness:** Every wiki page should appear in `index.md`. Compare the filesystem against index entries.

④ **Frontmatter validation:** Every wiki page must have all required fields (title, created, updated, type, tags, sources). Tags must be in the taxonomy.

⑤ **Stale content:** Pages whose `updated` date is >90 days older than the most recent source that mentions the same entities.

⑥ **Contradictions:** Pages on the same topic with conflicting claims. Look for pages that share tags/entities but state different facts. Surface all pages with `contested: true` or `contradictions:` frontmatter for user review.

⑦ **Quality signals:** List pages with `confidence: low` and any page that cites only a single source but has no confidence field set — these are candidates for either finding corroboration or demoting to `confidence: medium`.

⑧ **Source drift:** For each file in `raw/` with a `sha256:` frontmatter, recompute the hash and flag mismatches. Mismatches indicate the raw file was edited (shouldn't happen — raw/ is immutable) or ingested from a URL that has since changed. Not a hard error, but worth reporting.

⑨ **Page size:** Flag pages over 200 lines — candidates for splitting.

⑩ **Tag audit:** List all tags in use, flag any not in the SCHEMA.md taxonomy.

⑪ **Log rotation:** If log.md exceeds 500 entries, rotate it.

⑫ **Report findings** with specific file paths and suggested actions, grouped by severity (broken links > orphans > source drift > contested pages > stale content > style issues).

⑬ **Append to log.md:** `## [YYYY-MM-DD] lint | N issues found`

## Working with the Wiki

### Searching

```bash
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=
```

### Bulk Ingest

When ingesting multiple sources at once, batch the updates:

1. Read all sources first
2. Identify all entities and concepts across all sources
3. Check existing pages for all of them (one search pass, not N)
4. Create/update pages in one pass (avoids redundant updates)
5. Update index.md once at the end
6. Write a single log entry covering the batch

### Strategic Corrections / Project Reframes

When the user corrects the operating frame of a wiki-backed project — e.g. “this is execution-first, not research-first,” “we already have relationships/buyers,” or “organize around equipment/deployment, not validation” — treat it as a first-class wiki maintenance task, not a chat-only clarification. Create or update a concept page for the corrected thesis, update relevant object/inventory pages, patch the main plan/roadmap and swarm brief, and update index/log/action registers.
If a deployed console/cockpit reads or hardcodes summaries from the wiki, patch and redeploy the app/API too so the UI does not continue surfacing stale priorities. See `references/strategic-correction-reframe.md` for the tested pattern and verification checklist.

### Hybrid Raw Asset Storage

For wikis that ingest PDFs/books/media, preserve speed by keeping compiled markdown plus small raw markdown/JSON/manifests local, while storing large immutable source PDFs, rendered page images, scans, audio, and video in S3-compatible object storage. Use local `.s3stub` pointer files and `raw/books//manifest.json`; fetch assets into `.cache/s3/` only when exact raw reinspection is needed, and verify with SHA-256 before use. See `references/hybrid-s3-raw-assets.md` for the tested Alex VPS pattern, manifest shape, gitignore rules, and command examples.

For MEGA-hosted corpora, especially when raw source files must not persist on the VPS, see `references/mega-cloud-drive-ingest.md`. It captures the `megajs` workaround for public MEGA folder listing/download, golden-sample selection, transient `/tmp` download → S3 upload → local deletion workflow, and verification checks.

For large corpora that must keep ingesting autonomously beyond the current chat session, see `references/background-corpus-swarm-ingest.md`. It captures the run-until-complete Hermes cron swarm pattern: dedicated project space, state file, reports, package coverage ledger, orchestrator/digest jobs, and explicit stop conditions.

For productized multi-tenant ingestion UIs (e.g. Hermes Spawn), see `references/spawn-kb-ingestion-sessions.md`. Treat every KB ingest as a durable Hermes ingestion session with a visible transcript/status, scope note, final report, resumable artifacts, and interrupt support. Do not run hidden one-shot ingestion that can fail silently behind a raw queued source.

When the user wants the swarm actively processing now rather than waiting for a scheduled heartbeat, use `references/foreground-continuation-workers.md`. It captures the pattern of spawning a bounded `hermes chat` worker from the swarm workdir, verifying the live process/PID, leaving cron enabled, and committing/pushing clean wiki changes between bounded units.

For background swarms that touch S3, provider SDKs, or other tools under cron, see `references/autonomous-swarm-runtime-bootstrap.md`. It captures the self-contained `.env` + `.venv` + `scripts/bootstrap_runtime.sh` pattern, real S3 put/head/delete verification before ledger unblocking, JSON `.s3stub` cleanup, and prompt language that tells the orchestrator to install missing dependencies before declaring blockers.

For Human Design wiki swarm synthesis, do not impose fixed page-count targets. See `references/human-design-wiki-swarm.md` for Alex's corrected dynamic-synthesis rule: load the relevant extracted source context into GPT-5.5 and let source coverage determine whether the run creates one deep page, many atomic pages, query/comparison pages, source-page expansion, or candidate work items, while preserving provenance and QA cleanliness. That reference also captures the compounding synthesis rule: update existing Markdown pages when topics already exist, create new pages only with source support, and add/repair `[[wikilinks]]`.

For a concrete Human Design golden-sample ingest, see `references/human-design-lyd-golden-sample.md`. It captures the RA LYD 7h lecture + 94-page slide PDF lessons: timestamp-free ASR is provisional, PowerPoint PDFs need slide-aware vision routing, `.cache/` raw media can still fail QA, and `--s3` failures may come from loading credentials for the wrong bucket.

For the completed high-fidelity LYD pass, see `references/human-design-lyd-timestamped-slide-ingest.md`.
It captures the timestamped Venice Whisper workflow, 94-slide vision extraction, resumable missing-page reruns after provider 429s, candidate slide/transcript alignment, disk cleanup, and the QA pitfall where combined transcripts reset timestamps at each source-audio heading.

For turning a large MEGA/wiki effort into a long-running autonomous project, see `references/human-design-wiki-swarm.md`. It captures the dedicated `/home/avalon/hd-wiki-swarm` project-space pattern, Hermes cron orchestrator + daily digest jobs, state/report layout, role definitions, bounded-run policy, and the chart-analysis-oriented synthesis target for gates, lines, colors, tones, bases, variables, centers, channels, transits, and chart comparison.

For source-sensitive bulk ingests (long lectures, PDFs, slide decks, domain archives), add committed QA gates before bulk processing. See `references/source-grounded-qa-gates.md` for checks covering no-local-media policy, manifests/stubs, S3 HEAD verification, transcript timestamps, PDF rendered-page assets, in-body provenance markers, and avoiding ungrounded placeholder pages.

For single long strategy recordings or interviews that need raw transcript preservation plus structured wiki pages, use `references/long-audio-to-wiki-ingest.md`. It covers cloud-link acquisition when Telegram cannot cache oversized voice files, chunked faster-whisper transcription, watcher/idempotency pattern, parallel timestamp-slice analysis, wiki page synthesis, and disk-space pitfalls on Alex's VPS.

For long Telegram/voice recordings that should become a new wiki, use the durable incoming-folder + watcher + chunked `faster-whisper` pattern in `references/long-audio-telegram-wiki-ingest.md`. Check the gateway log for `Failed to cache voice: File is too big` before assuming the audio exists locally, and use alternate upload routes when Telegram bot download rejects the file.
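
The SHA-256 check that this section asks for before using an asset fetched into `.cache/s3/` could be sketched as follows. This is a minimal sketch under stated assumptions, not the tested pattern from the references: the helper names `sha256_of_file` and `verify_cached_asset` are hypothetical, and the expected digest is assumed to be a hex string read from wherever the project records it. The actual manifest shape lives in `references/hybrid-s3-raw-assets.md` and is not reproduced here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs or media never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_cached_asset(path: Path, expected_sha256: str) -> bool:
    """Return True only if the cached file matches the recorded digest.

    Both helper names are hypothetical; `expected_sha256` is assumed to be a
    hex string, compared case-insensitively after trimming whitespace.
    """
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A caller would typically refuse to read the cached file, and re-fetch it, whenever `verify_cached_asset` returns `False`.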

### Background Swarm Ingest for Large Corpora

When the corpus is too large for a single session and the user wants autonomous progress until exhaustion, run the wiki through a dedicated background swarm/project space rather than ad-hoc chat turns. The pattern is: a swarm directory beside the wiki, a persistent state file, recursive inventory, package coverage ledger, reports per run, a local/noiseless Hermes cron orchestrator, and a separate daily digest job. The orchestrator should process bounded resumable units, but its global stop condition must be explicit: all ingestible packages/files are `complete` or intentionally `skip_*`/`blocked` with reasons. Do not stop at golden samples. See `references/background-corpus-swarm-ingest.md` for the tested run-until-complete pattern and prompt requirements.

### Farmer-Friendly Ingest Convention

When a scheduled farmer writes into the wiki, add enough provenance that later sessions can tell human and automated updates apart.

Good patterns:
- raw filenames that encode source + date, e.g. `raw/articles/youtube-nate-herck-2026-05-01.md`
- frontmatter fields such as `source: farmer/youtube`, `farmed: 2026-05-01T06:00:00Z`
- a matching log entry describing which farmer/job ran and which files changed

This keeps the wiki auditable as it compounds over time.

### Chat-Product Memory Harvester Pattern

When adapting an LLM wiki into a chat product, do not let the agent silently write every interesting chat turn into durable memory. Put a memory-harvest pass after the assistant answer that considers both sides of the exchange: what the user shared and what the assistant synthesized. The pass should draft reviewable save proposals rather than directly mutating the wiki.

Good proposal fields: `kind`, `title`, `summary`, `proposedMarkdown`, `targetPath`, `crossLinks`, `confidence`, `provenance` (`userMessageId`, `assistantMessageId`, timestamp), and `status` (`pending`, `approved`, `rejected`, `superseded`).
Candidate kinds include durable person facts, life events, dream notes, reading conclusions, chart/HD corrections, transit windows, substantial query syntheses, and user corrections.

Before approving writes, orient to `SCHEMA.md`, `index.md`, recent `log.md`, and relevant pages; update existing pages before creating new ones; cross-link updates; and record contradictions instead of overwriting. In multi-tenant products, approval must go through constrained KB tools/path locking, not general filesystem access. Public UI should show simple human labels like “Possible memory update noticed,” not raw internal file paths, source chunks, skill ids, or hidden routing metadata.

### Archiving

When content is fully superseded or the domain scope changes:

1. Create `_archive/` directory if it doesn't exist
2. Move the page to `_archive/` with its original path (e.g., `_archive/entities/old-page.md`)
3. Remove from `index.md`
4. Update any pages that linked to it — replace wikilink with plain text + "(archived)"
5. Log the archive action

### Obsidian Integration

The wiki directory works as an Obsidian vault out of the box:
- `[[wikilinks]]` render as clickable links
- Graph View visualizes the knowledge network
- YAML frontmatter powers Dataview queries
- The `raw/assets/` folder holds images referenced via `![[image.png]]`

### Quartz Web Viewer Layer

For a hosted/web-native viewer, publish the Markdown vault with Quartz instead of running the Obsidian desktop app through VNC/noVNC. Quartz provides a fast static site with backlinks, search, and graph-style navigation while keeping the vault files as source of truth. Treat Quartz as the **viewer/publishing layer**, not the whole Karpathy LLM wiki pattern: the full pattern still requires ingestion/context-farmer jobs that create raw source records, synthesize entity/concept/comparison pages, and update `index.md`/`log.md`.
See `references/quartz-viewer-and-context-farmers.md` for the VPS multi-vault Quartz deployment pattern and the viewer-vs-farmer distinction learned from Alex's Obsidian migration.

For best results:
- Set Obsidian's attachment folder to `raw/assets/`
- Enable "Wikilinks" in Obsidian settings (usually on by default)
- Install Dataview plugin for queries like `TABLE tags FROM "entities" WHERE contains(tags, "company")`

If using the Obsidian skill alongside this one, set `OBSIDIAN_VAULT_PATH` to the same directory as the wiki path.

### Obsidian Headless (servers and headless machines)

On machines without a display, use `obsidian-headless` instead of the desktop app. It syncs vaults via Obsidian Sync without a GUI — perfect for agents running on servers that write to the wiki while Obsidian desktop reads it on another device.

**Setup:**

```bash
# Requires Node.js 22+
npm install -g obsidian-headless

# Login (requires Obsidian account with Sync subscription)
ob login --email --password ''

# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"

# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault ""

# Initial sync
ob sync

# Continuous sync (foreground — use systemd for background)
ob sync --continuous
```

**Continuous background sync via systemd:**

```ini
# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
```

```bash
systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync
# Enable linger so sync survives logout:
sudo loginctl enable-linger $USER
```

This lets the agent write to `~/wiki` on a server while you browse the same vault in Obsidian on your laptop/phone — changes appear within seconds.

## Pitfalls

- **Never modify files in `raw/`** — sources are immutable. Corrections go in wiki pages.
- **Always orient first** — read SCHEMA + index + recent log before any operation in a new session. Skipping this causes duplicates and missed cross-references.
- **Always update index.md and log.md** — skipping this makes the wiki degrade. These are the navigational backbone.
- **Don't create pages for passing mentions** — follow the Page Thresholds in SCHEMA.md. A name appearing once in a footnote doesn't warrant an entity page.
- **Don't create pages without cross-references** — isolated pages are invisible. Every page must link to at least 2 other pages.
- **Frontmatter is required** — it enables search, filtering, and staleness detection.
- **Tags must come from the taxonomy** — freeform tags decay into noise. Add new tags to SCHEMA.md first, then use them.
- **Keep pages scannable** — a wiki page should be readable in 30 seconds. Split pages over 200 lines. Move detailed analysis to dedicated deep-dive pages.
- **Ask before mass-updating** — if an ingest would touch 10+ existing pages, confirm the scope with the user first.
- **Rotate the log** — when log.md exceeds 500 entries, rename it `log-YYYY.md` and start fresh. The agent should check log size during lint.
- **Handle contradictions explicitly** — don't silently overwrite. Note both claims with dates, mark in frontmatter, flag for user review.
- **Do not create placeholder concept pages just to satisfy wikilinks** — if a concept such as `authority`, `type`, or `center` has not yet been source-grounded, leave it as plain text or a working note until an ingest actually supports it. Broken wikilinks should fail QA rather than encouraging ungrounded pages.
- **For high-value media/PDF ingests, prefer fidelity over speed/cost** — use timestamped transcription, rendered PDF page images, vision extraction for artifacted PDFs, and phase-level QA gates.
  Structural lint is necessary but not sufficient; include manual/LLM-assisted spot checks against the original audio/page images.
- **When the user corrects the project frame, propagate it through the wiki and any cockpit.** Do not leave a strategic correction buried in chat or in one plan page. Create a durable concept for the corrected thesis, update plans/swarm briefs/action registers/index/log, and if a live console has hardcoded recommendations or gates, patch/redeploy it. See `references/strategic-correction-reframe.md`.
- **Do not let validation artifacts become the ontology by accident.** Especially in business/execution wikis, compliance, buyer validation, and research gates are often guardrails. If the user says relationships, buyers, or operational access already exist, encode that as the starting stance while still tracking practical confirmations.

## Related Tools

[llm-wiki-compiler](https://github.com/atomicmemory/llm-wiki-compiler) is a Node.js CLI that compiles sources into a concept wiki with the same Karpathy inspiration. It's Obsidian-compatible, so users who want a scheduled/CLI-driven compile pipeline can point it at the same vault this skill maintains. Trade-offs: it owns page generation (replaces the agent's judgment on page creation) and is tuned for small corpora. Use this skill when you want agent-in-the-loop curation; use llm-wiki-compiler when you want batch compile of a source directory.