---
name: subagent-driven-development
description: "Execute plans via delegate_task subagents (2-stage review)."
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
  hermes:
    tags: [delegation, subagent, implementation, workflow, parallel]
    related_skills: [writing-plans, requesting-code-review, test-driven-development]
---

# Subagent-Driven Development

## Overview

Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review.

**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration.

## When to Use

Use this skill when:

- You have an implementation plan (from writing-plans skill or user requirements)
- Tasks are mostly independent
- Quality and spec compliance are important
- User asks to “spin up a subagent” for a focused design/code pass while the controller continues platform inspection or deployment work

**vs. manual execution:**

- Fresh context per task (no confusion from accumulated state)
- Automated review process catches issues early
- Consistent quality checks across all tasks
- Subagents can ask questions before starting work

## The Process

### 1. Read and Parse Plan

Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list:

```python
# Read the plan
read_file("docs/plans/feature-plan.md")

# Create todo list with all tasks
todo([
    {"id": "task-1", "content": "Create User model with email field", "status": "pending"},
    {"id": "task-2", "content": "Add password hashing utility", "status": "pending"},
    {"id": "task-3", "content": "Create login endpoint", "status": "pending"},
])
```

**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context.

### 2. Per-Task Workflow

For EACH task in the plan:

#### Step 1: Dispatch Implementer Subagent

Use `delegate_task` with complete context:

```python
delegate_task(
    goal="Implement Task 1: Create User model with email and password_hash fields",
    context="""
    TASK FROM PLAN:
    - Create: src/models/user.py
    - Add User class with email (str) and password_hash (str) fields
    - Use bcrypt for password hashing
    - Include __repr__ for debugging

    FOLLOW TDD:
    1. Write failing test in tests/models/test_user.py
    2. Run: pytest tests/models/test_user.py -v (verify FAIL)
    3. Write minimal implementation
    4. Run: pytest tests/models/test_user.py -v (verify PASS)
    5. Run: pytest tests/ -q (verify no regressions)
    6. Commit: git add -A && git commit -m "feat: add User model with password hashing"

    PROJECT CONTEXT:
    - Python 3.11, Flask app in src/app.py
    - Existing models in src/models/
    - Tests use pytest, run from project root
    - bcrypt already in requirements.txt
    """,
    toolsets=['terminal', 'file']
)
```

#### Step 2: Dispatch Spec Compliance Reviewer

After the implementer completes, verify against the original spec:

```python
delegate_task(
    goal="Review if implementation matches the spec from the plan",
    context="""
    ORIGINAL TASK SPEC:
    - Create src/models/user.py with User class
    - Fields: email (str), password_hash (str)
    - Use bcrypt for password hashing
    - Include __repr__

    CHECK:
    - [ ] All requirements from spec implemented?
    - [ ] File paths match spec?
    - [ ] Function signatures match spec?
    - [ ] Behavior matches expected?
    - [ ] Nothing extra added (no scope creep)?

    OUTPUT: PASS or list of specific spec gaps to fix.
    """,
    toolsets=['file']
)
```

**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant.
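
The fix-and-re-review loop for spec compliance can be sketched as a small controller-side helper. This is a minimal illustration, not part of the skill itself: `run_review` and `run_fix` are hypothetical callables standing in for `delegate_task` calls (whose return format is not specified here), the convention that a compliant reply starts with `PASS` is an assumption based on the reviewer prompt above, and `max_rounds` is an assumed safety cap.

```python
from typing import Callable


def spec_review_loop(
    run_review: Callable[[], str],
    run_fix: Callable[[str], None],
    max_rounds: int = 3,
) -> bool:
    """Repeat review -> fix until the reviewer reports PASS.

    run_review and run_fix are hypothetical wrappers around delegate_task.
    Returns True once a review passes, False if every one of the
    max_rounds reviews still reported spec gaps.
    """
    for round_number in range(1, max_rounds + 1):
        verdict = run_review().strip()
        # Assumed convention: a compliant review starts with "PASS".
        if verdict.upper().startswith("PASS"):
            return True
        # Anything else is treated as a list of spec gaps to hand to a fixer.
        # Skip the fix on the last round, since nothing would re-review it.
        if round_number < max_rounds:
            run_fix(verdict)
    return False
```

A `False` result would be the point to stop and escalate rather than move on to the quality review.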

#### Step 3: Dispatch Code Quality Reviewer

After spec compliance passes:

```python
delegate_task(
    goal="Review code quality for Task 1 implementation",
    context="""
    FILES TO REVIEW:
    - src/models/user.py
    - tests/models/test_user.py

    CHECK:
    - [ ] Follows project conventions and style?
    - [ ] Proper error handling?
    - [ ] Clear variable/function names?
    - [ ] Adequate test coverage?
    - [ ] No obvious bugs or missed edge cases?
    - [ ] No security issues?

    OUTPUT FORMAT:
    - Critical Issues: [must fix before proceeding]
    - Important Issues: [should fix]
    - Minor Issues: [optional]
    - Verdict: APPROVED or REQUEST_CHANGES
    """,
    toolsets=['file']
)
```

**If quality issues found:** Fix issues, re-review. Continue only when approved.

#### Step 4: Mark Complete

```python
todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True)
```

### 3. Final Review

After ALL tasks are complete, dispatch a final integration reviewer:

```python
delegate_task(
    goal="Review the entire implementation for consistency and integration issues",
    context="""
    All tasks from the plan are complete. Review the full implementation:
    - Do all components work together?
    - Any inconsistencies between tasks?
    - All tests passing?
    - Ready for merge?
    """,
    toolsets=['terminal', 'file']
)
```

### 4. Verify and Commit

```bash
# Run full test suite
pytest tests/ -q

# Review all changes
git diff --stat

# Final commit if needed
git add -A && git commit -m "feat: complete [feature name] implementation"
```

## Task Granularity

**Each task = 2-5 minutes of focused work.**

**Too big:**

- "Implement user authentication system"

**Right size:**

- "Create User model with email and password fields"
- "Add password hashing function"
- "Create login endpoint"
- "Add JWT token generation"
- "Create registration endpoint"

## Red Flags — Never Do These

- Start implementation without a plan
- Skip reviews (spec compliance OR code quality)
- Proceed with unfixed critical/important issues
- Dispatch multiple implementation subagents for tasks that touch the same files
- Make subagent read the plan file (provide full text in context instead)
- Skip scene-setting context (subagent needs to understand where the task fits)
- Ignore subagent questions (answer before letting them proceed)
- Accept "close enough" on spec compliance
- Skip review loops (reviewer found issues → implementer fixes → review again)
- Let implementer self-review replace actual review (both are needed)
- **Start code quality review before spec compliance is PASS** (wrong order)
- Move to next task while either review has open issues

## Pre-flight discovery before delegating

**Before** writing the subagent goal/context, spend 30–60s in the controller doing a fast grep/read to learn what already exists for the task. Subagents that re-explore well-trodden code burn 20+ tool calls and frequently time out at 600s.

Concretely, for any "add feature X to repo Y" task:

1. `search_files` or `grep` for the feature's likely keywords (e.g. `anthropic`, `Anthropic`, `OAuth`, `provider`) across the target repo.
2. If matches show pre-existing helpers, endpoints, or matrix entries, your task changes shape: from "build X" to "enable / surface / smoke-test X."
3. State this discovery explicitly in the subagent's `context` field with a **KEY FACT** preamble:

   > "KEY FACT: Anthropic plumbing is ALREADY IMPLEMENTED in this repo. Do NOT redo it. Just enable and smoke-test it.
   > - `installAnthropicOauthCredential` exists at server.mjs:250
   > - `anthropic` and `anthropic-oauth` entries already in provider-matrix.mjs but likely filtered out by an `enabled: false` flag."

4. Give precise line numbers, function names, and the suspected reason it isn't yet live. The subagent then dives straight to the right files instead of re-discovering them.

**Failure mode to avoid:** vague "add Anthropic as a provider" without pre-flight check → subagent reads the whole codebase trying to figure out what's there → 600s timeout. Same task with KEY FACT preamble: completes in ~3 minutes.

Pre-flight applies even to parallel batches: a 60s controller-side grep across both target repos is cheaper than 20 minutes of parallel subagent thrash.

### Common pre-flight discoveries that reshape the task

When grep hits show prior work, the task usually mutates from "build X" to one of these:

- **Feature-flagged off.** Code exists, helpers are wired, but a single `enabled: false` (or env-var gate, or filtered list) hides it from the public surface. The job is to flip the flag and smoke-test, not to re-author. Astral's `provider-matrix.mjs` had full `anthropic` + `anthropic-oauth` entries with `enabled: false` — flipping one boolean per entry surfaced them in `/api/providers`. Always grep for `enabled`, `disabled`, `deferred`, `comingSoon`, `beta`, etc. near the feature's data.
- **Server has it, client doesn't (or vice versa).** A `providers` const in React/Vue, a label map, an enum, a switch statement — these are config-in-two-places traps. Find the server-side source of truth, then `grep` the client for the same provider/feature keys; if they don't match, the public UI silently omits the option even though the API supports it.
  Example: Astral `/api/providers` returned `anthropic` + `anthropic-oauth` after the matrix flip, but `web/src/main.jsx` had a hardcoded `providers` object listing only `openai-codex` and `openai`. User saw no Claude option on the wizard.
- **Helpers exist but no route calls them.** Functions like `installAnthropicOauthCredential`, `exchangeAnthropicAuthorization`, `buildAnthropicAuthorizeUrl` were all present in `server.mjs` but no `app.post('/api/provider/anthropic/...')` handler invoked them. Search for the helper name in route definitions; if zero matches, the feature is half-built and your job is the wiring, not the implementation.
- **Code shipped to one repo, missing from a sibling.** Pre-flight both repos before parallel delegation. The Codex device-auth flow was complete in one and partial in the other.

Document the discovery as a **KEY FACT** preamble in the subagent context so it spends its budget on the right slice.

## Iteration-budget watchdog

Subagents can hit `max_iterations` (default ~50 calls) before they finish, even when the code is fully written. When that happens:

- The summary will say `"exit_reason": "max_iterations"` and explicitly call out work it did NOT complete (e.g. "Did not commit", "Did not deploy", "Did not run live smoke").
- The controller MUST finish that work directly — do not re-delegate the same task and pay the discovery cost again. The subagent already did the hard part; the controller just does the final `git commit && git push && pm2 restart && curl smoke` lap.
- Treat the subagent's recommended next steps as a checklist for the controller, not as instructions to spawn another agent.

## Timeout ≠ no work done — always verify state before re-dispatching

If a subagent exits with `status: "timeout"` (e.g. after 600s with N completed API calls), DO NOT assume nothing landed. The timeout commonly fires on the **final return-summary step** after all the substantive work already completed on disk.

Before re-dispatching or telling the user the task failed, **verify external state directly**:

- File/code work: `git status -s`, `git log --oneline -5`, and target-file inventory greps (e.g. `grep -l '^## Headline' concepts/*.md | wc -l`).
- External API work (HTTP POST, S3 upload, DB write): query the target system directly for the artifact (URL, ID, row).
- Long-running batch passes (N items): count completed items vs. expected — partial completion is a useful state, not a failure.

Documented case (decan-synthesis pass, 2026-05-22): a subagent timed out at 600s with 26 API calls — but had already edited all 36 files AND committed locally. Re-dispatching would have wasted ~10 minutes and risked a double-commit. Recovery was a 30-second `git log` check followed by a targeted second pass on just the contaminated subset.

This is a corollary of the general "subagent self-reports are claims, not facts" rule — verify externally. The novel twist with timeouts is that **the absence of a self-report doesn't mean absence of side effects**.
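
The file/code checks above can be sketched as a small controller-side helper. This is a minimal sketch under stated assumptions, not a prescribed implementation: it assumes `git` is on the PATH and that the subagent was told which commit message to use. The names `git_output` and `classify_timeout_state`, and the three state labels, are hypothetical and chosen only for illustration.

```python
import subprocess


def git_output(args, cwd="."):
    """Return stdout of a git command, or "" if the command fails."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else ""


def classify_timeout_state(status_text, log_text, commit_hint):
    """Classify what a timed-out subagent left behind (labels are illustrative).

    status_text: output of `git status -s`
    log_text:    output of `git log --oneline -5`
    commit_hint: text expected in the subagent's commit subject (assumption)
    """
    # A recent commit whose subject mentions the hint means the work was committed.
    if any(commit_hint in line for line in log_text.splitlines()):
        return "committed"
    # Non-empty short status means edits are on disk but not yet committed.
    if status_text.strip():
        return "uncommitted-changes"
    return "nothing-landed"
```

A typical call would pass `git_output(["status", "-s"])` and `git_output(["log", "--oneline", "-5"])` as the first two arguments. Only a `"nothing-landed"` result would justify re-dispatching the whole task. The other two results call for a targeted follow-up.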

## Handling Issues

### If Subagent Asks Questions

- Answer clearly and completely
- Provide additional context if needed
- Don't rush them into implementation

### If Reviewer Finds Issues

- Implementer subagent (or a new one) fixes them
- Reviewer reviews again
- Repeat until approved
- Don't skip the re-review

### If Subagent Fails a Task

- Dispatch a new fix subagent with specific instructions about what went wrong
- Don't try to fix manually in the controller session (context pollution)

## Efficiency Notes

**Why fresh subagent per task:**

- Prevents context pollution from accumulated state
- Each subagent gets clean, focused context
- No confusion from prior tasks' code or reasoning

**Why two-stage review:**

- Spec review catches under/over-building early
- Quality review ensures the implementation is well-built
- Catches issues before they compound across tasks

**Cost trade-off:**

- More subagent invocations (implementer + 2 reviewers per task)
- But catches issues early (cheaper than debugging compounded problems later)

## Focused UI/UX Subagent Pattern

When Alex explicitly asks to “spin up a subagent” for a web UI/design cleanup, use a focused implementer subagent rather than doing the visual pass in the controller session. Give it:

- exact project path and stack
- live URL and deployment context
- visual direction (e.g. “Typeform-inspired, minimal light-gray, mobile-first, step-by-step”)
- product copy requirements and user-instruction requirements
- constraints about preserving existing API behavior and avoiding secrets exposure
- expected local verification (`npm run build`, touched files, commit message if appropriate)

After the subagent returns, the controller must still:

1. Inspect the resulting git diff/log.
2. Run the build/tests directly.
3. Restart/deploy PM2/nginx as needed.
4. Verify public health/root URL.
5. Commit/push if the subagent did not already do so, or push its commit if it did.
6. Summarize both the subagent result and controller verification.

Do not treat the subagent’s success as deployment verification; it is an implementation pass, not the final release gate.

## Integration with Other Skills

### With writing-plans

This skill EXECUTES plans created by the writing-plans skill:

1. User requirements → writing-plans → implementation plan
2. Implementation plan → subagent-driven-development → working code

### With test-driven-development

Implementer subagents should follow TDD:

1. Write failing test first
2. Implement minimal code
3. Verify test passes
4. Commit

Include TDD instructions in every implementer context.

### With requesting-code-review

The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions.

### With systematic-debugging

If a subagent encounters bugs during implementation:

1. Follow systematic-debugging process
2. Find root cause before fixing
3. Write regression test
4. Resume implementation

## Example Workflow

```
[Read plan: docs/plans/auth-feature.md]
[Create todo list with 5 tasks]

--- Task 1: Create User model ---
[Dispatch implementer subagent]
  Implementer: "Should email be unique?"
  You: "Yes, email must be unique"
  Implementer: Implemented, 3/3 tests passing, committed.

[Dispatch spec reviewer]
  Spec reviewer: ✅ PASS — all requirements met

[Dispatch quality reviewer]
  Quality reviewer: ✅ APPROVED — clean code, good tests

[Mark Task 1 complete]

--- Task 2: Password hashing ---
[Dispatch implementer subagent]
  Implementer: No questions, implemented, 5/5 tests passing.

[Dispatch spec reviewer]
  Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars")

[Implementer fixes]
  Implementer: Added validation, 7/7 tests passing.

[Dispatch spec reviewer again]
  Spec reviewer: ✅ PASS

[Dispatch quality reviewer]
  Quality reviewer: Important: Magic number 8, extract to constant
  Implementer: Extracted MIN_PASSWORD_LENGTH constant
  Quality reviewer: ✅ APPROVED

[Mark Task 2 complete]

... (continue for all tasks)

[After all tasks: dispatch final integration reviewer]
[Run full test suite: all passing]
[Done!]
```

## Remember

```
Fresh subagent per task
Two-stage review every time
Spec compliance FIRST
Code quality SECOND
Never skip reviews
Catch issues early
```

**Quality is not an accident. It's the result of systematic process.**

## Further reading (load when relevant)

When the orchestration involves significant context usage, long review loops, or complex validation checkpoints, load these references for the specific discipline:

- **`references/context-budget-discipline.md`** — Four-tier context degradation model (PEAK / GOOD / DEGRADING / POOR), read-depth rules that scale with context window size, and early warning signs of silent degradation. Load when a run will clearly consume significant context (multi-phase plans, many subagents, large artifacts).
- **`references/gates-taxonomy.md`** — The four canonical gate types (Pre-flight, Revision, Escalation, Abort) with behavior, recovery, and examples. Load when designing or reviewing any workflow that has validation checkpoints — use the vocabulary explicitly so each gate has defined entry, failure behavior, and resumption rules.

Both references adapted from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson).