Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
Use for ANY technical issue: - Test failures - Bugs in production - Unexpected behavior - Performance problems - Build failures - Integration issues
Use this ESPECIALLY when: - Under time pressure (emergencies make guessing tempting) - "Just one quick fix" seems obvious - You've already tried multiple fixes - Previous fix didn't work - You don't fully understand the issue
Don't skip when: - Issue seems simple (simple bugs have root causes too) - You're in a hurry (rushing guarantees rework) - Someone wants it fixed NOW (systematic is faster than thrashing)
You MUST complete each phase before proceeding to the next.
BEFORE attempting ANY fix:
When the user is visibly frustrated with repeated UI/provider fixes, treat that as a signal to slow down and inspect the architecture first: map persisted state vs in-memory/UI state, provider payloads, queue status, and refresh/restart behavior before touching CSS or prompts. Alex especially expects root-cause evidence for screenshot-driven visual bugs, AI editing/reference bugs, and long-running job feedback.
Action: Use read_file on the relevant source files. Use search_files to find the error string in the codebase.
Action: Use the terminal tool to run the failing test or trigger the bug:
# Run specific failing test
pytest tests/test_module.py::test_name -v
# Run with verbose output
pytest tests/test_module.py -v --tb=long
Action:
# Recent commits
git log --oneline -10
# Uncommitted changes
git diff
# Changes in specific file
git log -p --follow src/problematic_file.py | head -100
WHEN system has multiple components (API → service → database, CI → build → deploy):
BEFORE proposing fixes, add diagnostic instrumentation:
For EACH component boundary: - Log what data enters the component - Log what data exits the component - Verify environment/config propagation - Check state at each layer
Run once to gather evidence showing WHERE it breaks. THEN analyze evidence to identify the failing component. THEN investigate that specific component.
WHEN error is deep in the call stack:
Action: Use search_files to trace references:
# Find where the function is called
search_files("function_name(", path="src/", file_glob="*.py")
# Find where the variable is set
search_files("variable_name\\s*=", path="src/", file_glob="*.py")
STOP: Do not proceed to Phase 2 until you understand WHY it's happening.
Find the pattern before fixing:
Action: Use search_files to find comparable patterns:
search_files("similar_pattern", path="src/", file_glob="*.py")
Scientific method:
Fix the root cause, not the symptom:
test-driven-development skill# Run the specific regression test
pytest tests/test_module.py::test_regression -v
# Run full suite — no regressions
pytest tests/ -q
Pattern indicating an architectural problem: - Each fix reveals new shared state/coupling in a different place - Fixes require "massive refactoring" to implement - Each fix creates new symptoms elsewhere
STOP and question fundamentals: - Is this pattern fundamentally sound? - Are we "sticking with it through sheer inertia"? - Should we refactor the architecture vs. continue fixing symptoms?
Discuss with the user before attempting more fixes.
This is NOT a failed hypothesis — this is a wrong architecture.
If you catch yourself thinking: - "Quick fix for now, investigate later" - "Just try changing X and see if it works" - "Add multiple changes, run tests" - "Skip the test, I'll manually verify" - "It's probably X, let me fix that" - "I don't fully understand but this might work" - "Pattern says X but I'll adapt it differently" - "Here are the main problems: [lists fixes without investigation]" - Proposing solutions before tracing data flow - "One more fix attempt" (when already tried 2+) - Each fix reveals a new problem in a different place
ALL of these mean: STOP. Return to Phase 1.
If 3+ fixes failed: Question the architecture (Phase 4 step 5).
| Excuse | Reality |
|---|---|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern, don't fix again. |
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare, identify differences | Know what's different |
| 3. Hypothesis | Form theory, test minimally, one variable at a time | Confirmed or new hypothesis |
| 4. Implementation | Create regression test, fix root cause, verify | Bug resolved, all tests pass |
Use these Hermes tools during Phase 1:
search_files — Find error strings, trace function calls, locate patternsread_file — Read source code with line numbers for precise analysisterminal — Run tests, check git history, reproduce bugsweb_search/web_extract — Research error messages, library docsWhen the user wants documentation checked, prefer this order:
1. web_extract for direct doc/page extraction
2. web_search to discover the right docs page
3. Browser tools only as fallback when extraction/search fails, content is JS-rendered, or interactive discovery is required
Why: users may prefer Firecrawl/web extraction over browser snapshots for straight documentation reading, and browser browsing is slower/noisier unless necessary.
For third-party API integration bugs, add this Phase 1 checklist before proposing fixes:
image_url vs start_image_url, etc.)Common lesson: docs drift faster than app registries. Before assuming a provider requires special access, verify whether the endpoint path, payload shape, or local access-check logic is stale or misleading.
For complex multi-component debugging, dispatch investigation subagents:
delegate_task(
goal="Investigate why [specific test/behavior] fails",
context="""
Follow systematic-debugging skill:
1. Read the error message carefully
2. Reproduce the issue
3. Trace the data flow to find root cause
4. Report findings — do NOT fix yet
Error: [paste full error]
File: [path to failing code]
Test command: [exact command]
""",
toolsets=['terminal', 'file']
)
When fixing bugs: 1. Write a test that reproduces the bug (RED) 2. Debug systematically to find root cause 3. Fix the root cause (GREEN) 4. The test proves the fix and prevents regression
From debugging sessions: - Systematic approach: 15-30 minutes to fix - Random fixes approach: 2-3 hours of thrashing - First-time fix rate: 95% vs 40% - New bugs introduced: Near zero vs common
No shortcuts. No guessing. Systematic always wins.