Bug Fix v4.0 — OpenClaw Edition (Zero-Regression + Portable)
Core Promise: Fix completely. Fix everywhere. Break nothing. Learn from every fix.
Iron Rules (12 — NEVER Violate)
┌──────────────────────────────────────────────────────────────────────────┐
│ Rule 1: Root cause MUST pass 4 gates before fixing │
│ (reproducible + causal + reversible + mechanistic) │
│ │
│ Rule 2: Scope MUST pass 5 gates before fixing │
│ (consumers + contracts + invariants + call sites + dup scan) │
│ │
│ Rule 3: MUST trace IMPACT CHAIN (code → data → time → event) │
│ + scan ALL files for same pattern before writing fix │
│ │
│ Rule 4: MUST predict side effects + check blind spots before coding │
│ (references/blind-spots.md is single source of truth) │
│ │
│ Rule 5: After fix, MUST run regression verification │
│ (functional + performance + concurrency + all impact levels) │
│ │
│ Rule 6: MUST verify fix is LOADED at runtime │
│ (clear __pycache__ + restart + exercise code path) │
│ │
│ Rule 7: Framework behavior → read source code first, never trust │
│ docs/comments/assumptions alone │
│ │
│ Rule 8: UI bugs MUST gather RUNTIME EVIDENCE before proposing fixes │
│ (screenshot + DevTools DOM/console + user repro steps) │
│ Do NOT fix UI bugs based on code reading alone. │
│ │
│ Rule 9: Fix is NOT done until: Bug Summary output + code-review │
│ passes + knowledge files updated + self-reflection complete │
│ │
│ Rule 10: Before fixing, CLASSIFY the problem layer: │
│ code bug? missing config? wrong architecture? AI capability? │
│ Fix at the root layer, not at the symptom layer. │
│ │
│ Rule 11: Pattern matching (regex, string match, name lookup) MUST │
│ check boundary conditions (word boundaries / anchors / exact) │
│ │
│ Rule 12: Before fix, MUST search bug pattern library + bug records │
│ for known fixes and historical context │
└──────────────────────────────────────────────────────────────────────────┘
Workflow Overview
Phase 0: Triage → Severity (P0-P3) + Tier (Trivial/Standard/Complex)
│
├─ Trivial → Quick Fix → test → done
│
├─ Standard ─┐
└─ Complex ──┘
│
Phase 1: Reproduce (evidence required)
│
Phase 2: Root Cause Analysis
2A: Hypothesis ladder → 5 Whys → evidence
2B: Search knowledge files (bug-patterns + bug-records)
2C: Impact chain (code + data + time + event)
2D: Similar issue scan across codebase
│
Phase 3: Scope + Prediction
3A: Consumer list → contracts → invariants → call sites → dup scan (5 gates)
3B: Side effect prediction + blind spot check
3C: Fix strategy comparison (when >10 LOC, Complex only)
│
Phase 4: Fix (minimal change, prefer ≤50 LOC)
│
Phase 5: Verify + Review
5A: Regression verification (functional + perf + concurrency)
5B: Runtime deployment verification
5C: Bug Summary + code-review skill
│
Phase 6: Knowledge Deposit + Self-Reflection
Phase 0: Triage + Severity
Classify severity AND tier FIRST to control workflow depth.
Severity Classification (controls workflow depth)
| Severity | Criteria | Workflow | Time-box |
|---|---|---|---|
| P0 Critical | Production down / data loss / security | FULL (all phases) | 4h escalation |
| P1 High | Core feature broken / data corruption | FULL (all phases) | 8h escalation |
| P2 Medium | Non-core feature / UI issue | STANDARD (skip 3C) | 16h |
| P3 Low | Cosmetic / minor edge case | QUICK (skip 2C, 2D, 3A-3C) | No limit |
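
The severity table can be read as a lookup from severity to the phases that still run. The sketch below is only an illustration of that reading: the phase labels are taken from the Workflow Overview, and the names `ALL_PHASES`, `SKIPPED_BY_SEVERITY`, and `phases_for` are hypothetical, not part of the skill.

```python
# Phase labels as they appear in the Workflow Overview.
ALL_PHASES = ["0", "1", "2A", "2B", "2C", "2D", "3A", "3B", "3C",
              "4", "5A", "5B", "5C", "6"]

# Phases each severity is allowed to skip, copied from the severity table.
SKIPPED_BY_SEVERITY = {
    "P0": set(),
    "P1": set(),
    "P2": {"3C"},
    "P3": {"2C", "2D", "3A", "3B", "3C"},
}


def phases_for(severity: str) -> list[str]:
    """Return the phases to run for a severity, in workflow order."""
    skipped = SKIPPED_BY_SEVERITY[severity]
    return [phase for phase in ALL_PHASES if phase not in skipped]
```

For example, `phases_for("P3")` keeps reproduction, hypothesis work, the knowledge search, the fix, and verification, but drops the impact chain, the similar-issue scan, and all of Phase 3.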
Tier Classification (controls fix path)
| Tier | Criteria | Path |
|---|---|---|
| Trivial | Typo, config value, 1-line obvious fix, no behavioral change | Quick Fix (below) |
| Standard | Logic bug, 1-3 files, clear symptom, no cross-module risk | Standard Path (skip phases marked "Complex only") |
| Complex | Cross-module, >3 files, shared utility, schema change, multi-process | Full Path (all phases mandatory) |
Quick Fix Path (Trivial only)
## Quick Fix
- Bug: [one-line description]
- Fix: [one-line change]
- File: [path:line]
- Test: [how verified — lint/test/manual]
- Risk: None (isolated, no behavioral change)
After quick fix: update references/bug-records.md, done. No RCA, no impact chain, no self-reflection needed.
If "trivial" fix touches >1 file or changes behavior → upgrade to Standard.
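
A minimal sketch of the tier decision, including the upgrade rule above, might look like the following. The `BugFacts` fields and `classify_tier` are hypothetical names chosen for illustration; the thresholds come from the tier table.

```python
from dataclasses import dataclass


@dataclass
class BugFacts:
    """Facts gathered during triage (hypothetical structure)."""
    files_touched: int
    changes_behavior: bool
    cross_module: bool = False
    shared_utility: bool = False
    schema_change: bool = False
    multi_process: bool = False


def classify_tier(facts: BugFacts) -> str:
    """Map triage facts to Trivial / Standard / Complex."""
    # Any Complex criterion wins, regardless of how small the diff looks.
    if (facts.cross_module or facts.shared_utility or facts.schema_change
            or facts.multi_process or facts.files_touched > 3):
        return "Complex"
    # Upgrade rule: more than one file or any behavior change is not Trivial.
    if facts.files_touched > 1 or facts.changes_behavior:
        return "Standard"
    return "Trivial"
```

Checking the Complex criteria first means a one-line change to a shared utility is never mistaken for a quick fix.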
Auto-Initialize Knowledge Files
Check: references/bug-patterns.md exists?
YES → search it in Phase 2B
NO → skip pattern search; create after first fix
Check: references/bug-records.md exists?
YES → search it in Phase 2B
NO → skip records search; create after first fix
Check: references/blind-spots.md exists?
YES → use it in Phase 3B
NO → skip blind spot check; create after first fix
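
The three checks above can be sketched as one function. This is an assumption about how the check could be automated, not something the skill prescribes; `KNOWLEDGE_FILES` and `plan_knowledge_steps` are hypothetical names.

```python
from pathlib import Path

# Knowledge file -> phase that consumes it, as listed in the checks above.
KNOWLEDGE_FILES = {
    "references/bug-patterns.md": "2B",
    "references/bug-records.md": "2B",
    "references/blind-spots.md": "3B",
}


def plan_knowledge_steps(root: Path) -> dict[str, str]:
    """Decide, per knowledge file, whether to use it or skip and create later."""
    plan = {}
    for rel_path, phase in KNOWLEDGE_FILES.items():
        if (root / rel_path).is_file():
            plan[rel_path] = f"use in Phase {phase}"
        else:
            plan[rel_path] = "skip; create after first fix"
    return plan
```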
Phase 1: Reproduce
MUST have evidence before continuing. No evidence = no fix.
| Bug Type | Evidence Required |
|---|---|
| Backend error | Stack trace + request/response |
| Frontend UI | Screenshot + browser console + user repro steps (Rule 8) |
| Performance | Before/after metrics + profiler output |
| Intermittent | Timing conditions + frequency estimate |
UI Bug Protocol (Rule 8):
1. Get user screenshot or screen recording
2. Open browser DevTools → check Console for errors/warnings
3. Inspect DOM structure (check for overflow clipping, z-index, Portal needs)
4. Reproduce the exact user steps
5. ONLY THEN form hypotheses
Evidence Bundle Template
### Trigger Conditions
- Input/params: [...]
- Environment: [OS/browser/runtime version]
- Timing: [action sequence or time interval]
### Observable Output
- Error message: [full error text]
- Logs: [key log lines]
- Screenshot/recording: [if available]
### Correlation IDs
- requestId/traceId: [...]
- sessionId: [...]
Phase 2: Root Cause Analysis
2A: Hypothesis Ladder
| # | Hypothesis | Likelihood | Confirmation Test | Rejection Test | Status |
|---|---|---|---|---|---|
| 1 | [description] | High/Med/Low | [prove it IS this] | [prove it is NOT this] | [ ] |
Rules: Sort by likelihood → each must be falsifiable → run rejection tests first → test ONE at a time → use 5 Whys to reach root cause.
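
One way to read these rules is as an ordering over the ladder. The sketch below assumes "rejection tests first" means that, for each hypothesis, the rejection test runs before the confirmation test; `Hypothesis`, `LIKELIHOOD_RANK`, and `plan_tests` are hypothetical names.

```python
from dataclasses import dataclass

# Lower rank is tested earlier.
LIKELIHOOD_RANK = {"High": 0, "Med": 1, "Low": 2}


@dataclass
class Hypothesis:
    description: str
    likelihood: str  # "High", "Med", or "Low"
    confirmation_test: str
    rejection_test: str


def plan_tests(ladder: list[Hypothesis]) -> list[tuple[str, str, str]]:
    """Order tests: most likely hypothesis first, rejection before confirmation."""
    ordered = sorted(ladder, key=lambda h: LIKELIHOOD_RANK[h.likelihood])
    plan = []
    for h in ordered:
        plan.append(("reject", h.description, h.rejection_test))
        plan.append(("confirm", h.description, h.confirmation_test))
    return plan
```

Each tuple in the returned plan is one test to run on its own, which keeps the "ONE at a time" rule visible in the output.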
Root Cause Confirmation Gate (Rule 1)
Root cause is confirmed only when ALL 4 conditions are met:
| Gate | Meaning |
|---|---|
| Reproducible | Can trigger symptom in controlled scenario |
| Causal | Minimal change makes bug disappear |
| Reversible | Reverting the change makes bug reappear |
| Mechanistic | Can point to exact code path / state transition |
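
As a sketch, the gate can be treated as a four-field record in which every field must be true. `RootCauseGates`, `failed_gates`, and `root_cause_confirmed` are hypothetical helper names used only to make the all-or-nothing rule concrete.

```python
from dataclasses import dataclass, fields


@dataclass
class RootCauseGates:
    reproducible: bool = False
    causal: bool = False
    reversible: bool = False
    mechanistic: bool = False


def failed_gates(gates: RootCauseGates) -> list[str]:
    """Names of the gates that have not passed yet, in table order."""
    return [f.name for f in fields(gates) if not getattr(gates, f.name)]


def root_cause_confirmed(gates: RootCauseGates) -> bool:
    """Confirmed only when all four gates pass."""
    return not failed_gates(gates)
```

Reporting the failed gate names, rather than a bare boolean, shows which evidence is still missing before the fix may start.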
Framework Assumption Audit (Rule 7)
When fix involves framework/library behavior: list assumptions → read source code to verify → document in comments with source references.
2B: Search Knowledge Files (Rule 12)
Search bug-patterns.md and bug-records.md for matching patterns.
Skip if files don't exist (see Phase 0 auto-init).
| Match Level | Action |
|---|---|
| High (symptom + root cause match) | Apply known fix, can skip remaining RCA |
| Medium (similar symptom) | Reference strategy, verify |
| No match | Full investigation, must deposit after fix |
2C: Impact Chain (Rule 3)
| Dimension | What to Check |
|---|---|
| Code | Bug file → direct callers → indirect callers → deep callers |
| Data | Corrupted records in DB/file/cache? Repair script needed? |
| Time | When introduced? Duration of exposure? Users affected? |
| Event | Message queues, WebSocket, background workers affected? |
2D: Similar Issue Scan (Rule 3)
Scan ALL files for the same bug pattern, not just the reported file.
rg -n "function_name|similar_pattern" --glob "*.{ts,tsx,py,js}"
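
Where `rg` is not available, a rough Python equivalent of the scan could look like this. It is a sketch under stated assumptions: `scan_for_pattern` and `SOURCE_SUFFIXES` are hypothetical names, and the search is restricted to whole-word matches of a single identifier rather than an arbitrary alternation.

```python
import re
from pathlib import Path

# Same file types as the --glob in the rg command above.
SOURCE_SUFFIXES = {".ts", ".tsx", ".py", ".js"}


def scan_for_pattern(root: Path, name: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for every whole-word hit."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

The word boundaries keep a search for one function from also reporting longer names that merely contain it.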
Phase 3: Scope + Prediction
3A: Scope Accuracy Gate (Rule 2)

| # | Gate | Meaning |
|---|---|---|
| 1 | Consumer List | All consumers (callers/dependents) enumerated |
| 2 | Contract List | Modified contracts/interfaces/behaviors listed |
| 3 | Invariant Check | Must-hold invariants listed |
| 4 | Call Site Enum | All call sites enumerated and classified |
| 5 | Duplicate Scan | No parallel implementation left unfixed |

3B: Side Effect Prediction (Rule 4)
- Change Blueprint: what exactly will change
- Impact Ripple: L0 (code) → L1 (module) → L2 (feature) → L3 (system) → L4 (user)
- Blind Spot Check: read references/blind-spots.md and execute every active check
- Go/No-Go Decision
Quick version (for Standard-tier, ≤5 LOC, 1 file):
## Quick Impact Check
- Change: [one-line description]
- Direct callers: [list or "none - local function"]
- Duplicates: [checked: none / found and planned]
- Could break: [prediction or "low risk - isolated"]
- Decision: GO

3C: Fix Strategy Comparison (>10 LOC, Complex only)

| Dimension | Strategy A | Strategy B |
|---|---|---|
| LOC change | | |
| Impact scope | | |
| Regression risk | | |
| Rollback-able | | |
Phase 4: Fix
- Minimal change, prefer ≤50 LOC; justify if more
- ONE change at a time, never batch unrelated fixes
- Layer Rule (Rule 10): Before writing fix code, verify you're fixing the right layer:
| Problem in… | Fix… | Do NOT fix… |
|---|---|---|
| Params/config | Config or param passing | Business logic |
| Single component | That component | Framework |
| Multiple components same issue | Framework/base class | Each component one by one |
| Docs vs code mismatch | Both sides in sync | Only one side |
- Pattern matching safety (Rule 11): regex, string match, name lookup → always consider boundary conditions
- DB schema change? Generate Alembic migration:
```bash
cd backend && alembic revision --autogenerate -m "describe change"
```
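
To make the pattern matching rule (Rule 11) concrete, here is a small sketch of what a missing boundary check looks like and two safer alternatives. The tool names and the helpers `mentions_tool` and `is_exact_tool` are made up for illustration.

```python
import re


def mentions_tool(text: str, tool_name: str) -> bool:
    """True only when tool_name appears as a whole word in text."""
    return re.search(rf"\b{re.escape(tool_name)}\b", text) is not None


def is_exact_tool(candidate: str, tool_name: str) -> bool:
    """True only when candidate is exactly tool_name (anchored match)."""
    return re.fullmatch(re.escape(tool_name), candidate) is not None


# A plain substring test has no boundaries, so "read" matches inside "read_file".
naive_hit = "read" in "call read_file now"
# The whole-word check rejects that case because "_" is a word character.
bounded_hit = mentions_tool("call read_file now", "read")
```

Here `naive_hit` is `True` while `bounded_hit` is `False`. Note that `\b` only helps when the name starts and ends with word characters; for names with other characters, an anchored or exact comparison is the safer choice.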
Phase 5: Verify + Review
5A: Regression Verification (Rule 5)
| Category | Checks |
|---|---|
| Functional | Unit tests + integration + API + E2E + manual |
| Performance | No N+1 queries, no resource leaks, no response time increase |
| Concurrency | Thread-safe shared state, atomic operations, no race conditions |
Test the entire impact chain (L0-L3), not just the original bug.
5B: Runtime Deployment Verification (Rule 6)
| Step | Action | Evidence |
|---|---|---|
| 1 | Clear Python bytecode cache | `__pycache__` removed |
| 2 | Restart backend service | PID changed from X to Y |
| 3 | Health check passes | /docs returns 200 |
| 4 | Exercise the fixed code path | Request triggers fixed logic |
If NOT deployed → restart and re-verify before proceeding.
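
Step 1 can be scripted in a platform-neutral way. The following is a minimal sketch, assuming the backend process has already been stopped (on Windows the cache directories may be locked while it runs); `clear_pycache` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def clear_pycache(root: Path) -> list[Path]:
    """Delete every __pycache__ directory under root and return what was removed."""
    removed = []
    # Collect the matches first so deleting does not disturb the directory walk.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed
```

The returned list doubles as the evidence for step 1: an empty list on a second run shows that no stale bytecode directories remain.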
5C: Bug Summary + Code Review (Rule 9)
## Bug Summary [BUG-XXX]
- **Symptom**: [one-sentence user-visible problem]
- **Root Cause**: [one-sentence actual cause]
- **Fix**: [one-sentence fix description]
- **Files Modified**: [file1.py, file2.ts]
- **Severity**: P0/P1/P2/P3
Output Bug Summary → run code-review skill → if review finds issues → fix → re-verify
Stop condition: Code review clean + regression passed + deployment verified + original bug fixed.
Special Checks
| Bug Type | Key Checks |
|---|---|
| API Bug | Frontend → API → Schema → Service → DB chain; field completeness |
| DB Migration | Model changed → alembic revision --autogenerate; no migration = schema drift |
| System-level | Draw E2E chain; define handshake evidence per edge; insert probes first |
| Cross-Surface | Shared artifact → identify contract → consumer list → regression matrix |
Phase 6: Knowledge Deposit + Self-Reflection
6.1 Update Knowledge Files (Rule 9)
| File | When to Update |
|---|---|
| `references/bug-records.md` | Every fix (project history) |
| `references/bug-patterns.md` | New pattern / new fix strategy (universal) |
| `references/blind-spots.md` | New blind spot discovered |
6.2 Self-Reflection (Rule 9)
| Dimension | Score (1-5) | Evidence |
|---|---|---|
| First-time correctness | [1-5] | Did the fix work on first attempt? |
| Scope accuracy | [1-5] | Did I find all affected areas? |
| Minimal change | [1-5] | Was the change as small as possible? |
| Side effect prediction | [1-5] | Did I predict all side effects? |
| Root cause depth | [1-5] | Did I fix root cause, not symptom? |
| Total | [/25] | |

| Issue | What Happened | Why I Missed It | Prevention |
|---|---|---|---|
| | | | |
Regression Autopsy (when fix introduced a regression)
- **Original Bug**: [what was being fixed]
- **New Bug Introduced**: [what broke]
- **Why I didn't predict it**: [blind spot]
- **Classification**: [missed consumer / contract violation / edge case / ...]
Domain-Specific Checks
| Bug Type | Key Checks |
|---|---|
| Backend/API | Schema drift, timeout/retry, transactions, N+1, connection pool, ORM lazy loading |
| Frontend/UI | State (useEffect deps, unmount), race conditions, CORS, hydration, overflow/Portal |
| System-level | Cross-layer chain, async/streaming, IPC, routing |
| Framework | Read source code first (Rule 7), verify assumptions with tests |
| AI/LLM | Tool binding modes, simulated vs native, streaming, token limits |
Skill Delegation
| Trigger | Delegate To |
|---|---|
| Need new API endpoint | fullstack-developer |
| UI fix needed | frontend-design |
| Schema change needed | database-migrations |
| After fix (mandatory) | code-review |
Anti-Patterns (FORBIDDEN)
| Forbidden | Correct |
|---|---|
| Fix without RCA | Hypothesis ladder first |
| Single hypothesis then fix | List 3-5 hypotheses, verify each |
| Fix UI bug by code reading alone | Get runtime evidence first (Rule 8) |
| Skip consumer list for shared code | Fill consumer list first |
| Tests pass but server runs old code | Clear cache + restart + verify fix is live (Rule 6) |
| Fix code but ignore corrupted data | Assess data impact + repair if needed |
| Trust framework docs blindly | Read source code or run tests (Rule 7) |
| Fix one copy, miss the duplicate | Grep function name; check both Path A and Path B |
| Pattern match without boundary check | Add word boundaries / anchors / exact match (Rule 11) |
| Model changed but no migration | Run alembic revision --autogenerate |
| Use full workflow for a typo | Use Quick Fix path (Phase 0 Trivial tier) |
| Skip self-reflection | Must score, analyze, and learn |
Final Checklist
Core (Standard + Complex tiers)
| # | Check | Phase |
|---|---|---|
| 1 | Severity (P0-P3) + Tier (Trivial/Standard/Complex) classified | 0 |
| 2 | Root cause passes 4 gates | 2A |
| 3 | Bug pattern library + records searched | 2B |
| 4 | Impact chain traced (code+data+time+event) | 2C |
| 5 | Similar issue scan completed | 2D |
| 6 | Scope passes 5 gates (incl. duplicate scan) | 3A |
| 7 | Side effect prediction + blind spot check | 3B |
| 8 | Regression verification ALL passed (L0-L3) | 5A |
| 9 | Runtime deployment verified | 5B |
| 10 | Bug Summary output + code-review passed | 5C |
| 11 | Knowledge files updated | 6.1 |
| 12 | Self-reflection completed | 6.2 |
| 13 | If DB model changed: Alembic migration generated | 5 |
| 14 | User confirmed fix + no new bugs | Final |
Trivial Tier Checklist (Quick Fix path only)
| # | Check | Status |
|---|---|---|
| 1 | Fix applied and tested (lint/test/manual) | [ ] |
| 2 | Bug record entry added | [ ] |
| 3 | No behavioral change introduced | [ ] |
OpenClaw Project Context
Architecture Map
backend/app/
├── api/v1/ # FastAPI routes (agents, auth, chat, skills, tools, profile)
├── core/
│ ├── graph/ # LangGraph StateGraph (agent_graph, nodes/llm_node, tool_node, prepare_node)
│ ├── langchain/ # LangChain tools (tools.py, shell_tool.py, e2b_tools.py)
│ ├── mcp/ # MCP server integration (pool.py)
│ ├── database.py # SQLAlchemy async engine
│ └── security.py # JWT auth
├── models/ # SQLAlchemy ORM models (agent, tool, user, skill)
├── schemas/ # Pydantic request/response schemas
├── services/ # Business logic (agent_executor, chat_service, tool_call_parser, ...)
├── middleware/ # Request logging, audit, error handling
└── main.py # FastAPI app entry
frontend/src/
├── features/ # Feature modules (chat, settings, admin, knowledge, skills, agents)
│ ├── chat/ # ChatPageV2, MessageRenderer, ToolCallCard, SkillExecutionInline
│ └── ...
├── components/ui/ # shadcn/ui style components (dialog, switch, checkbox)
├── hooks/ # React hooks (useChatStream — SSE event handling)
├── store/ # Zustand state management
└── lib/ # API client (api-client.ts), markdown utils
Tech Stack
| Layer | Technology |
|---|---|
| Backend | FastAPI + Python 3.11+ |
| ORM | SQLAlchemy 2.0 (async) |
| DB | PostgreSQL (asyncpg) or MySQL (aiomysql) |
| Migrations | Alembic (backend/alembic/) |
| Cache | Redis |
| AI | LangChain 0.3.x + LangGraph 0.4.x |
| Vector DB | ChromaDB |
| Frontend | React 18 + Vite + TypeScript |
| UI | Radix UI + Tailwind CSS |
| State | Zustand + TanStack Query |
| Tests | pytest (backend), Vitest (frontend) |
| Deploy | Docker Compose, supports PyInstaller desktop build |
High-Risk Bug Zones
Backend Hot Zones
| Zone | Files | Why It's High-Risk |
|---|---|---|
| Simulated Tool Call Parsing | services/tool_call_parser.py, core/graph/nodes/llm_node.py | Regex-based; dual implementations; multi-arg edge cases |
| Agent Executor | services/agent_executor.py | 3000+ LOC; native + simulated modes; complex streaming |
| Tool Argument Remapping | core/graph/nodes/tool_node.py | LLM wrong param names → alphabetical guess |
| LLM Streaming (httpx) | services/llm_manager.py | Reasoning model fallback; SSE; reasoning_content |
| MCP Tool Integration | core/mcp/pool.py, services/tool_service.py | MCP lifecycle; command vs HTTP; timeout |
| Skill Runtime | services/skill_executor.py, services/skill_service.py | Script exec; env var injection; enhanced vs local |
| Chat Streaming | services/chat_service.py | SSE events; client disconnect; async save |
| Memory System | services/unified_memory_manager.py | L1/L2; embedding scoring; slow queries |
Frontend Hot Zones
| Zone | Files | Why It's High-Risk |
|---|---|---|
| SSE Chat Stream | hooks/useChatStream.ts | Event parsing; reconnection; reasoning_content |
| Tool Call Rendering | features/chat/components/ToolCallCard.tsx | Dynamic display; error states; loading |
| Skill Execution UI | features/chat/components/SkillExecutionInline.tsx | Inline status; progress; error display |
| Markdown Renderer | features/chat/components/MarkdownRenderer.tsx | Nested code fences; special chars; XSS |
| Agent Editor | features/agents/AgentEditorPage.tsx | Complex form state; tool/skill/KB associations |
| Zustand Store | store/ | State updates not re-rendering if reference unchanged |
Two Code Paths for Agent Execution
Path A: Direct Executor (most common)
chat API → chat_service → agent_executor.py → tool_call_parser.py → tool execution
Path B: StateGraph (LangGraph)
chat API → chat_service → agent_graph.py → llm_node.py → tool_node.py → tool execution
When fixing anything in Path A, always check Path B for the same issue (and vice versa).
Known Duplicate Implementations
| Function / Feature | Primary Location | Known Alternate Location |
|---|---|---|
| `parse_simulated_tool_calls` | services/tool_call_parser.py | core/graph/nodes/llm_node.py |
| Tool loading / binding | services/agent_executor.py | core/graph/agent_graph.py |
| Token counting | services/token_counter.py | May have inline counting in agent_executor.py |
| Memory management | services/unified_memory_manager.py | core/graph/nodes/prepare_node.py |
OpenClaw Common Framework Pitfalls (Rule 7)
| Area | Assumption | Reality |
|---|---|---|
| Config load order | First file has priority | Often last file wins (dict.update) |
| ORM lazy loading | Relations auto-load | Default lazy, causes N+1 |
| Async on Windows | Same as Linux | Windows uses ProactorEventLoop; run_dev.py forces SelectorEventLoop |
| Pydantic serialization | model_dump() includes all | exclude_unset=True omits fields that were never explicitly set |
| LangChain tool binding | All models support tools | Reasoning models need simulated mode |
| `__pycache__` | Python uses latest source | Stale .pyc can persist across restarts |
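
The config load order pitfall is easy to demonstrate with plain dictionaries. This sketch assumes config layers are merged with `dict.update`, as the table suggests; `merge_configs` and the sample keys are hypothetical.

```python
def merge_configs(*layers: dict) -> dict:
    """Merge config layers in order; later layers overwrite earlier keys."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


defaults = {"timeout": 30, "debug": False}
overrides = {"timeout": 5}

# The layer passed last wins for any key both layers define.
merged = merge_configs(defaults, overrides)
```

Here `merged` is `{"timeout": 5, "debug": False}`. The merge is shallow: a nested dictionary in a later layer replaces the earlier one wholesale instead of being merged key by key, which is a common source of silently dropped settings.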
Windows Development Environment Gotchas
| Issue | Symptom | Workaround |
|---|---|---|
| Path separators | `\` vs `/` | Use pathlib.Path or os.path.join |
| asyncio event loop | ProactorEventLoop default | run_dev.py forces loop="asyncio" |
| `__pycache__` file locks | Can't delete while running | Kill process FIRST, then clean |
| Console encoding | GBK/CP936 default | sys.stdout.reconfigure(encoding='utf-8') |
| Playwright on Windows | Browser launch may fail | Runs in separate thread with own loop |
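
For the path separator row, the pure path classes in `pathlib` show how the same parts render on each platform without needing a Windows machine. The path parts below are only an example based on the architecture map.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("backend", "app", "main.py")

# Build the path from parts instead of hard-coding a separator.
windows_path = PureWindowsPath(*parts)
posix_path = PurePosixPath(*parts)

# as_posix() gives a forward-slash form that is safe to log or compare.
portable = windows_path.as_posix()
```

`str(windows_path)` uses backslashes and `str(posix_path)` uses forward slashes, while `portable` is `backend/app/main.py` on both. In application code, `pathlib.Path` picks the right flavor for the running platform automatically.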
Verification Commands (OpenClaw)
Backend
cd backend
ruff check app/ # Lint
python -m pytest tests/ -v --tb=short # Unit tests
python -m pytest tests/test_specific.py -v # Specific test
alembic revision --autogenerate -m "check" # DB migration check
Frontend
cd frontend
npm run lint
npm run typecheck
npm test
npm run build
Backend Server Restart
Get-WmiObject Win32_Process -Filter "Name='python.exe'" | Where { $_.CommandLine -like "*run_dev*" }
Stop-Process -Id [PID] -Force
Get-ChildItem -Path "backend" -Recurse -Filter "__pycache__" -Directory | Remove-Item -Recurse -Force
Start-Process -FilePath "backend\venv\Scripts\python.exe" -ArgumentList "run_dev.py" -WorkingDirectory "backend"
Invoke-WebRequest -Uri "http://127.0.0.1:8000/docs" -UseBasicParsing -TimeoutSec 5
Reference Files
Living Data Files (update after every fix)
| File | Purpose |
|---|---|
| `references/bug-records.md` | Project-specific bug history |
| `references/blind-spots.md` | Single source of truth for AI blind spot registry |
Pattern Libraries (domain knowledge)
| File | Purpose |
|---|---|
| `references/bug-patterns.md` | Universal bug pattern library (11 categories) |
| `references/backend-patterns.md` | Backend issues (API, ORM, LLM integration, OpenClaw-specific) |
| `references/frontend-patterns.md` | Frontend issues (React hooks, race conditions, CORS) |
Detailed Guides
| File | Purpose |
|---|---|
| `references/system-rca.md` | System-level RCA (cross-layer, multi-process bugs) |
| `references/regression-matrix.md` | Complete zero-regression verification matrix |
Skill Evolution
Update this skill when:
- Code review finds a bug that the workflow should have prevented
- A recurring bug class repeats across fixes
Prefer updating specific sections over adding new rules.
After updates, validate that the workflow is still coherent and not overly bureaucratic.