Session Retrospective¶
Switch to developmental mode and critique/debug this session's execution — then fix what you find
When to use¶
When the user invokes /wb-dev-retro or asks for a session retrospective, post-session critique, or to debug what went wrong during an operational session
Slash command: /wb-dev-retro
Directions¶
Switch from operational to developmental mode. Critique and debug this session's execution, then fix what you find.
The design rationale behind operational vs developmental agents: operational agents get things done (workflows, tasks, journals). Developmental agents improve the system itself (code, prompts, capabilities). The session retrospective bridges the two — the agent that experienced the friction diagnoses and fixes it, preserving first-person context that would be lost in a handoff to a separate evaluator.
Prerequisite: You must have done operational work in this session (morning routine, task triage, journal update, context collection, etc.). If you haven't, tell the user there's nothing to retrospect on.
Step 1: Critique¶
Review everything that happened in this session. Be specific, be harsh, and cite evidence (quote the call, the response, the output). You are debugging the system — not just the code, but the behavior, the prompts, the context engineering, the workflow design, and your own decisions.
Look for:
Capability bugs
- Calls that returned wrong, irrelevant, or excessively broad results
- Missing capabilities (you had to write raw Python because nothing existed in the gateway)
- Silent failures or unhelpful error messages
- Redundant calls (the same data fetched multiple ways)
Prompt and context issues
- Slash commands that didn't give you enough context, or gave you the wrong context
- Workflow steps that were awkward, out of order, or unnecessary
- Violations of just-in-time retrieval: context loaded too early (bloat) or too late (missed)
- Unclear data contracts between workflow steps
Agent behavior issues (debug your own reasoning)
- Wrong approach — a better-contextualized agent would have avoided this path
- Repeated mistakes within the session
- Missed signals that were obvious in retrospect
- Over-engineering or under-engineering
Output quality
- Journal entries, tasks, briefings, or user-facing summaries that were poorly structured, too verbose, or missed the point
Format as:
## Session Critique
### Critical (broke something or produced wrong output)
1. **[Category] Short title**: What happened. What should have happened. Evidence.
### Friction (slowed things down or degraded quality)
1. **[Category] Short title**: What happened. What should have happened.
### Minor (polish)
1. **[Category] Short title**: What happened. What should have happened.
Present this to the user. Ask if they experienced friction you missed or if any of your critiques are off-base.
Step 2: Triage (with the user)¶
Present the full critique and let the user decide what to act on. For each issue, offer these options:
- Fix now — code change, prompt edit, config tweak. Do it in this session.
- Debug first — root cause is unclear. Reproduce the issue, investigate, then fix.
- Create a task — needs deeper work or a fresh session. Use /wb-task-handoff to capture the issue with full context so a future agent can pick it up.
- Skip — not worth fixing, or not actually a problem.
The user drives this. Don't prescribe how many to fix.
Step 3: Do the work¶
For each issue the user selected:
Fix now: Read the relevant files, make the change, test if possible (re-run the call that failed), and briefly note what you changed.
Debug first: Reproduce the problem — re-run the call or workflow step. Narrow the root cause: is it the capability code, the prompt, the workflow, the data? Then fix and test. If you learn something non-obvious, document it in the relevant knowledge store unit.
Create a task: Use /wb-task-handoff. The handoff should include the specific failure (with evidence from this session), your root cause hypothesis, the files involved, and what you already tried.
Step 4: Close out¶
If you made code changes:
1. Review them holistically — do they introduce new issues?
2. Summarize: what was fixed, what was handed off, what was skipped
3. Call out systemic patterns if you see them (e.g., "three issues were all about context bloat — the JIT pattern needs enforcement across workflows")
What NOT to do¶
- Boil the ocean — don't try to fix everything
- Stay abstract — "search should be better" is not actionable; "context_search returned 47 results including binary files, needs a file-type filter" is
- Bikeshed — don't spend 30 min on naming when a capability is broken
- Scope-creep into features — this is about fixing friction, not adding new features
- Self-congratulate — don't list things that went well; the point is improvement
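To illustrate the level of specificity the critique and fixes should have, the `context_search` example above could translate directly into a small "fix now" change. A minimal sketch, assuming a hypothetical result shape, binary-extension list, and `file_types` parameter; none of these names are part of any real gateway API:

```python
from dataclasses import dataclass

# Hypothetical result shape; a real gateway capability will differ.
@dataclass
class SearchResult:
    path: str
    snippet: str

# Assumption: binary files are noise in search results by default.
BINARY_EXTENSIONS = {".png", ".jpg", ".pdf", ".zip", ".so", ".bin"}

def filter_results(results, file_types=None):
    """Drop binary files; optionally keep only the given extensions."""
    filtered = []
    for r in results:
        # Extract the extension, lowercased; empty string if there is none.
        ext = "." + r.path.rsplit(".", 1)[-1].lower() if "." in r.path else ""
        if ext in BINARY_EXTENSIONS:
            continue
        if file_types is not None and ext not in file_types:
            continue
        filtered.append(r)
    return filtered
```

A fix at this layer stays testable in-session: re-run the call that produced the 47 noisy results and confirm the binary files are gone.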