Process Backlog¶
Orchestrates the full backlog processing pipeline: segment Running Notes into threads, route each thread to its destination with user confirmation, and rewrite Running Notes with only the items that remain open.
Workflow name: process-backlog
Execution: main
Override not allowed
Steps¶
| # | ID | Name | Type | Depends on |
|---|---|---|---|---|
| 1 | extract | Extract Running Notes section from journal | code | — |
| 2 | segment | Segment Running Notes into coherent threads | reasoning | extract |
| 3 | analyze-similarity | Embed threads and cluster into information units | code | segment |
| 4 | review-clusters | Review clusters and apply merge/split decisions | reasoning | analyze-similarity |
| 5 | route | Route information units to destinations with user confirmation | reasoning | review-clusters |
| 6 | rewrite | Rewrite Running Notes with only open items | code | route |
Step instructions¶
extract¶
(main)
Read the target journal file and extract the Running Notes section.
```python
# Read journal file
journal_path = "<vault-root>/journal/2026-04-02.md"  # resolve from config vault_root

# Extract Running Notes section (between header and % RUNNING END or next heading)
# Store: raw_text, full_file_content (needed for rewrite phase)
```
Handle edge cases:
- No Running Notes section → nothing to process; complete the workflow
- Empty Running Notes → nothing to process
- Very short Running Notes (< 3 items) → ask the user whether to run the full pipeline or just handle the items inline
Output: raw_text (the Running Notes content) and full_file_content (the entire journal file, needed for rewrite).
segment¶
(subagent)
Spawn a subagent to run workflows/daily-journal/segment-notes.md.
Subagent prompt must include:
- The raw_text from the extract phase
- The pool of pre-generated thread IDs
- The segmentation rules from the workflow spec
Output: tagged text + thread list + validation result.
If validation fails, the subagent fixes the output and re-validates (up to 3 attempts). If it still fails after that, it returns the error and the main agent handles it.
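The fix-and-revalidate loop can be sketched as follows; `segment` and `validate` are hypothetical stand-ins for the subagent's segmentation and validation work:

```python
def segment_with_retries(raw_text, segment, validate, max_attempts=3):
    """Run segmentation, re-validating up to max_attempts times before giving up."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # Feed the previous validation error back in so the retry can fix it
        result = segment(raw_text, previous_error=last_error)
        ok, last_error = validate(result)
        if ok:
            return {"status": "ok", "result": result, "attempts": attempt}
    # Still failing after max_attempts: surface the error to the main agent
    return {"status": "error", "error": last_error, "attempts": max_attempts}
```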
analyze-similarity¶
(main, code)
Embed all thread summaries and compute multi-signal similarity to cluster threads into coherent information units.
```python
from work_buddy.journal_backlog.similarity import analyze_threads, generate_agent_context
from work_buddy.obsidian.smart import check_ready

# Verify Smart Environment is loaded (embeddings require it)
status = check_ready()
if not status.get("ready"):
    from work_buddy.obsidian.smart import wait_until_ready
    wait_until_ready(timeout_seconds=120)

# Run the full analysis pipeline
# manifest_path is the thread_manifest.jsonl from the segment step
result = analyze_threads(manifest_path, is_journal=True)
# result contains: clusters, merges, top_pairs, report, threads
```
Output: result dict containing clusters, the agent report (markdown), and the structured agent context (JSON).
review-clusters¶
(main, reasoning)
Agentic step. The agent reviews clustering results and confirms merge/split decisions with the user. Behavioral instructions (confidence thresholds, presentation format, merge/split procedures) are in the slash command, not here.
Applying decisions (data contracts):
- MERGE clusters: combine thread lists, concatenate raw text, union tags
- SPLIT cluster: move specified threads to their own new unit or an existing cluster
- MOVE singleton: add to the specified cluster
Output: Final list of information units (merged/cleaned clusters), each with combined raw text, unified tags, and a summary. These become the routing input items for the next step.
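The MERGE contract (combine thread lists, concatenate raw text, union tags) can be sketched as a pure function over two unit dicts; the field names mirror the routing-item schema and are otherwise illustrative:

```python
def merge_units(a: dict, b: dict) -> dict:
    """Apply the MERGE contract: combine thread lists, concatenate raw text, union tags."""
    return {
        "thread_ids": a["thread_ids"] + b["thread_ids"],
        "raw_text": a["raw_text"] + "\n\n" + b["raw_text"],
        # Union of tags, preserving first-seen order
        "tags": list(dict.fromkeys(a["tags"] + b["tags"])),
    }
```

Using an order-preserving union (rather than a `set`) keeps tag ordering stable across re-runs.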
route¶
(main)
Agentic step. Convert reviewed information units into routing input items and invoke workflows/routing/route-information.md. Behavioral instructions (proposal generation, confidence levels) are in the slash command, not here.
Before proposing destinations, call mcp__work-buddy__wb_run("project_list") to get the list of known projects. Only propose work/projects/<slug>/ destinations for slugs that appear in the project list — do not invent project paths.
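The project-destination guard can be sketched as a simple check against the slugs returned by `project_list` (the destination-path shape is taken from the rule above; the function itself is illustrative):

```python
def valid_destination(proposed: str, known_slugs: set[str]) -> bool:
    """Reject work/projects/<slug>/ destinations whose slug is not a known project."""
    prefix = "work/projects/"
    if not proposed.startswith(prefix):
        return True  # non-project destinations are not restricted by this check
    slug = proposed[len(prefix):].split("/", 1)[0]
    return slug in known_slugs
```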
Each information unit becomes one routing input item:
```json
{
  "id": "unit_0",
  "raw_text": "Combined text from all threads in this unit",
  "source": "journal/<date>/RunningNotes",
  "thread_ids": ["t_a3f8c1", "t_b2c3d4"],
  "agent_summary": "Best guess at what this unit is about",
  "tags": ["#projects/my-project", "#research/my-topic"],
  "proposed_type": "task | consideration | reference | admin | personal | delete | unknown",
  "proposed_destination": "best guess at where it should go",
  "confidence": "high | medium | low",
  "staleness_note": "any indication this might be completed or stale"
}
```
The routing workflow handles user interaction, confirmation, and execution. It returns a routing record.
rewrite¶
(main, consent-gated)
Agentic step. The agent rewrites Running Notes with only remaining items after routing. Behavioral instructions (presentation format, consent requirements) are in the slash command, not here.
Rewrite rules:
1. Items with action delete or route: remove from Running Notes
2. Items with action skip: keep in Running Notes
3. Items with action split: keep un-routed portions, remove routed portions
This is a filesystem write and is consent-gated.
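The three rewrite rules reduce to a filter over routed items; the item shape (an `action` field plus `raw_text`/`remaining_text`) is an assumption for illustration:

```python
def remaining_items(items: list[dict]) -> list[str]:
    """Apply the rewrite rules: drop routed/deleted items, keep skipped ones,
    and keep only the un-routed portion of split items."""
    kept = []
    for item in items:
        action = item["action"]
        if action in ("delete", "route"):
            continue  # rule 1: removed from Running Notes
        if action == "skip":
            kept.append(item["raw_text"])  # rule 2: kept as-is
        elif action == "split":
            kept.append(item["remaining_text"])  # rule 3: un-routed portion only
    return kept
```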
Parallelization notes¶
- Segmentation: single subagent (needs full context)
- Similarity analysis: single main agent (embeds all threads in one batch; ~35 threads take ~10 s)
- Cluster review: single main agent (interactive with user)
- Routing: sequential units in main agent, but sub-agents can pre-compute routing proposals in parallel
- Rewrite: single main agent operation
For very large backlogs (50+ threads), consider pre-computing all routing proposals via parallel sub-agents.
Cost management¶
- First run on a large backlog will be expensive (full segmentation + many routing interactions)
- Subsequent runs process only new items since last run — much cheaper
- Track last-processed date in session metadata to scope future runs
- For enormous backlogs (100+ items), suggest processing in chunks across sessions
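Scoping a run by the last-processed date can be sketched as below, assuming journal filenames follow the `YYYY-MM-DD.md` pattern seen in the extract step and session metadata is a small JSON file (the metadata key is an assumption):

```python
import json
from datetime import date
from pathlib import Path

def scope_journals(journal_dir: Path, meta_path: Path) -> list[Path]:
    """Return journal files dated strictly after the last processed run."""
    last = date.min
    if meta_path.exists():
        meta = json.loads(meta_path.read_text())
        # "last_processed" is an assumed metadata key (ISO date string)
        last = date.fromisoformat(meta.get("last_processed", "0001-01-01"))
    return sorted(
        p for p in journal_dir.glob("*.md")
        if date.fromisoformat(p.stem) > last
    )
```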
DAG state persistence¶
The DAG state is saved to agents/<session>/workflows/process-backlog.json. If the session is interrupted, the workflow can resume from the last completed task. The segmentation result and routing record are stored as task results in the DAG.