
Process Backlog

Orchestrates the full backlog processing pipeline: segment Running Notes into threads, route each thread to its destination with user confirmation, and rewrite Running Notes with only the items that remain open.

Workflow name: process-backlog

Execution: main

Override not allowed

Steps

| # | ID | Name | Type | Depends on |
|---|----|------|------|------------|
| 1 | extract | Extract Running Notes section from journal | code | — |
| 2 | segment | Segment Running Notes into coherent threads | reasoning | extract |
| 3 | analyze-similarity | Embed threads and cluster into information units | code | segment |
| 4 | review-clusters | Review clusters and apply merge/split decisions | reasoning | analyze-similarity |
| 5 | route | Route information units to destinations with user confirmation | reasoning | review-clusters |
| 6 | rewrite | Rewrite Running Notes with only open items | code | route |

Step instructions

extract

(main)

Read the target journal file and extract the Running Notes section.

# Read journal file
journal_path = "<vault-root>/journal/2026-04-02.md"  # resolve from config vault_root

# Extract Running Notes section (between header and % RUNNING END or next heading)
# Store: raw_text, full_file_content (needed for rewrite phase)

Handle edge cases:

  • No Running Notes section → nothing to process; complete the workflow
  • Empty Running Notes → nothing to process
  • Very short Running Notes (< 3 items) → ask the user whether to run the full pipeline or just handle the items inline

Output: raw_text (the Running Notes content) and full_file_content (the entire journal file, needed for rewrite).

segment

(subagent)

Spawn a subagent to run workflows/daily-journal/segment-notes.md.

The subagent prompt must include:

  • The raw_text from the extract phase
  • The pool of pre-generated thread IDs
  • The segmentation rules from the workflow spec

Output: tagged text + thread list + validation result.

If validation fails, the subagent should fix the output and re-validate (up to 3 attempts). If it still fails after that, it returns the error and the main agent handles it.
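The fix-and-revalidate loop can be sketched as follows. segment_fn and validate_fn are hypothetical stand-ins for the subagent's segmentation and validation logic; only the retry shape comes from the step description.

```python
MAX_ATTEMPTS = 3  # cap from the step description

def segment_with_retries(raw_text, segment_fn, validate_fn, max_attempts=MAX_ATTEMPTS):
    """Run segmentation, re-validating up to max_attempts times.

    segment_fn and validate_fn are illustrative callables: segment_fn
    produces tagged output, validate_fn returns (ok, error).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        result = segment_fn(raw_text)
        ok, error = validate_fn(result)
        if ok:
            return {"status": "ok", "result": result, "attempts": attempt}
        last_error = error
    # Still failing: surface the error so the main agent can handle it.
    return {"status": "error", "error": last_error, "attempts": max_attempts}
```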

analyze-similarity

(main, code)

Embed all thread summaries and compute multi-signal similarity to cluster threads into coherent information units.

from work_buddy.journal_backlog.similarity import analyze_threads, generate_agent_context
from work_buddy.obsidian.smart import check_ready

# Verify Smart Environment is loaded (embeddings require it)
status = check_ready()
if not status.get("ready"):
    from work_buddy.obsidian.smart import wait_until_ready
    wait_until_ready(timeout_seconds=120)

# Run the full analysis pipeline
# manifest_path is the thread_manifest.jsonl from the segment step
result = analyze_threads(manifest_path, is_journal=True)

# result contains: clusters, merges, top_pairs, report, threads

Output: result dict containing clusters, the agent report (markdown), and the structured agent context (JSON).

review-clusters

(main, reasoning)

Agentic step. The agent reviews clustering results and confirms merge/split decisions with the user. Behavioral instructions (confidence thresholds, presentation format, merge/split procedures) are in the slash command, not here.

Applying decisions (data contracts):

  • MERGE clusters: combine thread lists, concatenate raw text, union tags
  • SPLIT cluster: move specified threads to their own new unit or an existing cluster
  • MOVE singleton: add to the specified cluster

Output: Final list of information units (merged/cleaned clusters), each with combined raw text, unified tags, and a summary. These become the routing input items for the next step.
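The MERGE contract above can be sketched as a pure function. The cluster dict shape (thread_ids, raw_text, tags) is an assumption for illustration, not the actual internal representation.

```python
def merge_clusters(a: dict, b: dict) -> dict:
    """MERGE contract: combine thread lists, concatenate raw text, union tags.

    The cluster shape (thread_ids, raw_text, tags) is assumed here.
    """
    return {
        "thread_ids": a["thread_ids"] + b["thread_ids"],
        "raw_text": a["raw_text"] + "\n\n" + b["raw_text"],
        "tags": sorted(set(a["tags"]) | set(b["tags"])),
    }
```

SPLIT and MOVE follow the same pattern: reassign thread entries between cluster dicts, then rebuild raw_text and tags from the new membership.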

route

(main)

Agentic step. Convert reviewed information units into routing input items and invoke workflows/routing/route-information.md. Behavioral instructions (proposal generation, confidence levels) are in the slash command, not here.

Before proposing destinations, call mcp__work-buddy__wb_run("project_list") to get the list of known projects. Only propose work/projects/<slug>/ destinations for slugs that appear in the project list — do not invent project paths.

Each information unit becomes one routing input item:

{
  "id": "unit_0",
  "raw_text": "Combined text from all threads in this unit",
  "source": "journal/<date>/RunningNotes",
  "thread_ids": ["t_a3f8c1", "t_b2c3d4"],
  "agent_summary": "Best guess at what this unit is about",
  "tags": ["#projects/my-project", "#research/my-topic"],
  "proposed_type": "task | consideration | reference | admin | personal | delete | unknown",
  "proposed_destination": "best guess at where it should go",
  "confidence": "high | medium | low",
  "staleness_note": "any indication this might be completed or stale"
}
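Building these items from reviewed units can be sketched as below. The unit shape (raw_text, thread_ids, summary, tags) is assumed; the proposal fields are left at neutral defaults here, since the agent fills them in before invoking the routing workflow.

```python
def to_routing_item(index: int, unit: dict, date: str) -> dict:
    """Build one routing input item from a reviewed information unit.

    The unit dict shape is an assumption; proposal fields default to
    'unknown'/'low' until the agent proposes a destination.
    """
    return {
        "id": f"unit_{index}",
        "raw_text": unit["raw_text"],
        "source": f"journal/{date}/RunningNotes",
        "thread_ids": unit["thread_ids"],
        "agent_summary": unit["summary"],
        "tags": unit["tags"],
        "proposed_type": "unknown",
        "proposed_destination": "",
        "confidence": "low",
        "staleness_note": "",
    }
```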

The routing workflow handles user interaction, confirmation, and execution. It returns a routing record.

rewrite

(main, consent-gated)

Agentic step. The agent rewrites Running Notes with only remaining items after routing. Behavioral instructions (presentation format, consent requirements) are in the slash command, not here.

Rewrite rules:

  1. Items with action delete or route: remove from Running Notes
  2. Items with action skip: keep in Running Notes
  3. Items with action split: keep un-routed portions, remove routed portions
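The rewrite rules reduce to a filter over the routing record. A minimal sketch, assuming each routed item carries an action field and split items carry an unrouted_portions list (both field names are illustrative):

```python
def items_to_keep(routed_items: list[dict]) -> list[str]:
    """Apply the rewrite rules: delete/route -> drop, skip -> keep,
    split -> keep only the un-routed portions."""
    kept = []
    for item in routed_items:
        action = item["action"]
        if action in ("delete", "route"):
            continue  # fully handled elsewhere; drop from Running Notes
        if action == "skip":
            kept.append(item["raw_text"])
        elif action == "split":
            kept.extend(item.get("unrouted_portions", []))
    return kept
```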

This is a filesystem write and is consent-gated.

Parallelization notes

  • Segmentation: single subagent (needs full context)
  • Similarity analysis: single main agent (embeds all threads in one batch, ~35 threads takes ~10s)
  • Cluster review: single main agent (interactive with user)
  • Routing: sequential units in main agent, but sub-agents can pre-compute routing proposals in parallel
  • Rewrite: single main agent operation

For very large backlogs (50+ threads), consider pre-computing all routing proposals via parallel sub-agents.

Cost management

  • First run on a large backlog will be expensive (full segmentation + many routing interactions)
  • Subsequent runs process only new items since last run — much cheaper
  • Track last-processed date in session metadata to scope future runs
  • For enormous backlogs (100+ items), suggest processing in chunks across sessions

DAG state persistence

The DAG state is saved to agents/<session>/workflows/process-backlog.json. If the session is interrupted, the workflow can resume from the last completed task. The segmentation result and routing record are stored as task results in the DAG.