Dashboard — Costs tab

LLM cost / usage view with two complementary sources (per-call internal log + Claude Code transcripts), row-level backend filters, and the Anthropic rate-limit observation chip. The unified llm_costs_query capability reads both sources.

Details

The view draws on two complementary sources, selected via the toolbar:

Internal log — every LLMRunner API call, written to <data_root>/agents/<session>/llm_costs.jsonl by work_buddy.llm.cost.log_call. Captures cloud + local backends.
Claude Code — Claude Code's per-session JSONLs in ~/.claude/projects/, ingested into a SQLite cache at <data_root>/cache/claude_code_usage.db. Captures every Claude Code session on the machine, regardless of whether it touched work-buddy.

The two sources are complementary, not overlapping: work-buddy's runner calls go only into the internal log, and Claude Code sessions go only into the transcripts. Summing them under source=all therefore needs no de-duplication.
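As a rough illustration of why the two totals can simply be added, the sketch below sums the internal log's cost field and combines it with a Claude Code total. Only the estimated_cost_usd field name comes from this page; the function names are hypothetical, and any other fields in a log record are not assumed.

```python
import json
from pathlib import Path


def sum_internal_cost(log_path: Path) -> float:
    """Sum estimated_cost_usd over every record in one llm_costs.jsonl file."""
    total = 0.0
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # Local-LLM calls log 0.0, so they add nothing to the total.
            total += float(record.get("estimated_cost_usd", 0.0))
    return total


def combined_cost(internal_total: float, claude_code_total: float) -> float:
    """source=all: the sources are disjoint, so the totals are added as-is."""
    return internal_total + claude_code_total
```

Because no call is recorded in both places, combined_cost needs no matching or de-duplication step.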

Surfaces

GET /api/costs?source={internal|claude_code|all}&... — returns the dashboard read model. See "/api/costs query params" below. Legacy source=transcripts still routes to claude_code.
GET /api/costs/projects — list of canonical project names for the toolbar dropdown.
GET /api/costs/rate-limits — most-recent per-model rate-limit observations (RPM / ITPM / OTPM headroom) for the toolbar chip.
POST /api/costs/rescan — refresh the Claude Code cache (gated by read-only mode).
Capability llm_costs_query — the primary programmatic surface. One call covers most cost questions: time windows (named or ISO range), grouping (project / model / session / day / tool), source filter, comparison-to-previous-window. See its parameter schema for details.
Capability claude_code_usage_scan — trigger an incremental rescan (mutates state).
Capability escalation_recent — per-tier LLM escalation observability records (logged separately at <data_root>/logs/escalations.log; pruned via logs/escalations registered in paths.PRUNERS).
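A client-side sketch of building a GET /api/costs URL, including the legacy source alias described above. The base URL, the helper names, and the choice to normalize the alias on the client are assumptions made for illustration; the server is what actually routes source=transcripts to claude_code.

```python
from urllib.parse import urlencode

VALID_SOURCES = {"internal", "claude_code", "all"}
# Legacy alias documented for /api/costs.
LEGACY_ALIASES = {"transcripts": "claude_code"}


def normalize_source(source: str) -> str:
    """Map the legacy alias to its current name and reject unknown values."""
    source = LEGACY_ALIASES.get(source, source)
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return source


def costs_url(base_url: str, source: str, **filters: str) -> str:
    """Build a /api/costs URL; base_url is a placeholder for the dashboard host."""
    params = {"source": normalize_source(source), **filters}
    return f"{base_url.rstrip('/')}/api/costs?{urlencode(params)}"
```

For example, costs_url("http://localhost:8000", "transcripts", project="electricrag") yields a URL with source=claude_code; the host and port here are made up.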

/api/costs query params

All filters apply at row level in both aggregators so totals / by_day / by_model / by_task / sessions / etc. stay in sync with the toolbar — no client-side post-filtering of cards or charts.

project=<name> — substring match against the canonical project name (the chats-tab resolver collapses worktrees to their parent project, e.g. electricrag-fg-clep → electricrag).
execution_mode={cloud|local} — only meaningful for the internal source; claude_code rows are always cloud.
start_date=YYYY-MM-DD / end_date=YYYY-MM-DD — inclusive bounds.
models=<csv> — comma-separated model allow-list. Trichotomy:
missing param → no filter (all rows)
models= (present, empty value) → match nothing (zero rows)
models=a,b → narrow to those models

The empty-vs-missing distinction is part of the contract: it's how the chip rail expresses "user de-selected every model" without silently falling back to the unfiltered view. Capability and operational callers must emit models= (with no value) when the intent is "return zero rows," rather than omitting the param.
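The trichotomy maps naturally onto an optional list: None for a missing param, an empty list for models=, and a non-empty list for an allow-list. The following is a minimal sketch of that mapping, not the backend's actual parser; the function names and the "model" row key are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode


def encode_models(models: Optional[list[str]]) -> str:
    """None -> omit the param; [] -> 'models='; ['a', 'b'] -> 'models=a,b'."""
    if models is None:
        return ""
    return urlencode({"models": ",".join(models)}, safe=",")


def decode_models(query: str) -> Optional[list[str]]:
    """Inverse of encode_models: distinguish a missing param from an empty one."""
    # keep_blank_values=True is what preserves 'models=' as an empty string
    # instead of dropping the key entirely.
    parsed = parse_qs(query, keep_blank_values=True)
    if "models" not in parsed:
        return None
    return [m for m in parsed["models"][0].split(",") if m]


def filter_rows(rows: list[dict], models: Optional[list[str]]) -> list[dict]:
    """Apply the allow-list at row level; None means no filter at all."""
    if models is None:
        return list(rows)
    allowed = set(models)
    return [row for row in rows if row["model"] in allowed]
```

The detail to watch is that a default query-string parse discards blank values, which would turn "match nothing" into "no filter".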

Reading the numbers

The internal source records cost only for cloud calls; local-LLM calls log estimated_cost_usd: 0.0 by design. Both sources price cloud calls against the canonical Anthropic table at work_buddy.llm.claude_code_usage.pricing (input + output + cache_read at 90% off + cache_creation at +25% premium). Cards split the Calls count into cloud N · local M so the cost number is unambiguous. Costs above $100 drop the cents and use a thousands separator.
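The pricing rule and the card formatting can be sketched as follows. This is an illustration under stated assumptions rather than the code in work_buddy.llm.claude_code_usage.pricing: the Rates type and both function names are hypothetical, the cache multipliers are assumed to apply to the base input rate, and whether the dashboard rounds or truncates when it drops the cents is not stated here (this sketch rounds).

```python
from dataclasses import dataclass

CACHE_READ_MULTIPLIER = 0.10      # cache reads billed at 90% off
CACHE_CREATION_MULTIPLIER = 1.25  # cache writes billed at a 25% premium


@dataclass(frozen=True)
class Rates:
    """Per-million-token USD rates for one model (values supplied by the caller)."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_cost_usd(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_creation_tokens: int = 0,
) -> float:
    """input + output + discounted cache reads + premium cache writes."""
    per_in = rates.input_per_mtok / 1_000_000
    per_out = rates.output_per_mtok / 1_000_000
    return (
        input_tokens * per_in
        + output_tokens * per_out
        # Assumption: both cache multipliers scale the input rate.
        + cache_read_tokens * per_in * CACHE_READ_MULTIPLIER
        + cache_creation_tokens * per_in * CACHE_CREATION_MULTIPLIER
    )


def format_cost(cost_usd: float) -> str:
    """Above $100: no cents, thousands separator. Otherwise two decimals."""
    if cost_usd > 100:
        return f"${cost_usd:,.0f}"  # rounds to the nearest dollar
    return f"${cost_usd:.2f}"
```

With placeholder rates of 3.0 and 15.0 USD per million tokens, one million tokens in each of the four categories comes to 3 + 15 + 0.30 + 3.75 = 22.05 USD.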

Frontend (Costs tab)

Toolbar widgets, in order:

Project select — populated from /api/costs/projects. Reuses the chats-tab canonical project resolver so worktrees collapse to their parent.
Activity pills — only visible when project=work-buddy. Switches between all / claude_code / programmatic / api / local; api and local thread an execution_mode filter to the backend. Rendered by the shared wbRenderFilters widget (single-select segmented).
Date range pill — translates to start_date.
Model filter chips — the shared wbRenderFilters widget (grouped tristate multi-select), grouped by family (currently a single Claude family). Click toggles a chip; alt/shift-click solos to that one. Family pill toggles the whole group. Reset link appears when narrowed. De-selecting every chip narrows to zero rows (matches the trichotomy above).
Rate-limit chip — shows the most-restrictive headroom across recently-observed Anthropic models; click to expand a per-model popover. Backed by /api/costs/rate-limits. The headers describe burn rate (token-bucket replenishment), not session-total quota — they show how close you are to being throttled right now, not how much you've spent overall.
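One way to read "most-restrictive headroom" is the smallest remaining-to-limit ratio across every model and every dimension (RPM, ITPM, OTPM). The sketch below takes that reading; the Bucket type, the observation shape, and the model names in the usage note are hypothetical, and the real chip may define headroom differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bucket:
    """One rate-limit dimension as last observed: its limit and what remains."""
    limit: int
    remaining: int

    @property
    def headroom(self) -> float:
        """Fraction of the bucket still available, from 0.0 to 1.0."""
        return self.remaining / self.limit if self.limit > 0 else 0.0


def most_restrictive(
    observations: dict[str, dict[str, Bucket]],
) -> Optional[tuple[str, str, float]]:
    """Return (model, dimension, headroom) for the tightest bucket, or None."""
    candidates = [
        (bucket.headroom, model, dimension)
        for model, dimensions in observations.items()
        for dimension, bucket in dimensions.items()
    ]
    if not candidates:
        return None
    headroom, model, dimension = min(candidates)
    return model, dimension, headroom
```

Given a model-a with 40 of 50 requests and 100 of 1000 input tokens left, and a model-b with 25 of 50 requests left, the chip would surface model-a's input-token bucket at 10% headroom.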

The sessions table shows an active dot next to sessions whose last activity is within ACTIVE_WINDOW_MINUTES. The .wb-active-dot CSS class is reusable — the Chats tab uses the same one.
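The active-dot check amounts to a time-window comparison. A minimal sketch, assuming timezone-aware timestamps; the value given to ACTIVE_WINDOW_MINUTES below is a placeholder, since the real value is not stated on this page, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ACTIVE_WINDOW_MINUTES = 5  # placeholder value, not the project's actual setting


def is_active(
    last_activity: datetime,
    now: Optional[datetime] = None,
    window_minutes: int = ACTIVE_WINDOW_MINUTES,
) -> bool:
    """True when last_activity falls within the window ending at now.

    Both datetimes must be timezone-aware; mixing aware and naive values
    raises TypeError on subtraction.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_activity <= timedelta(minutes=window_minutes)
```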