Dashboard — Costs tab¶
LLM cost / usage view with two complementary sources (per-call internal log + Claude Code transcripts), row-level backend filters, and the Anthropic rate-limit observation chip. The unified
llm_costs_querycapability reads both sources.
Details¶
LLM cost / usage view. Two complementary sources, picked via the toolbar:
- Internal log — every
LLMRunnerAPI call, written to<data_root>/agents/<session>/llm_costs.jsonlbywork_buddy.llm.cost.log_call. Captures cloud + local backends. - Claude Code — Claude Code's per-session JSONLs in
~/.claude/projects/, ingested into a SQLite cache at<data_root>/cache/claude_code_usage.db. Captures every Claude Code session on the machine, regardless of whether it touched work-buddy.
The two sources are complementary, not overlapping — work-buddy's runner calls go ONLY into the internal log; Claude Code sessions go ONLY into transcripts. Summing them under source=all is honest with no de-dup needed.
Surfaces¶
GET /api/costs?source={internal|claude_code|all}&...— read the dashboard read model. See Query params below. Legacysource=transcriptsstill routes toclaude_code.GET /api/costs/projects— list of canonical project names for the toolbar dropdown.GET /api/costs/rate-limits— most-recent per-model rate-limit observations (RPM / ITPM / OTPM headroom) for the toolbar chip.POST /api/costs/rescan— refresh the Claude Code cache (gated by read-only mode).- Capability
llm_costs_query— the primary programmatic surface. One call covers most cost questions: time windows (named or ISO range), grouping (project / model / session / day / tool), source filter, comparison-to-previous-window. See its parameter schema for details. - Capability
claude_code_usage_scan— trigger an incremental rescan (mutates state). - Capability
escalation_recent— per-tier LLM escalation observability records (logged separately at<data_root>/logs/escalations.log; pruned vialogs/escalationsregistered inpaths.PRUNERS).
/api/costs query params¶
All filters apply at row level in both aggregators so totals / by_day / by_model / by_task / sessions / etc. stay in sync with the toolbar — no client-side post-filtering of cards or charts.
project=<name>— substring match against the canonical project name (the chats-tab resolver collapses worktrees to their parent project, e.g.electricrag-fg-clep→electricrag).execution_mode={cloud|local}— only meaningful for the internal source;claude_coderows are always cloud.start_date=YYYY-MM-DD/end_date=YYYY-MM-DD— inclusive bounds.models=<csv>— comma-separated model allow-list. Trichotomy:- missing param → no filter (all rows)
models=(present, empty value) → match nothing (zero rows)models=a,b→ narrow to those models
The empty-vs-missing distinction is part of the contract: it's how the chip rail expresses "user de-selected every model" without silently falling back to all-time. Capability and operational callers must emit models= (with no value) when the intent is "return zero rows," not omit the param.
Reading the numbers¶
The internal source records cost only for cloud calls; local-LLM calls log estimated_cost_usd: 0.0 by design. Both sources price cloud calls against the canonical Anthropic table at work_buddy.llm.claude_code_usage.pricing (input + output + cache_read at 90% off + cache_creation at +25% premium). Cards split the Calls count into cloud N · local M so the cost number is unambiguous. Costs above $100 drop the cents and use a thousands separator.
Frontend (Costs tab)¶
Toolbar widgets, in order:
- Project select — populated from
/api/costs/projects. Reuses the chats-tab canonical project resolver so worktrees collapse to their parent. - Activity pills — only visible when
project=work-buddy. Switches betweenall / claude_code / programmatic / api / local;apiandlocalthread anexecution_modefilter to the backend. - Date range pill — translates to
start_date. - Model filter chips — grouped by family (currently a single
Claudefamily). Click toggles a chip; alt/shift-click solos to that one. Family pill toggles the whole group.Resetlink appears when narrowed. De-selecting every chip narrows to zero rows (matches the trichotomy above). - Rate-limit chip — shows the most-restrictive headroom across recently-observed Anthropic models; click to expand a per-model popover. Backed by
/api/costs/rate-limits. The headers describe burn rate (token-bucket replenishment), not session-total quota — they show how close you are to being throttled right now, not how much you've spent overall.
The sessions table shows an active dot next to sessions whose last activity is within ACTIVE_WINDOW_MINUTES. The .wb-active-dot CSS class is reusable — the Chats tab uses the same one.