Text-segmenter SubCall¶
Generic text-segmentation SubCall. Splits captured prose into distinct matters. Used by inline-capture (right-click 'Send to agent') to detect multi-matter selections. Reusable for any future singular-input pipeline (per-message email triage, etc.).
Details¶
Generic text-segmentation SubCall built on the decomposed-judgment framework. Splits captured prose into distinct matters — coherent subjects the user might think of as one thing each.
Where it lives¶
- Module:
work_buddy/clarify/text_segmenter.py - SubCall declaration:
TEXT_SEGMENTER_SUBCALL - Public entry:
segment_into_matters(text, *, hint='', item_id='', short_text_bypass_chars=120, runner=None) - Helper:
_significant_newline_count(text)— counts newlines NOT immediately followed by a bullet marker (-,*,+, or1.-style numbered, with optional leading whitespace). - Config:
triage.text_segmenter.{tier_chain,max_tokens,temperature,cache_ttl_minutes,max_segments}inclarify/config.py:TRIAGE_DEFAULTS.
Output shape¶
A list of dicts, each carrying:
- start_char: integer offset (inclusive) into the input text
- end_char: integer offset (exclusive)
- label: short noun-phrase summarising what the matter is about
- text: the input slice for this matter (text[start:end])
ALWAYS returns at least one segment when the input has any non-whitespace content (passthrough on segmenter failure, never fragment the user's input).
Bypass rules¶
The LLM call is skipped when ALL of these hold; otherwise the segmenter runs.
- Empty / whitespace-only text →
[](caller treats as no work). - Short single-block bypass: when
len(text) < short_text_bypass_charsAND the text contains ≤1 significant newline, return a single-segment passthrough labeled(short capture). Both clauses matter independently: - Char-count alone misses short multi-matter captures:
"Email Bob about the report.\n\nRenew car insurance Friday."is 56 chars but is two distinct matters that the user expects to land as two separate threads. - Newline-count alone over-segments short bullet lists:
"Read research paper\n- Paper A\n- Paper B"has multiple newlines but is one matter; the bullet-aware count of significant newlines is 0, so it stays bypassed. - SubCall soft-fail (every tier exhausts) → single-segment passthrough labeled
(unsegmented).
Bias-toward-cohesion¶
The system prompt biases strongly toward 'one matter' to avoid false-splits. False-splits create user-visible thread fragmentation; false-merges absorb cleanly into the multi-action pattern downstream (the singular umbrella's render hoist). Mirrors project_picker's 'lean toward null when uncertain'.
Multiple ACTIONS on the same matter is NOT a split signal — Buy gift for Sarah's birthday on May 12 is ONE matter (the birthday) yielding two records (task + calendar event), but those records are about the same subject.
Split signals (require strong evidence): different topics with no semantic bridge, separating conjunctions (Also, Separately,), paragraph breaks introducing a new subject, distinct imperatives addressing unrelated work.
Post-parse validation¶
After the LLM returns a structured-output dict, _validate_and_normalize_segments:
- Drops entries with non-integer offsets, inverted ranges, or out-of-text offsets.
- Sorts by
start_charascending. - Drops segments overlapping with an earlier segment.
- Caps at
max_segments(default 6). - Coverage check: if the union of all segments covers less than 85% of the non-whitespace input characters, distrusts the segmentation entirely and returns
[](caller treats as passthrough). The model probably hallucinated boundaries. - Attaches
textslices.
What this segmenter does NOT subsume¶
The journal pipeline has its own segmenter (clarify/adapters/journal.py:_segment_with_escalation) with line-range output and post-parse semantic-validation-driven escalation. That escalation pattern doesn't fit SubCall today (SubCall escalates on backend errors / schema violations, not on caller-defined post-parse semantic checks). Migrating journal to this generic segmenter is future work after SubCall grows a validate_post_parse hook.
Relationship to the singular pattern¶
This SubCall is the upstream filter for pipelines/inline.py:inline_capture. Multi-matter captures route as N independent root threads (one per detected matter); single-matter captures whose verdict produces 2+ records spawn a singular umbrella with hoisted actions on the umbrella card. See threads/grouping for the singular pattern's render-side semantics.