LLM Call

Make a single LLM API call (Tier 2 execution). Cheaper than spawning a full agent session. Supports freeform text or structured JSON output via output_schema (inline dict or named schema from work_buddy/llm/schemas/). Routes to Claude via 'tier' or to a local/remote OpenAI-compatible server (LM Studio, vLLM, Ollama) via 'profile'. Handles caching and cost tracking automatically.
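As a rough illustration of the two routes described above, the sketch below builds one argument set for a freeform call via 'tier' and one for a structured call via 'profile' with an inline schema. The prompts and the schema fields are invented for the example, 'local_general' is only the sample profile name used in this page, and how the arguments reach the llm_call tool depends on your MCP client, so that step is not shown.

```python
import json

# Freeform text through the cloud route: 'tier' selects the Claude model.
freeform_args = {
    "system": "You are a concise release-notes writer.",
    "user": "Summarize: fixed cache eviction bug; added retry on timeout.",
    "tier": "haiku",
}

# Structured JSON through a local/remote profile, using an inline JSON Schema.
# The schema fields below are hypothetical and exist only for illustration.
structured_args = {
    "system": "Extract the fields requested by the schema.",
    "user": "Meeting with Dana on Friday about the Q3 budget.",
    "profile": "local_general",  # must be declared under llm.profiles in config
    "output_schema": {
        "type": "object",
        "properties": {
            "person": {"type": "string"},
            "topic": {"type": "string"},
        },
        "required": ["person", "topic"],
    },
}

# Each payload sets exactly one of 'tier' or 'profile', never both.
print(json.dumps(structured_args, indent=2))
```

Omitting output_schema, as in the first payload, requests freeform text.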

MCP name: llm_call

Category: llm

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| cache_ttl_minutes | int | No | Cache TTL in minutes. None uses the config default; 0 disables caching. |
| max_tokens | int | No | Maximum number of response tokens (default: 1024). |
| output_schema | dict or str | No | JSON Schema for structured output. Pass a dict for an inline schema, or a string name to load work_buddy/llm/schemas/<name>.json. Omit for freeform text. |
| profile | str | No | Named local/remote profile (e.g. 'local_general') declared under llm.profiles in config. Routes the call through the profile's backend instead of Anthropic. Mutually exclusive with 'tier'. |
| system | str | Yes | System prompt. |
| temperature | float | No | Sampling temperature (default: 0.0). |
| tier | str | No | Cloud model tier: 'haiku' (the default when no profile is given), 'sonnet', or 'opus'. Mutually exclusive with 'profile'. |
| user | str | Yes | User message content. |
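The parameter rules above (two required strings, 'tier' and 'profile' being mutually exclusive, and the documented defaults) can be checked client-side before a call is made. The helper below is a minimal sketch under that assumption: build_llm_call_args and VALID_TIERS are hypothetical names that are not part of work_buddy, and the tool may well perform its own validation.

```python
from typing import Any, Optional, Union

# Tier names as listed in the parameter table.
VALID_TIERS = {"haiku", "sonnet", "opus"}


def build_llm_call_args(
    system: str,
    user: str,
    *,
    tier: Optional[str] = None,
    profile: Optional[str] = None,
    output_schema: Optional[Union[dict, str]] = None,
    max_tokens: int = 1024,
    temperature: float = 0.0,
    cache_ttl_minutes: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble llm_call arguments per the documented rules."""
    if not isinstance(system, str) or not isinstance(user, str):
        raise TypeError("'system' and 'user' are required strings")
    if tier is not None and profile is not None:
        raise ValueError("'tier' and 'profile' are mutually exclusive")
    if tier is not None and tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")

    args: dict[str, Any] = {
        "system": system,
        "user": user,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if profile is not None:
        args["profile"] = profile
    else:
        # 'haiku' is the documented default when no profile is given.
        args["tier"] = tier or "haiku"
    if output_schema is not None:
        args["output_schema"] = output_schema
    # None means "use the config default", so the key is left out entirely;
    # 0 is passed through because it explicitly disables caching.
    if cache_ttl_minutes is not None:
        args["cache_ttl_minutes"] = cache_ttl_minutes
    return args
```

For instance, build_llm_call_args("Be brief.", "Hi", cache_ttl_minutes=0) yields arguments that use the default 'haiku' tier with caching disabled, while passing both tier and profile raises a ValueError.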