Llm Call

Make a single LLM API call (Tier 2 execution). Cheaper than spawning a full agent session. Supports freeform text or structured JSON output via output_schema (inline dict or named schema from work_buddy/llm/schemas/). Routes to Claude via 'tier' or to a local/remote OpenAI-compatible server (LM Studio, vLLM, Ollama) via 'profile'. Handles caching and cost tracking automatically.

MCP name: llm_call

Category: llm
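
As a rough illustration of the two routing modes described above, the sketch below builds two argument payloads for `llm_call`: a freeform cloud call selected by `tier`, and a structured call routed through a local `profile` with an inline JSON Schema. Only the argument names come from this page. The prompt text, the schema contents, and the way the payload is actually sent to the tool (that depends on your MCP client) are assumptions made for illustration.

```python
import json

# Freeform text via the cloud path: 'tier' selects the Claude model.
freeform_args = {
    "system": "You are a concise release-notes writer.",
    "user": "Summarize: fixed login timeout; added CSV export.",
    "tier": "haiku",
    "max_tokens": 256,
}

# Structured output via a local profile, using an inline JSON Schema dict.
# 'local_general' is the example profile name used on this page; a real call
# needs that profile declared under llm.profiles in config.
# The schema itself is hypothetical and only shows the inline-dict form.
structured_args = {
    "system": "Classify the sentiment of the user's message.",
    "user": "The new build is much faster, thanks!",
    "profile": "local_general",
    "priority": "background",   # only meaningful on the 'profile' path
    "cache_ttl_minutes": 0,     # 0 disables caching for this call
    "output_schema": {
        "type": "object",
        "properties": {
            "sentiment": {
                "type": "string",
                "enum": ["positive", "neutral", "negative"],
            },
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["sentiment"],
    },
}

# Both payloads are plain JSON-serializable dicts.
print(json.dumps(structured_args, indent=2))
```

Note that each payload sets either `tier` or `profile`, never both, since the two are documented as mutually exclusive.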

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| `cache_ttl_minutes` | `int` | No | Cache TTL in minutes. None=config default, 0=no cache. |
| `max_tokens` | `int` | No | Max response tokens (default: 1024) |
| `output_schema` | `dict\|str` | No | JSON Schema for structured output. Pass a dict for inline schemas, or a string name to load from work_buddy/llm/schemas/.json. Omit for freeform text. |
| `priority` | `str` | No | Local-inference admission priority for the broker: 'interactive', 'workflow' (default), or 'background'. Only applies to the local 'profile' path; ignored for cloud 'tier' (Anthropic isn't brokered). Lets background work yield to interactive work on the same LM Studio profile. |
| `profile` | `str` | No | Named local/remote profile (e.g. 'local_general') declared under llm.profiles in config. Routes through the profile's backend instead of Anthropic. Mutually exclusive with 'tier'. |
| `system` | `str` | Yes | System prompt |
| `temperature` | `float` | No | Sampling temperature (default: 0.0) |
| `tier` | `str` | No | Cloud model tier: 'haiku' (default if no profile given), 'sonnet', or 'opus'. Mutually exclusive with 'profile'. |
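
The constraints in the table (two required strings, `tier` and `profile` being mutually exclusive, and the documented defaults) can be checked on the caller's side before the tool is invoked. The helper below is a minimal sketch of such a pre-flight check. `normalize_llm_call_args` is a hypothetical name and is not part of work_buddy; the tool's own validation may differ, and applying the `priority` default only on the `profile` path is an assumption drawn from the description above.

```python
from typing import Any

# Allowed values as listed in the parameter table.
TIERS = {"haiku", "sonnet", "opus"}
PRIORITIES = {"interactive", "workflow", "background"}


def normalize_llm_call_args(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical pre-flight check mirroring the documented rules."""
    out = dict(args)  # copy so the caller's dict is not mutated

    # 'system' and 'user' are the only required parameters.
    for name in ("system", "user"):
        if not isinstance(out.get(name), str):
            raise ValueError(f"'{name}' is required and must be a string")

    # 'tier' and 'profile' are mutually exclusive.
    if "tier" in out and "profile" in out:
        raise ValueError("'tier' and 'profile' are mutually exclusive")

    if "profile" in out:
        # Priority only applies to the local 'profile' path.
        out.setdefault("priority", "workflow")
        if out["priority"] not in PRIORITIES:
            raise ValueError(f"unknown priority: {out['priority']!r}")
    else:
        # Cloud path: 'haiku' is the default when no profile is given.
        out.setdefault("tier", "haiku")
        if out["tier"] not in TIERS:
            raise ValueError(f"unknown tier: {out['tier']!r}")

    # Documented defaults for the sampling parameters.
    out.setdefault("max_tokens", 1024)
    out.setdefault("temperature", 0.0)
    return out


cloud = normalize_llm_call_args({"system": "Be brief.", "user": "Hello"})
local = normalize_llm_call_args(
    {"system": "Be brief.", "user": "Hello", "profile": "local_general"}
)
print(cloud)
print(local)
```

`cache_ttl_minutes` is deliberately left untouched here, since omitting it is documented as falling back to the config default.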