Llm Submit

Submits an llm_call for asynchronous background execution. The call returns immediately with an operation_id; the sidecar's retry sweep then invokes llm_call with your params and messages the originating session on completion. Use it when local inference latency (tens of seconds) would otherwise block the caller. For synchronous, bounded calls, use llm_call instead. Cloud-tier calls are already fast, so there is no point submitting them; submits are meant for local profiles only, which is why profile is required.

MCP name: llm_submit

Category: llm

Parameters

Name	Type	Required	Description
`cache_ttl_minutes`	`int`	No	Cache TTL in minutes. None=config default, 0=no cache.
`max_tokens`	`int`	No	Max response tokens (default: 1024)
`output_schema`	`dict|str`	No	JSON Schema for structured output. Pass a dict for inline schemas, or a string name to load from work_buddy/llm/schemas/.json. Omit for freeform.
`priority`	`str`	No	Local-inference admission priority for the broker: 'interactive', 'workflow' (default), or 'background'. Background submits should pass 'background' so they yield to interactive work on the same LM Studio profile.
`profile`	`str`	Yes	Named local/remote profile (e.g. 'local_general'). Required: submits are for local profiles only.
`system`	`str`	Yes	System prompt
`temperature`	`float`	No	Sampling temperature (default: 0.0)
`user`	`str`	Yes	User message content
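
Example

The parameters above can be assembled into a call payload roughly as follows. This is a minimal sketch, not the project's own client code: `build_llm_submit_args` and `as_tools_call` are hypothetical helpers introduced here for illustration, the non-empty check on required strings is an assumption, and the JSON-RPC `tools/call` envelope assumes the tool is reached through a standard MCP client. The response shape is not shown because the page only states that an operation_id is returned.

```python
import json
from typing import Any, Dict, Optional, Union

# Priority values listed in the parameter table.
ALLOWED_PRIORITIES = {"interactive", "workflow", "background"}


def build_llm_submit_args(
    profile: str,
    system: str,
    user: str,
    *,
    priority: str = "background",  # helper default; the documented server default is 'workflow'
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    cache_ttl_minutes: Optional[int] = None,
    output_schema: Union[Dict[str, Any], str, None] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble and sanity-check llm_submit arguments."""
    # Assumption: required fields must be non-empty strings.
    for name, value in (("profile", profile), ("system", system), ("user", user)):
        if not isinstance(value, str) or not value:
            raise ValueError(f"{name} is required and must be a non-empty string")
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(ALLOWED_PRIORITIES)}")

    args: Dict[str, Any] = {
        "profile": profile,
        "system": system,
        "user": user,
        "priority": priority,
    }
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "cache_ttl_minutes": cache_ttl_minutes,
        "output_schema": output_schema,
    }
    # Leave out unset optionals so the documented defaults apply.
    # A value of 0 (e.g. cache_ttl_minutes=0, meaning no cache) is kept.
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


def as_tools_call(arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Wrap the arguments in an MCP JSON-RPC tools/call request (assumed transport)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "llm_submit", "arguments": arguments},
    }


if __name__ == "__main__":
    arguments = build_llm_submit_args(
        profile="local_general",
        system="You are a concise summarizer.",
        user="Summarize today's notes in three bullet points.",
        cache_ttl_minutes=0,
    )
    print(json.dumps(as_tools_call(arguments), indent=2))
```

The helper defaults priority to 'background' because the table recommends that value for background submits; pass 'workflow' or 'interactive' explicitly if the submit should not yield to other work on the same profile.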