LLM Submit

Submits an llm_call for asynchronous background execution. The call returns immediately with an operation_id; the sidecar's retry sweep then invokes llm_call with your parameters and messages the originating session on completion. Use it when local inference latency (tens of seconds) would otherwise block the caller. For synchronous, bounded calls, use llm_call instead. Cloud-tier calls are already fast, so there is no benefit to submitting them; this is why profile is required.
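The submit-then-notify flow described above can be pictured with a small in-memory simulation. This is only a sketch, not the sidecar's actual code: `submit`, `sweep`, `fake_llm_call`, `pending`, and `inbox` are hypothetical stand-ins, and the shape of the completion message is an assumption.

```python
import uuid
from collections import deque

# Hypothetical in-memory stand-ins; NOT the real sidecar or llm_call.
pending = deque()  # operations waiting for the next sweep
inbox = {}         # session_id -> list of completion messages (assumed shape)


def fake_llm_call(params):
    """Placeholder for the slow local inference call."""
    return f"echo: {params['user']}"


def submit(session_id, params):
    """Queue the call and return an operation_id without running it."""
    operation_id = uuid.uuid4().hex
    pending.append((operation_id, session_id, params))
    return operation_id


def sweep():
    """Run every queued operation and message the originating session."""
    while pending:
        operation_id, session_id, params = pending.popleft()
        result = fake_llm_call(params)
        inbox.setdefault(session_id, []).append(
            {"operation_id": operation_id, "result": result}
        )


op_id = submit(
    "session-1",
    {"profile": "local_general", "system": "Be brief.", "user": "hi"},
)
nothing_ran_yet = inbox == {}  # the caller got its id back before any inference
sweep()                        # later, the background sweep does the work
```

The point of the pattern is that the caller holds only the operation_id until the completion message arrives, so a slow local model never blocks the session.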

MCP name: llm_submit

Category: llm

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| cache_ttl_minutes | int | No | Cache TTL in minutes. None = config default, 0 = no cache. |
| max_tokens | int | No | Max response tokens (default: 1024). |
| output_schema | dict\|str | No | JSON Schema for structured output. Pass a dict for inline schemas, or a string name to load from work_buddy/llm/schemas/.json. Omit for freeform. |
| profile | str | Yes | Named local/remote profile (e.g. 'local_general'). Required: submits are for local profiles only. |
| system | str | Yes | System prompt. |
| temperature | float | No | Sampling temperature (default: 0.0). |
| user | str | Yes | User message content. |
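As an illustration of how the parameters above fit together, the following sketch assembles an argument dict for llm_submit and checks the required fields before anything is sent. The helper `build_llm_submit_args` is hypothetical and not part of the tool; how the arguments reach the MCP server depends on your client and is not shown. The defaults are copied from the table.

```python
# Names and defaults taken from the parameter table; the helper itself is hypothetical.
REQUIRED = ("profile", "system", "user")
OPTIONAL = ("cache_ttl_minutes", "max_tokens", "output_schema", "temperature")
DEFAULTS = {"max_tokens": 1024, "temperature": 0.0}


def build_llm_submit_args(**kwargs):
    """Validate keyword arguments and return a dict of llm_submit arguments."""
    unknown = set(kwargs) - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    missing = [name for name in REQUIRED if not kwargs.get(name)]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    # Drop None values so the server-side defaults apply (e.g. cache TTL).
    provided = {k: v for k, v in kwargs.items() if v is not None}
    return {**DEFAULTS, **provided}


args = build_llm_submit_args(
    profile="local_general",
    system="You summarize notes in one sentence.",
    user="Summarize: the deploy failed twice before succeeding.",
    cache_ttl_minutes=None,  # None means the config default, per the table
    output_schema={          # inline JSON Schema passed as a dict
        "type": "object",
        "properties": {"summary": {"type": "string"}},
        "required": ["summary"],
    },
)
```

Passing a string for output_schema instead of a dict would select a named schema file, as described in the table.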