LLM Submit
Asynchronously submit an llm_call for background execution. The tool returns immediately with an operation_id. The sidecar's retry sweep later invokes llm_call with your parameters and messages the originating session when the call completes. Use it when local inference latency (tens of seconds) would otherwise block the caller. For synchronous, bounded calls, use llm_call instead. Cloud-tier calls are already fast, so there is no point submitting them. For that reason, profile is required.
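The submit/sweep/notify cycle described above can be pictured with a small in-process sketch. This is an illustrative analogy only. The class `SubmitSketch` and the function `fake_llm_call` are hypothetical stand-ins, not the sidecar's real code. `fake_llm_call` plays the role of llm_call, and a plain dict plays the role of messaging the originating session. The real retry logic is omitted.

```python
import queue
import threading
import uuid
from typing import Any, Callable


class SubmitSketch:
    """Hypothetical stand-in for the sidecar's submit -> sweep -> notify cycle."""

    def __init__(self, call: Callable[..., Any], notify: Callable[[str, Any], None]) -> None:
        self._call = call        # plays the role of llm_call
        self._notify = notify    # plays the role of messaging the originating session
        self._pending: queue.Queue = queue.Queue()

    def submit(self, **params: Any) -> str:
        # Record the request and hand back an id immediately; no inference runs here.
        operation_id = uuid.uuid4().hex
        self._pending.put((operation_id, params))
        return operation_id

    def sweep(self) -> None:
        # Drain everything queued so far, run each call, and report each outcome.
        # The real sweep also retries failures; that part is not modeled here.
        while True:
            try:
                operation_id, params = self._pending.get_nowait()
            except queue.Empty:
                return
            try:
                outcome: Any = self._call(**params)
            except Exception as exc:
                outcome = {"error": str(exc)}
            self._notify(operation_id, outcome)


def fake_llm_call(**params: Any) -> str:
    # Hypothetical placeholder for a slow local-model call.
    return f"reply to: {params['user']}"


completed: dict[str, Any] = {}
worker = SubmitSketch(call=fake_llm_call, notify=completed.__setitem__)

op_id = worker.submit(profile="local_general", system="Be brief.", user="Summarize today.")
print(op_id in completed)  # False: submit returned before anything ran

sweeper = threading.Thread(target=worker.sweep)
sweeper.start()
sweeper.join()
print(completed[op_id])  # reply to: Summarize today.
```

The caller holds only the operation_id between submit and completion. This is why the tool suits slow local profiles and offers nothing for fast cloud calls.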
MCP name: llm_submit
Category: llm
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| cache_ttl_minutes | int | No | Cache TTL in minutes. None uses the config default; 0 disables caching. |
| max_tokens | int | No | Maximum number of response tokens (default: 1024). |
| output_schema | dict \| str | No | JSON Schema for structured output. Pass a dict for an inline schema, or a string name to load a schema from work_buddy/llm/schemas/. |
| profile | str | Yes | Named local/remote profile (e.g. 'local_general'). Required, because submits are for local profiles only. |
| system | str | Yes | System prompt. |
| temperature | float | No | Sampling temperature (default: 0.0). |
| user | str | Yes | User message content. |
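Below is a minimal sketch of how a client might assemble an llm_submit call from the parameters above. The helper `build_llm_submit_request` is hypothetical. It checks only what the table states: profile, system and user are required, and output_schema is a dict or a schema name. It fills in the documented defaults. It wraps the arguments in the generic MCP `tools/call` JSON-RPC 2.0 envelope. The sketch assumes the server accepts those defaults when they are sent explicitly. Optional keys left at None are omitted so the server applies its own config default.

```python
import json
from typing import Any, Optional, Union


def build_llm_submit_request(
    request_id: int,
    profile: str,
    system: str,
    user: str,
    max_tokens: int = 1024,
    temperature: float = 0.0,
    cache_ttl_minutes: Optional[int] = None,
    output_schema: Union[dict, str, None] = None,
) -> dict[str, Any]:
    # profile is required: llm_submit is meant for local profiles only.
    if not profile:
        raise ValueError("profile is required for llm_submit")
    if output_schema is not None and not isinstance(output_schema, (dict, str)):
        raise TypeError("output_schema must be a dict or a schema name string")

    arguments: dict[str, Any] = {
        "profile": profile,
        "system": system,
        "user": user,
        "max_tokens": max_tokens,    # documented default: 1024
        "temperature": temperature,  # documented default: 0.0
    }
    # Leave optional keys out when None so the server falls back to its config default.
    if cache_ttl_minutes is not None:
        arguments["cache_ttl_minutes"] = cache_ttl_minutes  # 0 disables caching
    if output_schema is not None:
        arguments["output_schema"] = output_schema  # inline dict or schema name

    # Generic MCP tools/call request (JSON-RPC 2.0).
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "llm_submit", "arguments": arguments},
    }


req = build_llm_submit_request(
    1,
    profile="local_general",
    system="You extract action items.",
    user="Notes: ship the release on Friday.",
    output_schema={"type": "object", "properties": {"items": {"type": "array"}}},
)
print(json.dumps(req, indent=2))
```

The response to this request carries the operation_id. The actual result arrives later as a message to the session.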