MCP Server Import Discipline¶
Critical safety constraint: why heavy library imports in capability callables deadlock the MCP server, and the correct pattern to avoid it
Details¶
Rule¶
The MCP server process must never import heavy compute libraries in capability callables. This includes numpy, rank_bm25, sentence-transformers, and sqlite3 (via ir.store).
All heavy compute goes through the embedding service HTTP API (localhost:5124).
Why: the deadlock mechanism¶
The MCP server uses asyncio.to_thread() to dispatch capability callables to a thread pool. If a callable performs a deferred import of a heavy module (e.g., from work_buddy.ir.engine import search), the worker thread acquires Python's per-module import locks for every module in the import chain. The main thread, which runs the asyncio event loop, may need the same locks for its own lazy imports. Each thread then waits on the other, and the result is a permanent deadlock.
Step-by-step¶
1. Claude calls `wb_run("context_search", ...)`
2. Gateway submits callable to thread pool via asyncio.to_thread()
3. Worker thread starts executing the callable
4. Callable hits: from work_buddy.ir.engine import search
5. This triggers loading numpy, rank_bm25, sqlite3 — heavy C extensions
6. Python import system acquires per-module locks for each module in the chain
7. Main thread's event loop needs one of those locks (for internal lazy imports)
8. DEADLOCK: worker holds locks, waits for event loop; event loop waits for worker
This was discovered and fixed on April 6, 2026. The original symptom was context_search hanging for 30+ seconds on first request — debug checkpoints confirmed execution reached the function body but never completed the ir.engine import.
The correct pattern¶
All heavy compute runs in the embedding service (work_buddy/embedding/service.py), which runs in its own process and already imports numpy/rank_bm25/sentence-transformers:
- `/ir/search` endpoint: runs BM25 scoring, dense retrieval, and RRF fusion
- `/ir/index` endpoint: builds/checks the search index
The MCP server's _ir_search_dispatch and _ir_index_dispatch call client.ir_search() and client.ir_index() — lightweight HTTP requests via urllib, no heavy imports.
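As a rough illustration of what such a lightweight client call might look like, here is a minimal sketch that uses only the standard library. The port comes from the rule above; the JSON field names (`query`, `top_k`), the function names, and the response shape are assumptions for illustration, not the actual `client.ir_search()` implementation.

```python
import json
import urllib.request

# Port 5124 is taken from the rule above; everything else here is assumed.
EMBEDDING_SERVICE_URL = "http://localhost:5124"


def build_ir_search_request(query: str, top_k: int = 10) -> urllib.request.Request:
    """Build a POST request for the /ir/search endpoint without sending it.

    The payload keys "query" and "top_k" are hypothetical.
    """
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{EMBEDDING_SERVICE_URL}/ir/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ir_search(query: str, top_k: int = 10, timeout: float = 30.0):
    """Send the search to the embedding service and decode the JSON reply.

    Only stdlib modules are imported, so running this in a thread-pool
    worker does not pull in numpy, rank_bm25, or sentence-transformers.
    """
    request = build_ir_search_request(query, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect in a unit test without a running embedding service.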
The _IN_SERVICE flag¶
The _IN_SERVICE flag in ir/dense.py lets the embedding service call models directly (avoiding HTTP self-calls) while external callers still use the HTTP API.
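One way such a flag could work is sketched below. Only the name `_IN_SERVICE` comes from this document; `mark_in_service`, `embed`, and the two callables are hypothetical stand-ins, and the real `ir/dense.py` may be organized differently.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Module-level flag; stays False for every importer except the embedding service.
_IN_SERVICE = False


def mark_in_service() -> None:
    """Hypothetical helper the embedding service would call once at startup."""
    global _IN_SERVICE
    _IN_SERVICE = True


def embed(
    texts: Sequence[str],
    local_embed: Callable[[Sequence[str]], List[Vector]],
    http_embed: Callable[[Sequence[str]], List[Vector]],
) -> List[Vector]:
    """Use the in-process model inside the service, otherwise the HTTP API."""
    if _IN_SERVICE:
        return local_embed(texts)  # direct model call, no HTTP self-call
    return http_embed(texts)       # external callers stay import-light
```

Passing the two backends in as callables keeps the sketch free of heavy imports; in a real module the local path would be the only place that touches the model.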
Safe vs unsafe imports in capability callables¶
| Safe | Unsafe |
|---|---|
| urllib, json, pathlib | numpy, rank_bm25 |
| work_buddy.config, work_buddy.paths | work_buddy.ir.store, work_buddy.ir.engine |
| HTTP calls to embedding service | sqlite3 (loaded by ir.store) |
| work_buddy.obsidian.bridge | sentence_transformers |
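The unsafe column could be turned into a cheap regression guard. The sketch below is an assumption about how one might do that, not an existing check in the codebase: it reports which unsafe modules are already present in `sys.modules`, so a test can import the MCP server and assert that the list is empty.

```python
import sys
from typing import Iterable, List, Optional

# Module names from the "Unsafe" column of the table above.
UNSAFE_MODULES = (
    "numpy",
    "rank_bm25",
    "sentence_transformers",
    "sqlite3",
    "work_buddy.ir.store",
    "work_buddy.ir.engine",
)


def loaded_unsafe_modules(module_names: Optional[Iterable[str]] = None) -> List[str]:
    """Return unsafe modules (and their submodules) found among module_names.

    Defaults to the names currently in sys.modules.
    """
    names = list(sys.modules) if module_names is None else list(module_names)
    return sorted(
        name
        for name in names
        if any(name == bad or name.startswith(bad + ".") for bad in UNSAFE_MODULES)
    )
```

The guard only sees modules that are loaded at the moment it runs, so it catches import-time leaks but not a deferred import buried inside a callable that has not been invoked yet.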
Key files¶
- `work_buddy/mcp_server/registry.py`: capability registration (deadlock warnings in `_build_registry()` and `_context_capabilities()`)
- `work_buddy/embedding/service.py`: the correct home for heavy compute
- `work_buddy/ir/dense.py`: `_IN_SERVICE` flag
- `work_buddy/mcp_server/context_wrappers.py`: gateway-callable wrappers following the correct pattern
Sibling rule: don't block the event loop¶
The deadlock story above is about heavy imports inside threaded callables. There is a closely related failure mode that does not deadlock but is just as bad: synchronous blocking calls inside an async def tool handler in work_buddy/mcp_server/tools/gateway.py. FastMCP runs the event loop single-threaded, so any sync call that takes more than a couple of seconds freezes every other request — including the 2-second /health poll from the sidecar supervisor. When that poll times out, the supervisor marks mcp_gateway unhealthy and restarts it, which drops every Claude Code SSE stream.
This was discovered on 2026-04-18: wb_search called registry.search_registry(...) directly instead of through asyncio.to_thread. The first call after a cold gateway boot materialized the registry (~19s of tool probes + knowledge index warm + Obsidian HTTP probe with a 10s timeout) on the event loop, blocked /health, and triggered a cascading auto-restart 2–3 minutes after every gateway start.
Rule¶
Any async def tool handler registered on FastMCP must await asyncio.to_thread(...) around any call that touches:
- the registry (`registry.search_registry`, `registry.get_entry`, anything that transitively calls `get_registry()`)
- the knowledge store
- the filesystem beyond trivial reads
- an HTTP probe or another service over HTTP
If you want to read arguments and route, that's fine inline. If you want to do work, go through to_thread.
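The difference between the two styles can be demonstrated in isolation. In this sketch, `slow_registry_search` is a hypothetical stand-in for a sync call such as `registry.search_registry`, and the handlers are plain coroutines rather than real FastMCP tools. A heartbeat task measures how long the event loop goes without running.

```python
import asyncio
import time


def slow_registry_search(query: str) -> list:
    """Hypothetical stand-in for a sync call that is slow on a cold start."""
    time.sleep(0.3)  # simulates the registry being materialized
    return [f"match for {query}"]


async def blocking_handler(query: str) -> list:
    # Anti-pattern: the sync call runs on the event loop thread.
    return slow_registry_search(query)


async def threaded_handler(query: str) -> list:
    # Rule-compliant: the sync call runs in the default thread pool.
    return await asyncio.to_thread(slow_registry_search, query)


async def worst_stall(handler, query: str, tick: float = 0.01):
    """Run handler(query) while a heartbeat records the longest loop stall."""
    worst = 0.0
    finished = False

    async def heartbeat() -> None:
        nonlocal worst
        last = time.perf_counter()
        while not finished:
            await asyncio.sleep(tick)
            now = time.perf_counter()
            worst = max(worst, now - last)
            last = now

    beat = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat start before the handler runs
    result = await handler(query)
    finished = True
    await beat
    return result, worst


if __name__ == "__main__":
    for handler in (blocking_handler, threaded_handler):
        _, stall = asyncio.run(worst_stall(handler, "notes"))
        print(f"{handler.__name__}: worst loop stall {stall:.3f}s")
```

With the blocking handler the heartbeat is starved for the full duration of the sync call; with `asyncio.to_thread` it keeps ticking, which is what a `/health` poll needs.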
Diagnostic recipe¶
To verify a new handler doesn't block the event loop:
- Invalidate the registry on a running gateway: `mcp__work-buddy__wb_run("mcp_registry_reload")`
- In another terminal, hammer `/health` at ~20 Hz: `while true; do curl -sm1 -o/dev/null -w "%{time_total}\n" http://localhost:5126/health; done`
- Call your new tool once.
- `/health` latency must stay under ~100ms throughout. A spike into seconds means you're blocking the event loop.
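If a shell loop is inconvenient (for example on Windows), the same measurement can be approximated in Python. This is a sketch under assumptions: the URL and the ~100ms threshold come from the recipe above, while the function names and the choice to record failed requests as infinite latency are illustrative.

```python
import time
import urllib.request
from typing import Callable, List

HEALTH_URL = "http://localhost:5126/health"  # port taken from the recipe above


def fetch_health(url: str = HEALTH_URL, timeout: float = 1.0) -> None:
    """Issue one /health request; raises OSError on timeout or refusal."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def sample_latencies(
    fetch: Callable[[], None], samples: int, interval: float = 0.05
) -> List[float]:
    """Call fetch() `samples` times, roughly 20 Hz by default, timing each call.

    A failed or timed-out request is recorded as infinite latency.
    """
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            fetch()
            latencies.append(time.perf_counter() - start)
        except OSError:
            latencies.append(float("inf"))
        time.sleep(interval)
    return latencies


def spikes(latencies: List[float], threshold: float = 0.1) -> List[float]:
    """Return every sample above the threshold (100 ms by default)."""
    return [latency for latency in latencies if latency > threshold]


if __name__ == "__main__":
    # Start this, then call the new tool once from another session.
    bad = spikes(sample_latencies(fetch_health, samples=200))
    print("event loop looks healthy" if not bad else f"{len(bad)} slow polls: {bad}")
```

Taking `fetch` as a parameter lets the sampling logic be exercised with a fake in tests, without a running gateway.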
Defenses already in place¶
- Background warm-start: `main_http()` in `mcp_server/server.py` fires a daemon thread that calls `get_registry()` immediately after bind. By the time the first real request lands, the registry is already built. This hides latency, but it does not excuse skipping `asyncio.to_thread` on new handlers, because `mcp_registry_reload` can invalidate at any time.
- Slow-rebuild warning: `_build_registry()` emits a `WARNING` to the main log when total rebuild exceeds 5s, with a per-section breakdown (`tool_probes`, `cap:*`, `knowledge_index`). Check the sidecar log if `/health` flakes; a noisy section there usually points at the culprit (often the Obsidian probe's 10s HTTP timeout when Obsidian is closed).
- Fast-path socket check: `compat._find_pids_on_port` shortcuts to a socket probe before shelling out to PowerShell (`Get-NetTCPConnection`). Pinned by a regression test in `tests/unit/test_compat_port_cleanup.py`; sidecar restart is ~15s on Windows, not 25–30s.
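The first two defenses can be sketched together. This is a simplified illustration, not the code in `server.py` or `registry.py`: the logger name, function names, and the way sections are passed in are all assumptions. Only the daemon-thread warm start, the 5s threshold, and the per-section breakdown come from the descriptions above.

```python
import logging
import threading
import time
from typing import Callable, Dict, Tuple

log = logging.getLogger("registry_sketch")  # hypothetical logger name


def build_with_timing(
    sections: Dict[str, Callable[[], object]], warn_after: float = 5.0
) -> Tuple[Dict[str, object], Dict[str, float]]:
    """Build each section in order and warn with a breakdown if the total is slow."""
    registry: Dict[str, object] = {}
    timings: Dict[str, float] = {}
    for name, build in sections.items():
        start = time.perf_counter()
        registry[name] = build()
        timings[name] = time.perf_counter() - start
    total = sum(timings.values())
    if total > warn_after:
        breakdown = ", ".join(f"{name}={secs:.2f}s" for name, secs in timings.items())
        log.warning("registry rebuild took %.2fs (%s)", total, breakdown)
    return registry, timings


def warm_start(build: Callable[[], object]) -> threading.Thread:
    """Kick off the build on a daemon thread so it never blocks the event loop."""
    thread = threading.Thread(target=build, name="registry-warm", daemon=True)
    thread.start()
    return thread
```

A caller would start `warm_start(...)` right after the server binds its port and carry on serving; the daemon flag means a slow build cannot hold up process shutdown.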