MCP CPU Spike
Summary
When making MCP calls through the responses API, the llamastack server process CPU usage spikes to 100% and remains there indefinitely, even after the request completes.
Environment
- Llama Stack version: main branch
- Python version: 3.12
Steps to Reproduce
- Start the llamastack server (e.g. `llama stack run <distro-config>`)
- Verify CPU usage is idle (0-1%)
- Make an MCP call via the responses API (see the sketch below)
- Monitor CPU usage with `top`
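
A minimal sketch of step 3, assuming an OpenAI-compatible client pointed at the local llamastack server; the base URL, model, and MCP server URL below are placeholders rather than the values from the original report:

```python
# Sketch only: base URL, model, and MCP server URL are assumed placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="List the tools you have available.",
    tools=[
        {
            "type": "mcp",
            "server_label": "example",
            "server_url": "http://localhost:8000/sse",
            "require_approval": "never",
        }
    ],
)
print(response.output_text)
```

After the call returns, watch the `llama stack run` process in `top`.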
Expected Behavior
CPU usage should return to idle levels (0-1%) after the MCP request completes.
Actual Behavior
CPU usage spikes to 100% and stays there indefinitely:
```
# Before MCP call - idle
3176764 derekh    20   0 4196036 442736 137316 S   0.0   1.4   0:06.55 llama stack run

# After MCP call - stuck at 100%
3176764 derekh    20   0 4422628 448496 137444 R  99.7   1.4   0:34.23 llama stack run
```
Root Cause
The issue appears to be caused by the MCP session caching mechanism (MCPSessionManager) that was added to optimize performance by avoiding redundant tools/list calls (the fix for #4452); see the sketch below for the general shape of that pattern.
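
For context, a minimal sketch of the general session-caching pattern, with names and structure assumed for illustration only (this is not the actual MCPSessionManager code): the cache keeps MCP client sessions and their transports open across requests, so their background tasks remain in the server's event loop after the request that created them has finished.

```python
# Illustrative sketch of a session cache; names and structure are assumed,
# not taken from the llamastack implementation.
from contextlib import AsyncExitStack

from mcp import ClientSession
from mcp.client.sse import sse_client


class CachedMCPSessions:
    def __init__(self) -> None:
        self._stack = AsyncExitStack()  # keeps transports and sessions alive
        self._sessions: dict[str, ClientSession] = {}

    async def get(self, server_url: str) -> ClientSession:
        # Reuse an existing session to avoid a redundant tools/list round trip.
        if server_url in self._sessions:
            return self._sessions[server_url]
        read, write = await self._stack.enter_async_context(sse_client(server_url))
        session = await self._stack.enter_async_context(ClientSession(read, write))
        await session.initialize()
        await session.list_tools()  # result can be cached for later requests
        self._sessions[server_url] = session
        return session

    async def aclose(self) -> None:
        # Until this is called, the cached transports (and their background
        # read loops) outlive the requests that created them.
        await self._stack.aclose()
        self._sessions.clear()
```

If any of those long-lived background tasks polls or retries without yielding, the event loop stays busy and the process sits at 100% CPU even though the request itself has completed, which matches the behavior above.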