MCP CPU Spike #4754

@derekhiggins

Summary

When an MCP call is made through the responses API, the llamastack server process's CPU usage spikes to 100% and stays there indefinitely, even after the request completes.

Environment

  • Llama Stack version: main branch
  • Python version: 3.12

Steps to Reproduce

  1. Start llamastack server:
  2. Verify CPU usage is idle (0-1%)
  3. Make an MCP call via responses API:
  4. Monitor CPU usage with top
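
For a scriptable alternative to watching top in step 4, the server's CPU usage can be sampled from /proc. This is only a minimal sketch, assuming a Linux host; the helper names `cpu_seconds_from_stat` and `cpu_percent` are hypothetical and not part of llamastack.

```python
import os
import time


def cpu_seconds_from_stat(stat_line: str, ticks_per_second: int) -> float:
    """Return user + system CPU seconds parsed from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before indexing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3 in proc(5)), so utime and
    # stime (fields 14 and 15) sit at indexes 11 and 12.
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / ticks_per_second


def cpu_percent(pid: int, interval: float = 1.0) -> float:
    """Sample CPU usage of `pid` over `interval` seconds (Linux only)."""
    ticks = os.sysconf("SC_CLK_TCK")

    def read() -> float:
        with open(f"/proc/{pid}/stat") as f:
            return cpu_seconds_from_stat(f.read(), ticks)

    before = read()
    time.sleep(interval)
    after = read()
    # 100% corresponds to one fully busy core, matching top's per-process view.
    return 100.0 * (after - before) / interval
```

Calling `cpu_percent(<server pid>)` before and after the MCP request should show the same jump that top reports; a value that stays near 100 once the request has finished matches the behavior described in this issue.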

Expected Behavior

CPU usage should return to idle levels (0-1%) after the MCP request completes.

Actual Behavior

CPU usage spikes to 100% and stays there indefinitely:

# Before MCP call - idle
3176764 derekh    20   0 4196036 442736 137316 S   0.0   1.4   0:06.55 llama stack run

# After MCP call - stuck at 100%
3176764 derekh    20   0 4422628 448496 137444 R  99.7   1.4   0:34.23 llama stack run

Root Cause

The issue appears to be caused by the MCP session caching mechanism (MCPSessionManager), which was added to avoid redundant tools/list calls (the fix for #4452).
