fix: MCP CPU spike by using context manager for session cleanup #4758
franciscojavierarceo merged 4 commits into llamastack:main
Conversation
mattf
left a comment
please provide reproduction steps.
i did the following and still see 100% CPU usage -
10:53:24 in llama-stack on fix/mcp-cpu-spike-timeout [$?] is 📦 0.4.0.dev0 …
➜ uv run llama stack run --providers agents=inline::meta-reference,inference=remote::llama-openai-compat,vector_io=inline::faiss,tool_runtime=inline::rag-runtime,files=inline::localfs
...
INFO 2026-01-28 10:53:34,588 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321
(Press CTRL+C to quit)
INFO 2026-01-28 10:53:38,379 uvicorn.access:476 uncategorized: ::1:53190 - "POST /v1/responses HTTP/1.1" 200
10:53:35 in llama-stack on fix/mcp-cpu-spike-timeout [$?] is 📦 0.4.0.dev0 …
➜ curl http://localhost:8321/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "llama-openai-compat/Llama-4-Scout-17B-16E-Instruct-FP8",
"input": "Use the provided tool to say something.",
"tools": [
{
"type": "mcp",
"server_label": "local-mcp",
"server_url": "http://localhost:9090"
}
],
"tool_choice": "auto"
}'
Also still seeing a problem

lgtm, CPU spike gone when using MCP
This pull request has merge conflicts that must be resolved before it can be merged. @jwm4 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@Mergifyio refresh

✅ Pull request refreshed
When making MCP calls through the responses API, the llama-stack server CPU usage could spike to 100% and remain there indefinitely due to anyio's _deliver_cancellation loop hanging during session cleanup. This fix adds a configurable timeout (default 5 seconds) to the __aexit__ calls in MCPSessionManager.close_all() using anyio.fail_after(). If cleanup takes longer than the timeout, it's aborted to prevent the CPU spin. Fixes llamastack#4754
This reverts commit 32f337a.
force-pushed from 4c836af to 06e4957
@jwm4 is the revert + new commit here purposeful? if so can we update the title and description of the PR accordingly. thanks!

@cdoern, good catch -- the solution drifted over the course of the PR so I updated the title to reflect how the final version works.
Summary
Fixes #4754
When making MCP calls through the responses API, the llama-stack server CPU usage could spike to 100% and remain there indefinitely, even after the request completes.
Root Cause
The issue occurs during MCP session cleanup in `MCPSessionManager.close_all()`. When tasks don't respond to cancellation, anyio's `_deliver_cancellation` loop can spin indefinitely, causing the CPU spike.
Solution
Added a configurable timeout (default 5 seconds) to the `__aexit__` calls using `anyio.fail_after()`. If cleanup takes longer than the timeout, it's aborted to prevent the CPU spin; see the sketch below.
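A minimal sketch of this pattern, assuming hypothetical `MCPSessionManager` internals (a `_sessions` list and a `CLEANUP_TIMEOUT_SECONDS` constant); the actual code in the PR may differ:

```python
import anyio


class MCPSessionManager:
    # Default cleanup timeout; the PR describes this as configurable.
    CLEANUP_TIMEOUT_SECONDS = 5.0

    def __init__(self) -> None:
        self._sessions: list = []  # async context managers entered earlier

    async def close_all(self) -> None:
        for session in self._sessions:
            try:
                # Bound __aexit__ so a task that ignores cancellation cannot
                # leave anyio's _deliver_cancellation loop spinning forever.
                with anyio.fail_after(self.CLEANUP_TIMEOUT_SECONDS):
                    await session.__aexit__(None, None, None)
            except TimeoutError:
                # Cleanup exceeded the timeout; abandon it instead of
                # letting the CPU spin.
                pass
        self._sessions.clear()
```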
Testing

- Handles `TimeoutError` from `fail_after()` gracefully
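As a rough illustration of that last point, a pytest-style test using anyio's pytest plugin and the `MCPSessionManager` sketch above; `HangingSession` is an invented stand-in for a session whose cleanup never completes:

```python
import anyio
import pytest


class HangingSession:
    async def __aexit__(self, *exc_info):
        await anyio.sleep(3600)  # simulate cleanup that never finishes


@pytest.mark.anyio
async def test_close_all_times_out_gracefully():
    manager = MCPSessionManager()
    manager._sessions.append(HangingSession())
    # close_all() should swallow the internal TimeoutError and return
    # shortly after the 5-second cleanup timeout, not after an hour.
    with anyio.fail_after(30):
        await manager.close_all()
    assert manager._sessions == []
```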