
fix: MCP CPU spike by using context manager for session cleanup #4758

Merged
franciscojavierarceo merged 4 commits into llamastack:main from jwm4:fix/mcp-cpu-spike-timeout
Feb 3, 2026

Conversation

@jwm4
Contributor

@jwm4 jwm4 commented Jan 28, 2026

Summary

Fixes #4754

When making MCP calls through the responses API, the llama-stack server CPU usage could spike to 100% and remain there indefinitely, even after the request completes.

Root Cause

The issue occurs during MCP session cleanup in MCPSessionManager.close_all(). When tasks don't respond to cancellation, anyio's _deliver_cancellation loop can spin indefinitely, causing the CPU spike.

Solution

Added a configurable timeout (default 5 seconds) to the __aexit__ calls using anyio.fail_after(). If cleanup takes longer than the timeout, it's aborted to prevent the CPU spin.
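The PR itself uses `anyio.fail_after()`; the sketch below illustrates the same idea with the stdlib's `asyncio.wait_for()` so it runs without anyio installed. All names here (the `_sessions` list, the `timeout` parameter) are illustrative assumptions, not the actual llama-stack code:

```python
import asyncio


class MCPSessionManager:
    """Sketch of a session manager whose cleanup is bounded by a timeout."""

    def __init__(self):
        self._sessions = []  # async context managers entered earlier

    async def close_all(self, timeout: float = 5.0):
        # Bound each session's __aexit__ with a timeout so a task that
        # ignores cancellation cannot keep the cancellation-delivery
        # loop spinning at 100% CPU.
        for session in self._sessions:
            try:
                await asyncio.wait_for(session.__aexit__(None, None, None), timeout)
            except (TimeoutError, asyncio.TimeoutError):
                # Cleanup took too long; abandon it rather than spin.
                pass
        self._sessions.clear()
```

A session whose `__aexit__` hangs is simply abandoned after the timeout, while well-behaved sessions close normally.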

Testing

  • Verified that after the fix, CPU usage returns to idle levels after MCP requests complete
  • Existing error handling catches the TimeoutError from fail_after() gracefully

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 28, 2026
@jwm4 jwm4 mentioned this pull request Jan 28, 2026
@jwm4 jwm4 changed the title from "Fix MCP CPU spike by adding timeout to session cleanup" to "fix: MCP CPU spike by adding timeout to session cleanup" Jan 28, 2026
mattf
mattf previously requested changes Jan 28, 2026
Collaborator

@mattf mattf left a comment


please provide reproduction steps.

i did the following and still see 100% CPU usage -

10:53:24 in llama-stack on  fix/mcp-cpu-spike-timeout [$?] is 📦 0.4.0.dev0 …
➜ uv run llama stack run --providers agents=inline::meta-reference,inference=remote::llama-openai-compat,vector_io=inline::faiss,tool_runtime=inline::rag-runtime,files=inline::localfs
...
INFO     2026-01-28 10:53:34,588 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321      
         (Press CTRL+C to quit)                                                                                         
INFO     2026-01-28 10:53:38,379 uvicorn.access:476 uncategorized: ::1:53190 - "POST /v1/responses HTTP/1.1" 200
10:53:35 in llama-stack on  fix/mcp-cpu-spike-timeout [$?] is 📦 0.4.0.dev0 …
➜ curl http://localhost:8321/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-openai-compat/Llama-4-Scout-17B-16E-Instruct-FP8",
    "input": "Use the provided tool to say something.",
    "tools": [
      {
        "type": "mcp",
        "server_label": "local-mcp",
        "server_url": "http://localhost:9090"
      }
    ],
    "tool_choice": "auto"
  }'

@derekhiggins
Contributor

Also still seeing a problem
running https://github.com/derekhiggins/rhoai-auth-demo/blob/main/scripts/interactive-demo.py
python scripts/interactive-demo.py --user admin --tests mcp

@mattf mattf dismissed their stale review January 28, 2026 16:55

proposed alternative change

@derekhiggins
Contributor

lgtm, CPU spike gone when using MCP.
thanks both.

@mergify

mergify bot commented Jan 30, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @jwm4 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 30, 2026
@jwm4
Contributor Author

jwm4 commented Feb 2, 2026

@Mergifyio refresh

@mergify

mergify bot commented Feb 2, 2026

refresh

✅ Pull request refreshed

@jwm4 jwm4 closed this Feb 2, 2026
@jwm4 jwm4 reopened this Feb 2, 2026
jwm4 and others added 3 commits February 2, 2026 13:00
When making MCP calls through the responses API, the llama-stack server
CPU usage could spike to 100% and remain there indefinitely due to
anyio's _deliver_cancellation loop hanging during session cleanup.

This fix adds a configurable timeout (default 5 seconds) to the
__aexit__ calls in MCPSessionManager.close_all() using anyio.fail_after().
If cleanup takes longer than the timeout, it's aborted to prevent the
CPU spin.

Fixes llamastack#4754
@jwm4 jwm4 force-pushed the fix/mcp-cpu-spike-timeout branch from 4c836af to 06e4957 Compare February 2, 2026 18:01
@mergify mergify bot removed the needs-rebase label Feb 2, 2026
@cdoern
Collaborator

cdoern commented Feb 2, 2026

@jwm4 is the revert + new commit here purposeful? if so can we update the title and description of the PR accordingly. thanks!

@jwm4 jwm4 changed the title from "fix: MCP CPU spike by adding timeout to session cleanup" to "fix: MCP CPU spike by using context manager for session cleanup" Feb 2, 2026
@jwm4
Contributor Author

jwm4 commented Feb 2, 2026

@cdoern , good catch -- the solution drifted over the course of the PR so I updated the title to reflect how the final version works.
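For context on the final title: the PR does not include its diff here, but the "context manager" approach it names can be sketched with the stdlib's `contextlib.AsyncExitStack`, which enters sessions and unwinds their `__aexit__` calls in the same task, in reverse order. All names below (`MCPSessionManager`, `open`, `close_all`) are assumptions for illustration, not the merged code:

```python
import contextlib


class MCPSessionManager:
    """Sketch: let an AsyncExitStack own session lifetimes."""

    def __init__(self):
        self._stack = contextlib.AsyncExitStack()

    async def open(self, session_cm):
        # Enter the session's async context and register its exit
        # with the stack, so cleanup runs in the entering task.
        return await self._stack.enter_async_context(session_cm)

    async def close_all(self):
        # Unwinds every registered __aexit__ in reverse order.
        await self._stack.aclose()
```

Keeping enter and exit in one task avoids the cross-task cancellation that anyio's cancel scopes handle poorly, which is the failure mode behind the CPU spin.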

@cdoern cdoern requested a review from mattf February 3, 2026 01:38
@franciscojavierarceo franciscojavierarceo merged commit 180d8af into llamastack:main Feb 3, 2026
48 of 67 checks passed

Labels

CLA Signed This label is managed by the Meta Open Source bot.


Development

Successfully merging this pull request may close these issues.

MCP CPU Spike

5 participants