fix: MCP CPU spike by using context manager for session cleanup #4758
franciscojavierarceo merged 4 commits into llamastack:main
Conversation
mattf
left a comment
please provide reproduction steps.
i did the following and still see 100% CPU usage -
10:53:24 in llama-stack on fix/mcp-cpu-spike-timeout [$?] is 📦 0.4.0.dev0 …
➜ uv run llama stack run --providers agents=inline::meta-reference,inference=remote::llama-openai-compat,vector_io=inline::faiss,tool_runtime=inline::rag-runtime,files=inline::localfs
...
INFO 2026-01-28 10:53:34,588 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321
(Press CTRL+C to quit)
INFO 2026-01-28 10:53:38,379 uvicorn.access:476 uncategorized: ::1:53190 - "POST /v1/responses HTTP/1.1" 200
10:53:35 in llama-stack on fix/mcp-cpu-spike-timeout [$?] is 📦 0.4.0.dev0 …
➜ curl http://localhost:8321/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "llama-openai-compat/Llama-4-Scout-17B-16E-Instruct-FP8",
"input": "Use the provided tool to say something.",
"tools": [
{
"type": "mcp",
"server_label": "local-mcp",
"server_url": "http://localhost:9090"
}
],
"tool_choice": "auto"
}'
Also still seeing a problem

lgtm, CPU spike gone when using MCP
This pull request has merge conflicts that must be resolved before it can be merged. @jwm4 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@Mergifyio refresh

✅ Pull request refreshed
When making MCP calls through the responses API, the llama-stack server CPU usage could spike to 100% and remain there indefinitely due to anyio's _deliver_cancellation loop hanging during session cleanup. This fix adds a configurable timeout (default 5 seconds) to the __aexit__ calls in MCPSessionManager.close_all() using anyio.fail_after(). If cleanup takes longer than the timeout, it's aborted to prevent the CPU spin. Fixes llamastack#4754
This reverts commit 32f337a.
force-pushed from 4c836af to 06e4957
@jwm4 is the revert + new commit here purposeful? if so can we update the title and description of the PR accordingly. thanks!

@cdoern, good catch -- the solution drifted over the course of the PR so I updated the title to reflect how the final version works.
Summary
Fixes #4754
When making MCP calls through the responses API, the llama-stack server CPU usage could spike to 100% and remain there indefinitely, even after the request completes.
Root Cause
The issue occurs during MCP session cleanup in `MCPSessionManager.close_all()`. When tasks don't respond to cancellation, anyio's `_deliver_cancellation` loop can spin indefinitely, causing the CPU spike.
Solution
Added a configurable timeout (default 5 seconds) to the `__aexit__` calls using `anyio.fail_after()`. If cleanup takes longer than the timeout, it's aborted to prevent the CPU spin; see the sketch below.
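A minimal sketch of this pattern, assuming hypothetical `MCPSessionManager` internals (a `_sessions` list and a `CLEANUP_TIMEOUT_SECONDS` constant); the actual code in the PR may differ:

```python
import anyio


class MCPSessionManager:
    # Default cleanup timeout; the PR describes this as configurable.
    CLEANUP_TIMEOUT_SECONDS = 5.0

    def __init__(self) -> None:
        self._sessions: list = []  # async context managers entered earlier

    async def close_all(self) -> None:
        for session in self._sessions:
            try:
                # Bound __aexit__ so a task that ignores cancellation cannot
                # leave anyio's _deliver_cancellation loop spinning forever.
                with anyio.fail_after(self.CLEANUP_TIMEOUT_SECONDS):
                    await session.__aexit__(None, None, None)
            except TimeoutError:
                # Cleanup exceeded the timeout; abandon it instead of
                # letting the CPU spin.
                pass
        self._sessions.clear()
```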
Testing

- Handles `TimeoutError` from `fail_after()` gracefully
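As a rough illustration of that last point, a pytest-style test using anyio's pytest plugin and the `MCPSessionManager` sketch above; `HangingSession` is an invented stand-in for a session whose cleanup never completes:

```python
import anyio
import pytest


class HangingSession:
    async def __aexit__(self, *exc_info):
        await anyio.sleep(3600)  # simulate cleanup that never finishes


@pytest.mark.anyio
async def test_close_all_times_out_gracefully():
    manager = MCPSessionManager()
    manager._sessions.append(HangingSession())
    # close_all() should swallow the internal TimeoutError and return
    # shortly after the 5-second cleanup timeout, not after an hour.
    with anyio.fail_after(30):
        await manager.close_all()
    assert manager._sessions == []
```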