forked from modelcontextprotocol/python-sdk
Enable session roaming across multiple server instances #1
Open: davidroberts-merlyn wants to merge 10 commits into main from session-roaming
Add session roaming support to StreamableHTTPSessionManager, allowing
sessions to move freely between server instances without requiring
sticky sessions. This enables true horizontal scaling and high
availability for stateful MCP servers.
When a request arrives with a session ID not found in local memory,
the presence of an EventStore allows creating a transport for that
session. EventStore serves dual purposes: storing events (existing)
and proving session existence (new). This eliminates the need for
separate session validation storage.
Changes:
- Add session roaming logic in _handle_stateful_request()
- Extract duplicate server task code into reusable methods
- Update docstrings to document session roaming capability
- Add 8 comprehensive tests for session roaming scenarios
- Add production-ready example with Redis EventStore
- Include Kubernetes and Docker Compose deployment examples
Benefits:
- One store instead of two (EventStore serves both purposes)
- No new APIs or interfaces required
- Minimal code changes (~50 lines in manager)
- 100% backward compatible
- Enables multi-instance deployments without sticky sessions
Example usage:
    event_store = RedisEventStore(redis_url="redis://redis:6379")
    manager = StreamableHTTPSessionManager(
        app=app,
        event_store=event_store  # Enables session roaming
    )
Github-Issue: modelcontextprotocol#520
Github-Issue: modelcontextprotocol#692
Github-Issue: modelcontextprotocol#880
Github-Issue: modelcontextprotocol#1350
Change single quotes to double quotes to comply with prettier formatting requirements.
- Add language specifiers to all code blocks
- Fix heading hierarchy (bold text to proper headings)
- Add blank lines after headings for better readability
- Escape underscores in file paths (__init__.py -> **init**.py)
The transport could be removed from _server_instances by the cleanup task if it crashed immediately after being started. This caused a KeyError when trying to access it from the dictionary. Fixed by keeping a local reference to the transport instead of looking it up again from the dictionary after starting the server task.
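The race described above can be reproduced in miniature. This is a minimal sketch, not the SDK's code: `FakeTransport`, `start_session`, and the `instances` dictionary are hypothetical stand-ins for the real transport and `_server_instances`. It shows why holding a local reference is safe even when the cleanup path has already removed the dictionary entry.

```python
import asyncio


class FakeTransport:
    """Hypothetical stand-in for a session transport (not the SDK class)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


async def start_session(
    instances: dict[str, FakeTransport], session_id: str
) -> FakeTransport:
    # Keep a local reference before the server task starts.
    transport = FakeTransport(session_id)
    instances[session_id] = transport

    async def run_server() -> None:
        # Simulate a server task that crashes right after starting;
        # its cleanup removes the entry from the shared dictionary.
        try:
            raise RuntimeError("crashed on startup")
        except RuntimeError:
            pass
        finally:
            instances.pop(session_id, None)

    # Let the task run to completion, including its cleanup.
    await asyncio.create_task(run_server())

    # At this point instances[session_id] would raise KeyError,
    # but the local reference is still valid.
    return transport
```

Looking the transport up again with `instances[session_id]` after the task has started is the pattern the commit removes. Returning the local variable avoids depending on the order in which the server task and its cleanup run.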
Use @contextlib.asynccontextmanager decorator instead of manual __aenter__/__aexit__ implementation for mock_connect functions. Fixes test failures in:
- test_transport_server_task_cleanup_on_exception
- test_transport_server_task_no_cleanup_on_terminated
Add AsyncIterator import and use proper return type annotation for mock_connect functions: AsyncIterator[tuple[AsyncMock, AsyncMock]] instead of Any.
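The two commits above describe a decorator-based mock with an explicit return annotation. A minimal sketch of that shape follows, assuming `mock_connect` takes no arguments and that teardown is a no-op; the real test helpers may differ. `use_mock_connect` is a hypothetical caller added only to exercise the mock.

```python
import asyncio
import contextlib
from collections.abc import AsyncIterator
from unittest.mock import AsyncMock


@contextlib.asynccontextmanager
async def mock_connect() -> AsyncIterator[tuple[AsyncMock, AsyncMock]]:
    # The decorator supplies __aenter__/__aexit__, so none are written by hand.
    read_stream = AsyncMock()
    write_stream = AsyncMock()
    try:
        yield read_stream, write_stream
    finally:
        # Teardown would go here; nothing to release in this sketch.
        pass


async def use_mock_connect() -> bool:
    # Hypothetical caller: unpack the pair the same way a transport user would.
    async with mock_connect() as (read_stream, write_stream):
        return isinstance(read_stream, AsyncMock) and isinstance(
            write_stream, AsyncMock
        )
```

The annotation describes the generator function itself, which is why it is `AsyncIterator[...]` rather than an async context manager type: the decorator wraps the generator after the annotation has been applied.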
The tests were failing because AsyncMock(return_value=None) caused app.run to complete immediately, which closed the transport streams and triggered cleanup that removed transports from _server_instances before assertions could check for them. Now using mock_app_run that calls anyio.sleep_forever() and blocks until the test context cancels it. This keeps transports alive during the test assertions.
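The difference between the two mocks can be seen in isolation. This sketch uses only the standard library, so `asyncio.Event().wait()` stands in for `anyio.sleep_forever()`; that substitution and the `demo` helper are assumptions made for illustration, not the test code from this PR.

```python
import asyncio
import contextlib
from unittest.mock import AsyncMock


async def mock_app_run(*args: object, **kwargs: object) -> None:
    # Stand-in for anyio.sleep_forever(): block until the task is cancelled.
    await asyncio.Event().wait()


async def demo() -> tuple[bool, bool]:
    # AsyncMock(return_value=None) finishes as soon as it is scheduled.
    instant = asyncio.create_task(AsyncMock(return_value=None)())
    # The blocking mock stays pending until something cancels it.
    blocking = asyncio.create_task(mock_app_run())

    await asyncio.sleep(0.01)
    result = (instant.done(), blocking.done())

    # Cancel the blocking task, as the test context would on exit.
    blocking.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await blocking
    return result
```

A mock that returns immediately lets the surrounding cleanup run before any assertion executes, while a mock that blocks keeps the transport registered for as long as the test needs it.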
Enable Session Roaming Across Multiple Server Instances
Problem
When deploying MCP servers across multiple instances (Kubernetes pods, Docker containers, worker processes), sessions are tied to the specific instance that created them. This requires sticky sessions at the load balancer level and prevents true horizontal scaling. Users are currently forced to choose between:
This limitation is documented in multiple issues: modelcontextprotocol#520 (multi-worker sessions), modelcontextprotocol#692 (session reuse across instances), modelcontextprotocol#880 (horizontal scalability), and modelcontextprotocol#1350 (sticky session problems).
Solution
This PR enables session roaming: sessions can move between server instances without sticky sessions. The key insight is that EventStore already serves as proof of session existence.
When a request arrives with a session ID that's not in an instance's local memory, if an EventStore is configured, the instance can safely:
What Changed
Modified streamable_http_manager.py (~50 lines):
_handle_stateful_request()
Added comprehensive tests (test_session_roaming.py, 510 lines):
Added production-ready example (simple-streamablehttp-roaming/, 13 files):
Why This Approach
Previous Attempts
We explored two other approaches before arriving at this solution:
Custom Session Store (outside SDK) - Created our own session validation in the application layer, but this didn't solve the core problem and required every user to implement their own solution.
SessionStore ABC (in SDK) - Added a new SessionStore interface requiring both EventStore + SessionStore parameters. While functional, this approach required two separate storage backends and was more complex than necessary.
Current Approach: EventStore-Only
The key insight: EventStore already proves sessions existed. If events exist for a session ID, that session must have existed to create those events. No separate SessionStore needed.
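The decision this approach implies can be sketched as a small model. This is an illustration under stated assumptions, not the code in `_handle_stateful_request()`: `SketchManager`, its fields, and the string outcomes are hypothetical, the transport is represented by a placeholder string, and the "rejected" branch assumes that an unknown session ID without an EventStore keeps its existing failure behavior.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4


@dataclass
class SketchManager:
    """Hypothetical, simplified model of one server instance."""

    event_store: Optional[object] = None
    local_sessions: dict[str, str] = field(default_factory=dict)

    def handle_request(self, session_id: Optional[str]) -> str:
        if session_id is None:
            # No session ID: start a new session on this instance.
            self.local_sessions[uuid4().hex] = "transport"
            return "created"
        if session_id in self.local_sessions:
            # Session already lives in this instance's memory.
            return "local"
        if self.event_store is not None:
            # Unknown locally, but an EventStore is configured:
            # create a transport for the roaming session.
            self.local_sessions[session_id] = "transport"
            return "roamed"
        # Assumed fallback: no EventStore, so the unknown ID is refused.
        return "rejected"
```

With two instances sharing one store, a session created on the first is answered as "roamed" on the second and as "local" on every later request to that instance. An instance with no store still refuses the unknown ID.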
Benefits:
Usage
Before (Requires Sticky Sessions)
After (No Sticky Sessions Needed)
How It Works
Testing
All tests pass, including:
Production Deployment
The included example demonstrates:
Breaking Changes
None. This is a pure behavior enhancement:
Related Issues
Closes modelcontextprotocol#520, modelcontextprotocol#692, modelcontextprotocol#880, modelcontextprotocol#1350
This implementation addresses the core limitation described in all these issues: the inability to run stateful MCP servers across multiple instances without sticky sessions.