Add best practices when using load balancer #325
Conversation
I wouldn’t list that as required; I’d rather present it as the easiest path. Session management is ultimately at the discretion of whoever implements the MCP server. Using Redis to store sessions and share them across instances can also solve the problem, but it introduces a more complex architecture.

The session management mentioned here is not simply sharing some data. According to the MCP protocol definition, a complete session requires a three-step handshake: the client sends a request to the SSE endpoint, the server returns the message endpoint information through SSE, and the client sends a notification to that message endpoint. This establishes a bidirectional channel (the client writes through the message endpoint, the server writes through SSE), and this channel is what we call a session. The problem is that if the load balancer sends a message request to a server instance that holds no SSE connection for it (that is, one that cannot recognize the sessionId), the request will fail.
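To make that flow concrete, here is a minimal, non-normative sketch of the three-step handshake, assuming the HTTP+SSE transport; the `/sse` and `/message` paths, the example host, and the use of `httpx` are illustrative assumptions, not part of the protocol:

```python
# Hedged sketch of the SSE session handshake; endpoints and host are
# placeholders, and error handling is omitted for brevity.
import asyncio
import httpx

async def open_session(base_url: str) -> None:
    async with httpx.AsyncClient(base_url=base_url, timeout=None) as client:
        # Step 1: the client opens the SSE stream.
        async with client.stream("GET", "/sse") as sse:
            message_endpoint = None
            async for line in sse.aiter_lines():
                # Step 2: the server announces the message endpoint
                # (carrying the sessionId) as an SSE event.
                if line.startswith("data:"):
                    message_endpoint = line.removeprefix("data:").strip()
                    break
            # Step 3: the client POSTs a notification to the message
            # endpoint. If a load balancer routes this POST to a replica
            # that does not hold the SSE stream, the sessionId is unknown
            # there and the request fails.
            resp = await client.post(
                message_endpoint,
                json={"jsonrpc": "2.0", "method": "notifications/initialized"},
            )
            resp.raise_for_status()

asyncio.run(open_session("https://mcp.example.com"))
```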
According to the current spec: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#session-management, the session ID is returned during the init phase. This implies that the ID is not negotiated anywhere else in the protocol. Unless I’m missing something?

Yes, and “session stickiness is required” means that a load balancer claiming to support MCP would ideally provide stickiness without relying on the MCP sessionId, for example by parsing the SSE endpoint event, much like the sticky cookie in nginx.
I agree with you on LB/Gateway — ultimately, it's not required, just recommended. That said, the solution I mentioned earlier works fine as well. Where I’ve landed with this discussion is that we need an MCP Gateway specification, and it should be defined here.

Yes, we need a specification to guide how load balancers or gateways support MCP. I have seen many different implementations so far, and their common point is that they all use the gateway as the MCP server and communicate with the business services through Redis PUB/SUB or some other event bus or message middleware, which is not friendly to a standard, uniform ecosystem.
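For illustration, a rough sketch of that gateway pattern, assuming redis-py's asyncio API; the `mcp:session:<id>` channel convention and the `send_sse` callback are made-up names:

```python
# Sketch only: replicas publish messages for sessions they don't own,
# and the replica holding the SSE stream relays them to the client.
import json
import redis.asyncio as redis

r = redis.Redis()

async def forward_message(session_id: str, message: dict) -> None:
    # Called by a replica that received a POST for a session it does not
    # hold: hand the message to whichever replica owns the SSE stream.
    await r.publish(f"mcp:session:{session_id}", json.dumps(message))

async def pump_session(session_id: str, send_sse) -> None:
    # Runs on the replica that owns the SSE stream: subscribe to the
    # session channel and relay messages down the open connection.
    async with r.pubsub() as ps:
        await ps.subscribe(f"mcp:session:{session_id}")
        async for item in ps.listen():
            if item["type"] == "message":
                await send_sse(json.loads(item["data"]))
```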
We have the same problem here. Redis pub/sub is doable, but it's too much to wire up an existing application running across 10/20 pods only to support the MCP protocol. For now, we resolved it by making a single pod handle MCP connections; commonly, this is more than enough to support internal chat calls. It's also worth mentioning that the stateless HTTP proposal is coming, and hopefully it will make this a lot easier:
This is a really good discussion - do you think it would be a good idea to collaborate on this within the Hosting Working Group to publish the best practices for the community here?: https://github.com/modelcontextprotocol-community/working-groups

I will look at it later, after my holiday :)
So a working solution is using a sticky cookie (https://nginx.org/en/docs/http/ngx_http_upstream_module.html#sticky_cookie) to make sure the client persists to the selected instance. As a special reminder: the client should support a cookie jar to persist this information between requests.
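To illustrate the client side, a minimal sketch using `httpx` (whose `Client` keeps a built-in cookie jar); the host and paths are placeholders:

```python
# Sketch: one client instance == one cookie jar shared across requests,
# so the load balancer's sticky cookie is echoed back on every call.
import httpx

client = httpx.Client(base_url="https://mcp.example.com")

# The first response carries the LB's sticky cookie (e.g. nginx's
# `sticky cookie srv_id ...`), pinning this client to one replica.
with client.stream("GET", "/sse", headers={"Accept": "text/event-stream"}) as sse:
    # Later POSTs reuse the jar, so the cookie routes them to the same
    # replica that holds this SSE stream.
    client.post("/message", json={"jsonrpc": "2.0", "method": "ping", "id": 1})
```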
I agree with what @Joffref said:
So I will close, but thank you for the pull request! 😃
# …g in MCP Deployments

- **Status**: Draft
- **Type**: Informational
- **Created**: 2025-12-21
- **Author(s)**: Zhuozhi Ji <jizhuozhi.george@gmail.com> (@jizhuozhi)
- **Sponsor**: None
- **PR**: https://github.com/modelcontextprotocol/specification/pull/0000

## Abstract

This SEP proposes optional high availability (HA) best practices for MCP deployments with stateful streaming sessions (e.g., SSE). While the MCP protocol itself remains unchanged, production deployments often face challenges in maintaining session continuity and resilience when using multiple replicas behind load balancers. This proposal outlines optional patterns, including pub-sub event buses, cluster coordination with P2P forwarding, middleware/SDK abstraction, and session partitioning. These patterns provide guidance for implementers to achieve HA without breaking protocol compatibility or requiring client modifications.

## Motivation

Production MCP deployments increasingly target multi-node, horizontally scalable environments. Long-lived streaming sessions (SSE) introduce challenges when routed through stateless HTTP ingress or load balancers:

- Session continuity may break if connections are routed to a different replica.
- Node failure or restart can interrupt ongoing streaming sessions.
- Resuming sessions across replicas is non-trivial without coordination.

Community discussions, including [GitHub PR modelcontextprotocol#325](modelcontextprotocol#325), have highlighted these issues. Contributors concluded that session stickiness or shared session stores are practical implementation considerations, but not mandated by the protocol. This creates an opportunity for **informational guidance** on HA patterns that are optional and non-intrusive.

## Specification

This SEP does not introduce protocol-level changes. The following optional HA patterns are proposed for implementers:

### 1. Core HA Patterns

#### 1.1 Event Bus / Pub-Sub

- Externalize session events to a distributed pub-sub system.
- MCP replicas subscribe to session events to enable failover and session recovery.
- Decouples session lifetime from any single node.

#### 1.2 Cluster Coordination & P2P Forwarding

- MCP nodes maintain lightweight cluster state via gossip, shared stores, or JDBC ping.
- Session messages can be forwarded to the node currently handling the session.
- Avoids heavy consensus mechanisms to preserve throughput.

### 2. Implementation & Optimization Support

#### 2.1 Middleware / SDK Abstraction

- Encapsulates HA logic (pub-sub, P2P forwarding) within SDK or middleware.
- Keeps protocol handlers and business logic unchanged.
- Provides a transparent API to clients, allowing gradual adoption.

#### 2.2 Session Partitioning / Affinity Hints

- Session IDs may encode partitioning or affinity hints (a minimal sketch follows below).
- Reduces coordination overhead.
- Affinity is advisory and must not impact correctness.
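A non-normative sketch of pattern 2.2; the `<node>.<token>` session-ID layout and the `NODE_ID` constant are made-up conventions, not part of MCP:

```python
# Sketch: embed the owning replica in the session ID as an advisory
# affinity hint, so peers can route without a shared-store lookup.
import secrets

NODE_ID = "replica-07"  # assumed stable identity of this replica

def new_session_id() -> str:
    return f"{NODE_ID}.{secrets.token_urlsafe(16)}"

def affinity_hint(session_id: str) -> str | None:
    # Best-effort only: the hinted owner may have restarted or moved,
    # so correctness must never depend on this value.
    node, _, token = session_id.partition(".")
    return node if token else None
```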
### 3. Illustrative Middleware-Oriented Model (Python, Non-Normative)

```python
async def handle_mcp_message(message, send):
    if message["type"] == "tool_call":
        result = await run_tool(message["payload"])
        await send({"type": "tool_result", "payload": result})


class MCPHAMiddleware:
    def __init__(self, ha_backend):
        self.ha = ha_backend

    def wrap(self, handler):
        async def wrapped(message, send):
            session_id = self.ha.ensure_session(message)
            async with self.ha.bind_session(session_id, send) as ha_send:
                await handler(message, ha_send)
        return wrapped
```

## Rationale

* **Alternate designs considered**: Sticky sessions at load balancer, full Raft replication, central shared state.
* **Why chosen approach**: Optional patterns allow HA without protocol changes, preserve throughput, and provide flexibility.
* **Related work**: Community PR modelcontextprotocol#325; common HA patterns in distributed systems.
* **Community consensus**: PR discussion supports optional, non-normative guidance for HA.

## Backward Compatibility

No protocol changes are introduced. Existing clients and servers remain fully compatible. Adoption of HA patterns is optional and implementation-defined.

## Security Implications

No new security surfaces are introduced by this SEP. Implementers should consider standard security practices for distributed coordination, pub-sub, and session forwarding.

## Reference Implementation

* Prototype Python middleware shown above.
* No full reference implementation is required to mark SEP as draft.

## Additional Optional Sections

### Performance Implications

* Optional HA patterns may introduce additional latency or coordination overhead, but throughput is preserved by avoiding heavy consensus.

### Testing Plan

* Implementers should validate session continuity during failover, replica restart, and load balancer routing.

### Alternatives Considered

* Sticky sessions at LB (less flexible, not always feasible)
* Full Raft replication (high latency, throughput penalty)
* Central shared store (adds infrastructure complexity)

### Open Questions

* Best practices for large clusters with thousands of concurrent streaming sessions.
* Integration guidance for Streamable HTTP once adoption increases.

### Acknowledgments

* Community contributors to PR modelcontextprotocol#325 for highlighting HA challenges in production MCP deployments.
Motivation and Context
When the MCP Server provides services with multiple replicas (especially as an external provider), there is usually a load balancer in front. Unless a persistent connection is used, the load balancer assumes that requests are stateless. However, the MCP protocol is a stateful protocol: when communicating over HTTP (SSE), the instance that receives a message request must be the instance that established the SSE connection.
Therefore, as a best practice, when using a load balancer, session stickiness must be guaranteed, and when the instance that established the SSE connection no longer exists, the request must be rejected. (This means consistent hashing is not suitable in this scenario: it would silently remap the session to a different, still-running instance instead of failing the request.)
In addition, since common session stickiness is implemented with cookies, MCP clients may need to keep cookie records (that is, support a cookie jar).
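As a minimal sketch of that rejection behavior (FastAPI and the in-memory session table are assumptions for illustration, not part of this PR):

```python
# Sketch: a replica rejects message POSTs for sessions whose SSE stream
# it does not hold, instead of accepting a message it can never answer.
from fastapi import FastAPI, HTTPException

app = FastAPI()
local_sessions: dict[str, object] = {}  # sessionId -> open SSE stream handle

@app.post("/message")
async def message(sessionId: str, body: dict):
    stream = local_sessions.get(sessionId)
    if stream is None:
        # This instance never established (or has lost) the SSE
        # connection for this session: fail fast so the client can
        # re-establish instead of hanging.
        raise HTTPException(status_code=404, detail="unknown sessionId")
    ...  # dispatch `body` to the session's handler over `stream`
```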
How Has This Been Tested?
As above: the MCP Server was deployed with multiple replicas, using OpenResty as the load balancer to compare round-robin routing against cookie-based session stickiness, with a transport whose HTTP client supports a cookie jar. Round robin could not work with multiple sessions, but cookie-based session stickiness worked well.
Breaking Changes
None