Add best practices when using load balancer #325
Conversation
I wouldn’t list that as required; I’d rather present it as the easiest path. Session management is ultimately at the discretion of whoever implements the MCP server. Using Redis to store sessions and share them across instances can also solve the problem, but it introduces a more complex architecture.

The session management mentioned here is not simply sharing some data. According to the MCP protocol definition, a complete session requires a three-step handshake: the client sends a request to the SSE endpoint, the server returns the message endpoint information through SSE, and the client sends a notification to that message endpoint. This establishes a bidirectional channel (the client writes through the message endpoint, the server writes through SSE), and this channel is what we call a session. The problem is that if the load balancer sends a message request to a server instance that holds no SSE connection for it (that is, one that cannot recognize the sessionId), the request will fail.
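To make that flow concrete, here is a minimal, non-normative sketch of the three-step handshake, assuming the HTTP+SSE transport; the `/sse` and `/message` paths, the example host, and the use of `httpx` are illustrative assumptions, not part of the protocol:

```python
# Hedged sketch of the SSE session handshake; endpoints and host are
# placeholders, and error handling is omitted for brevity.
import asyncio
import httpx

async def open_session(base_url: str) -> None:
    async with httpx.AsyncClient(base_url=base_url, timeout=None) as client:
        # Step 1: the client opens the SSE stream.
        async with client.stream("GET", "/sse") as sse:
            message_endpoint = None
            async for line in sse.aiter_lines():
                # Step 2: the server announces the message endpoint
                # (carrying the sessionId) as an SSE event.
                if line.startswith("data:"):
                    message_endpoint = line.removeprefix("data:").strip()
                    break
            # Step 3: the client POSTs a notification to the message
            # endpoint. If a load balancer routes this POST to a replica
            # that does not hold the SSE stream, the sessionId is unknown
            # there and the request fails.
            resp = await client.post(
                message_endpoint,
                json={"jsonrpc": "2.0", "method": "notifications/initialized"},
            )
            resp.raise_for_status()

asyncio.run(open_session("https://mcp.example.com"))
```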
According to the current spec: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#session-management, the session ID is returned during the init phase. This implies that the ID is not negotiated anywhere else in the protocol. Unless I’m missing something?

Yes, and “session stickiness is required” means that a load balancer claiming to support MCP would ideally provide stickiness without relying on the MCP sessionId, for example by parsing the SSE endpoint event, much like the sticky cookie in nginx.
I agree with you on LB/Gateway — ultimately, it's not required, just recommended. That said, the solution I mentioned earlier works fine as well. Where I’ve landed with this discussion is that we need an MCP Gateway specification, and it should be defined here.

Yes, we need a specification to guide how load balancers or gateways support MCP. I have seen many different implementations so far, and their common point is that they all use the gateway as the MCP server and communicate with the business services through Redis PUB/SUB or some other event bus or message middleware, which is not friendly to a standard, uniform ecosystem.
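For illustration, a rough sketch of that gateway pattern, assuming redis-py's asyncio API; the `mcp:session:<id>` channel convention and the `send_sse` callback are made-up names:

```python
# Sketch only: replicas publish messages for sessions they don't own,
# and the replica holding the SSE stream relays them to the client.
import json
import redis.asyncio as redis

r = redis.Redis()

async def forward_message(session_id: str, message: dict) -> None:
    # Called by a replica that received a POST for a session it does not
    # hold: hand the message to whichever replica owns the SSE stream.
    await r.publish(f"mcp:session:{session_id}", json.dumps(message))

async def pump_session(session_id: str, send_sse) -> None:
    # Runs on the replica that owns the SSE stream: subscribe to the
    # session channel and relay messages down the open connection.
    async with r.pubsub() as ps:
        await ps.subscribe(f"mcp:session:{session_id}")
        async for item in ps.listen():
            if item["type"] == "message":
                await send_sse(json.loads(item["data"]))
```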
We have the same problem here. Redis pub/sub is doable, but it's too much to wire up an existing application running across 10/20 pods only to support the MCP protocol. For now, we resolved it by making a single pod handle MCP connections; commonly, this is more than enough to support internal chat calls. It's also worth mentioning that the stateless HTTP proposal is coming, and hopefully it will make this a lot easier:
This is a really good discussion - do you think it would be a good idea to collaborate on this within the Hosting Working Group to publish the best practices for the community here?: https://github.com/modelcontextprotocol-community/working-groups

I will look at it later, after my holiday :)
So a working solution is using a sticky cookie (https://nginx.org/en/docs/http/ngx_http_upstream_module.html#sticky_cookie) to make sure the client persists to the selected instance. As a special reminder: the client should support a cookie jar to persist this information between requests.
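To illustrate the client side, a minimal sketch using `httpx` (whose `Client` keeps a built-in cookie jar); the host and paths are placeholders:

```python
# Sketch: one client instance == one cookie jar shared across requests,
# so the load balancer's sticky cookie is echoed back on every call.
import httpx

client = httpx.Client(base_url="https://mcp.example.com")

# The first response carries the LB's sticky cookie (e.g. nginx's
# `sticky cookie srv_id ...`), pinning this client to one replica.
with client.stream("GET", "/sse", headers={"Accept": "text/event-stream"}) as sse:
    # Later POSTs reuse the jar, so the cookie routes them to the same
    # replica that holds this SSE stream.
    client.post("/message", json={"jsonrpc": "2.0", "method": "ping", "id": 1})
```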
I agree with what @Joffref said:
So I will close, but thank you for the pull request! 😃
# …g in MCP Deployments

- **Status**: Draft
- **Type**: Informational
- **Created**: 2025-12-21
- **Author(s)**: Zhuozhi Ji <jizhuozhi.george@gmail.com> (@jizhuozhi)
- **Sponsor**: None
- **PR**: https://github.com/modelcontextprotocol/specification/pull/0000

## Abstract

This SEP proposes optional high availability (HA) best practices for MCP deployments with stateful streaming sessions (e.g., SSE). While the MCP protocol itself remains unchanged, production deployments often face challenges in maintaining session continuity and resilience when using multiple replicas behind load balancers. This proposal outlines optional patterns, including pub-sub event buses, cluster coordination with P2P forwarding, middleware/SDK abstraction, and session partitioning. These patterns provide guidance for implementers to achieve HA without breaking protocol compatibility or requiring client modifications.

## Motivation

Production MCP deployments increasingly target multi-node, horizontally scalable environments. Long-lived streaming sessions (SSE) introduce challenges when routed through stateless HTTP ingress or load balancers:

- Session continuity may break if connections are routed to a different replica.
- Node failure or restart can interrupt ongoing streaming sessions.
- Resuming sessions across replicas is non-trivial without coordination.

Community discussions, including [GitHub PR modelcontextprotocol#325](modelcontextprotocol#325), have highlighted these issues. Contributors concluded that session stickiness or shared session stores are practical implementation considerations, but not mandated by the protocol. This creates an opportunity for **informational guidance** on HA patterns that are optional and non-intrusive.

## Specification

This SEP does not introduce protocol-level changes. The following optional HA patterns are proposed for implementers:

### 1. Core HA Patterns

#### 1.1 Event Bus / Pub-Sub

- Externalize session events to a distributed pub-sub system.
- MCP replicas subscribe to session events to enable failover and session recovery.
- Decouples session lifetime from any single node.

#### 1.2 Cluster Coordination & P2P Forwarding

- MCP nodes maintain lightweight cluster state via gossip, shared stores, or JDBC ping.
- Session messages can be forwarded to the node currently handling the session.
- Avoids heavy consensus mechanisms to preserve throughput.

### 2. Implementation & Optimization Support

#### 2.1 Middleware / SDK Abstraction

- Encapsulates HA logic (pub-sub, P2P forwarding) within SDK or middleware.
- Keeps protocol handlers and business logic unchanged.
- Provides a transparent API to clients, allowing gradual adoption.

#### 2.2 Session Partitioning / Affinity Hints

- Session IDs may encode partitioning or affinity hints (a minimal sketch follows below).
- Reduces coordination overhead.
- Affinity is advisory and must not impact correctness.
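A non-normative sketch of pattern 2.2; the `<node>.<token>` session-ID layout and the `NODE_ID` constant are made-up conventions, not part of MCP:

```python
# Sketch: embed the owning replica in the session ID as an advisory
# affinity hint, so peers can route without a shared-store lookup.
import secrets

NODE_ID = "replica-07"  # assumed stable identity of this replica

def new_session_id() -> str:
    return f"{NODE_ID}.{secrets.token_urlsafe(16)}"

def affinity_hint(session_id: str) -> str | None:
    # Best-effort only: the hinted owner may have restarted or moved,
    # so correctness must never depend on this value.
    node, _, token = session_id.partition(".")
    return node if token else None
```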
### 3. Illustrative Middleware-Oriented Model (Python, Non-Normative)

```python
async def handle_mcp_message(message, send):
    if message["type"] == "tool_call":
        result = await run_tool(message["payload"])
        await send({"type": "tool_result", "payload": result})


class MCPHAMiddleware:
    def __init__(self, ha_backend):
        self.ha = ha_backend

    def wrap(self, handler):
        async def wrapped(message, send):
            session_id = self.ha.ensure_session(message)
            async with self.ha.bind_session(session_id, send) as ha_send:
                await handler(message, ha_send)
        return wrapped
```

## Rationale

* **Alternate designs considered**: Sticky sessions at load balancer, full Raft replication, central shared state.
* **Why chosen approach**: Optional patterns allow HA without protocol changes, preserve throughput, and provide flexibility.
* **Related work**: Community PR modelcontextprotocol#325; common HA patterns in distributed systems.
* **Community consensus**: PR discussion supports optional, non-normative guidance for HA.

## Backward Compatibility

No protocol changes are introduced. Existing clients and servers remain fully compatible. Adoption of HA patterns is optional and implementation-defined.

## Security Implications

No new security surfaces are introduced by this SEP. Implementers should consider standard security practices for distributed coordination, pub-sub, and session forwarding.

## Reference Implementation

* Prototype Python middleware shown above.
* No full reference implementation is required to mark SEP as draft.

## Additional Optional Sections

### Performance Implications

* Optional HA patterns may introduce additional latency or coordination overhead, but throughput is preserved by avoiding heavy consensus.

### Testing Plan

* Implementers should validate session continuity during failover, replica restart, and load balancer routing.

### Alternatives Considered

* Sticky sessions at LB (less flexible, not always feasible)
* Full Raft replication (high latency, throughput penalty)
* Central shared store (adds infrastructure complexity)

### Open Questions

* Best practices for large clusters with thousands of concurrent streaming sessions.
* Integration guidance for Streamable HTTP once adoption increases.

### Acknowledgments

* Community contributors to PR modelcontextprotocol#325 for highlighting HA challenges in production MCP deployments.
Motivation and Context
When the MCP Server provides services with multiple replicas (especially as an external provider), there is usually a load balancer in front. Unless a persistent connection is used, the load balancer assumes that requests are stateless. However, the MCP protocol is a stateful protocol: when communicating over HTTP (SSE), the instance that receives a message request must be the instance that established the SSE connection.
Therefore, as a best practice, when using a load balancer, session stickiness must be guaranteed, and when the instance that established the SSE connection no longer exists, the request must be rejected. (This means consistent hashing is not suitable in this scenario: it would silently remap the session to a different, still-running instance instead of failing the request.)
In addition, since common session stickiness is implemented with cookies, MCP clients may need to keep cookie records (that is, support a cookie jar).
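As a minimal sketch of that rejection behavior (FastAPI and the in-memory session table are assumptions for illustration, not part of this PR):

```python
# Sketch: a replica rejects message POSTs for sessions whose SSE stream
# it does not hold, instead of accepting a message it can never answer.
from fastapi import FastAPI, HTTPException

app = FastAPI()
local_sessions: dict[str, object] = {}  # sessionId -> open SSE stream handle

@app.post("/message")
async def message(sessionId: str, body: dict):
    stream = local_sessions.get(sessionId)
    if stream is None:
        # This instance never established (or has lost) the SSE
        # connection for this session: fail fast so the client can
        # re-establish instead of hanging.
        raise HTTPException(status_code=404, detail="unknown sessionId")
    ...  # dispatch `body` to the session's handler over `stream`
```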
How Has This Been Tested?
As above: the MCP Server was deployed with multiple replicas, using OpenResty as the load balancer to compare round-robin routing against cookie-based session stickiness, with a transport whose HTTP client supports a cookie jar. Round robin could not work with multiple sessions, but cookie-based session stickiness worked well.
Breaking Changes
None