
Conversation

@jizhuozhi

Motivation and Context

When an MCP Server is deployed with multiple replicas (especially by an external provider), there is usually a load balancer in front of it. Unless a persistent connection is used, the load balancer assumes requests are stateless. The MCP protocol, however, is stateful: when communicating over HTTP (SSE), the instance that receives a message request must be the same instance that established the SSE connection.

Therefore, as a best practice, session stickiness must be guaranteed when using a load balancer, and when the instance that established the SSE connection no longer exists, the request must be rejected (which means consistent hashing is not suitable in this scenario).

In addition, since session stickiness is commonly implemented with cookies, MCP Clients may need to keep cookie records.
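
For illustration, a minimal sketch of the client side under this assumption (using httpx, whose shared client keeps cookies across requests by default; the URL path is a placeholder):

```python
# Minimal sketch, assuming httpx. The sticky cookie set by the load balancer
# on the SSE response is stored in the client's cookie jar and echoed back on
# every later request, pinning them to the same replica.
import httpx

async def connect(base_url: str) -> None:
    async with httpx.AsyncClient(base_url=base_url) as client:
        async with client.stream("GET", "/sse") as sse:
            async for line in sse.aiter_lines():
                ...  # handle SSE events; later POSTs reuse `client`
```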

How Has This Been Tested?

As described above, with the MCP Server deployed as multiple replicas, I used OpenResty as the load balancer to compare Round Robin against cookie-based session stickiness, with the transport using an HTTP client that supports a cookie jar. Round Robin did not work with multiple sessions, while cookie-based session stickiness worked well.

Breaking Changes

None

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

@Joffref
Contributor

Joffref commented Apr 12, 2025

I wouldn’t list that as required; I’d rather present it as the easiest path. Session management is ultimately at the discretion of whoever implements the MCP server. Using Redis to store sessions and share it across instances can also solve the problem, but it introduces a more complex architecture.

@jizhuozhi
Author

Using Redis to store sessions and share it across instances can also solve the problem, but it introduces a more complex architecture.

The session management mentioned here is not simply sharing some data. According to the MCP protocol definition, establishing a complete session takes a three-step handshake: the client sends a request to the SSE endpoint, the server returns the message endpoint information through SSE, and the client sends a notification to the message endpoint. This establishes a bidirectional channel (client-to-server through the message endpoint, server-to-client through SSE), and this channel is what we call a session.

The problem now is that if the load balancer routes the message request to a server instance that holds no SSE connection for it (that is, one that cannot recognize the sessionId), the request will fail.
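
For illustration, a minimal sketch of this failure mode (assuming a Starlette-style app; the session table and error shape are illustrative):

```python
# Minimal sketch. The session table is local to each replica, so a message
# POST routed to a replica that does not hold the SSE connection cannot be
# served and must be rejected.
from starlette.responses import JSONResponse

sessions: dict[str, object] = {}  # sessionId -> local SSE channel

async def handle_message(request):
    session_id = request.query_params.get("sessionId")
    channel = sessions.get(session_id)
    if channel is None:
        # The SSE connection was established on another replica (or is gone);
        # without session stickiness this happens on every request the load
        # balancer routes elsewhere.
        return JSONResponse({"error": "unknown sessionId"}, status_code=404)
    # ...deliver the message over the locally held channel...
```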

@Joffref
Contributor

Joffref commented Apr 12, 2025

According to the current spec: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#session-management, the session ID is returned during the init phase. This implies that the ID is not negotiated anywhere else in the protocol. Unless I’m missing something?

@jizhuozhi
Author

jizhuozhi commented Apr 12, 2025

According to the current spec: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#session-management, the session ID is returned during the init phase. This implies that the ID is not negotiated anywhere else in the protocol. Unless I’m missing something?

Yes. What I mean by "session stickiness is required" is that a load balancer claiming to support MCP should ideally provide stickiness without having to parse the MCP sessionId out of SSE events, like the sticky cookie in nginx.

@Joffref
Contributor

Joffref commented Apr 12, 2025

I agree with you on LB/Gateway — ultimately, it's not required, just recommended. That said, the solution I mentioned earlier works fine as well. Where I’ve landed with this discussion is that we need an MCP Gateway specification, and it should be defined here.

@jizhuozhi
Author

Where I’ve landed with this discussion is that we need an MCP Gateway specification, and it should be defined here.

Yes, we need a specification to guide how load balancers or gateways support MCP. I have seen many different implementations so far, and what they have in common is that they all make the gateway act as the MCP Server and communicate with business services through Redis PUB/SUB or other event buses or message middleware, which is not friendly to a standard, uniform ecosystem.

@raphaelkieling

We have the same problem here. Redis pub/sub is doable, but it's too much to wire up for an existing application running across 10-20 pods ONLY to support the MCP protocol. For now, we resolved it by making a single pod handle MCP connections; typically, this is more than enough to support internal chat calls.

Also, it's worth mentioning that the stateless HTTP proposal is coming, which will hopefully make this a lot easier:
#102

@evalstate
Member

This is a really good discussion - do you think it would be a good idea to collaborate on this within the Hosting Working Group to publish the best practices for the community here?:

https://github.com/modelcontextprotocol-community/working-groups

@jizhuozhi
Author

This is a really good discussion - do you think it would be a good idea to collaborate on this within the Hosting Working Group to publish the best practices for the community here?:

https://github.com/modelcontextprotocol-community/working-groups

I will look at it later, after my holiday :)

@youxihu

youxihu commented Jun 27, 2025

So, I use nginx. How do I upstream my MCP services? Could you show me some example conf?

@jizhuozhi
Author

So, I use nginx. How do I upstream my MCP services? Could you show me some example conf?

A working solution is to use a sticky cookie (https://nginx.org/en/docs/http/ngx_http_upstream_module.html#sticky_cookie) to make sure the client stays pinned to the selected instance.

As a special reminder: the client must support a cookie jar to persist this information between requests.
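
For example, a minimal sketch of such a configuration (note: the `sticky` directive is part of the commercial NGINX Plus subscription; open-source nginx/OpenResty needs a third-party module or a Lua-based equivalent; addresses, cookie name, and timeouts are placeholders):

```nginx
upstream mcp_backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    # Pin each client to one replica via a cookie set by nginx.
    sticky cookie mcp_srv expires=1h path=/;
}

server {
    listen 80;

    location / {
        proxy_pass http://mcp_backend;
        # SSE needs an unbuffered, long-lived HTTP/1.1 connection.
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_read_timeout 1h;
    }
}
```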

@jonathanhefner
Member

I agree with what @Joffref said:

I wouldn’t list that as required; I’d rather present it as the easiest path. Session management is ultimately at the discretion of whoever implements the MCP server. Using Redis to store sessions and share it across instances can also solve the problem, but it introduces a more complex architecture.

So I will close, but thank you for the pull request! 😃

jizhuozhi added a commit to jizhuozhi/modelcontextprotocol that referenced this pull request Dec 20, 2025
…g in MCP Deployments

- **Status**: Draft
- **Type**: Informational
- **Created**: 2025-12-21
- **Author(s)**: Zhuozhi Ji <jizhuozhi.george@gmail.com> (@jizhuozhi)
- **Sponsor**: None
- **PR**: https://github.com/modelcontextprotocol/specification/pull/0000

## Abstract

This SEP proposes optional high availability (HA) best practices for MCP deployments with stateful streaming sessions (e.g., SSE). While the MCP protocol itself remains unchanged, production deployments often face challenges in maintaining session continuity and resilience when using multiple replicas behind load balancers. This proposal outlines optional patterns, including pub-sub event buses, cluster coordination with P2P forwarding, middleware/SDK abstraction, and session partitioning. These patterns provide guidance for implementers to achieve HA without breaking protocol compatibility or requiring client modifications.

## Motivation

Production MCP deployments increasingly target multi-node, horizontally scalable environments. Long-lived streaming sessions (SSE) introduce challenges when routed through stateless HTTP ingress or load balancers:

- Session continuity may break if connections are routed to a different replica.
- Node failure or restart can interrupt ongoing streaming sessions.
- Resuming sessions across replicas is non-trivial without coordination.

Community discussions, including [GitHub PR modelcontextprotocol#325](modelcontextprotocol#325), have highlighted these issues. Contributors concluded that session stickiness or shared session stores are practical implementation considerations, but not mandated by the protocol. This creates an opportunity for **informational guidance** on HA patterns that are optional and non-intrusive.

## Specification

This SEP does not introduce protocol-level changes. The following optional HA patterns are proposed for implementers:

### 1. Core HA Patterns

#### 1.1 Event Bus / Pub-Sub
- Externalize session events to a distributed pub-sub system.
- MCP replicas subscribe to session events to enable failover and session recovery.
- Decouples session lifetime from any single node.
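
A non-normative sketch of this pattern, assuming redis-py's asyncio client; the channel naming scheme and payload shape are illustrative only:

```python
# Non-normative sketch: per-session pub-sub channels on Redis.
import json
import redis.asyncio as redis

class RedisEventBus:
    def __init__(self, url: str = "redis://localhost:6379"):
        self.client = redis.from_url(url)

    async def publish(self, session_id: str, event: dict) -> None:
        # Any replica can emit events for a session it does not own.
        await self.client.publish(f"mcp:session:{session_id}", json.dumps(event))

    async def subscribe(self, session_id: str):
        # The replica holding the session's SSE connection relays these
        # events down to the client.
        pubsub = self.client.pubsub()
        await pubsub.subscribe(f"mcp:session:{session_id}")
        async for message in pubsub.listen():
            if message["type"] == "message":
                yield json.loads(message["data"])
```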

#### 1.2 Cluster Coordination & P2P Forwarding
- MCP nodes maintain lightweight cluster state via gossip, shared stores, or JDBC ping.
- Session messages can be forwarded to the node currently handling the session.
- Avoids heavy consensus mechanisms to preserve throughput.
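
A non-normative sketch of the forwarding hop, assuming httpx and a `registry` object (backed by gossip or a shared store) exposing an async `lookup`; the `/internal/forward/...` path is hypothetical:

```python
# Non-normative sketch: route a message to the node owning the session.
import httpx

class SessionRouter:
    def __init__(self, registry, self_addr: str):
        self.registry = registry    # session -> owner map (gossip/shared store)
        self.self_addr = self_addr

    async def route(self, session_id: str, message: dict, handle_locally):
        owner = await self.registry.lookup(session_id)
        if owner is None:
            raise KeyError(f"unknown session {session_id}")
        if owner == self.self_addr:
            return await handle_locally(message)
        # One forwarding hop to the node that holds the SSE connection.
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                f"http://{owner}/internal/forward/{session_id}", json=message
            )
            resp.raise_for_status()
            return resp.json()
```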

### 2. Implementation & Optimization Support

#### 2.1 Middleware / SDK Abstraction
- Encapsulates HA logic (pub-sub, P2P forwarding) within SDK or middleware.
- Keeps protocol handlers and business logic unchanged.
- Provides a transparent API to clients, allowing gradual adoption.

#### 2.2 Session Partitioning / Affinity Hints
- Session IDs may encode partitioning or affinity hints.
- Reduces coordination overhead.
- Affinity is advisory and must not impact correctness.
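
A non-normative sketch of such a session ID format (the `<node>.<random>` layout is illustrative):

```python
# Non-normative sketch: embed an advisory node hint in the session ID.
import secrets

def new_session_id(node_id: str) -> str:
    # Advisory hint first, then entropy; the hint must never be required
    # for correctness (fall back to a registry lookup if the node is gone).
    return f"{node_id}.{secrets.token_urlsafe(16)}"

def affinity_hint(session_id: str) -> str | None:
    node, sep, _ = session_id.partition(".")
    return node if sep else None
```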

### 3. Illustrative Middleware-Oriented Model (Python, Non-Normative)
```python
# Non-normative sketch. run_tool and ha_backend are assumed to exist; the
# handler itself stays unaware of HA concerns.
async def handle_mcp_message(message, send):
    if message["type"] == "tool_call":
        result = await run_tool(message["payload"])
        await send({
            "type": "tool_result",
            "payload": result
        })

class MCPHAMiddleware:
    def __init__(self, ha_backend):
        self.ha = ha_backend

    def wrap(self, handler):
        async def wrapped(message, send):
            # Resolve (or create) the session this message belongs to.
            session_id = self.ha.ensure_session(message)

            # bind_session yields a send callable that routes output to
            # wherever the session's streaming connection currently lives
            # (local SSE, a pub-sub relay, or a P2P forward).
            async with self.ha.bind_session(session_id, send) as ha_send:
                await handler(message, ha_send)

        return wrapped
```
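
A hypothetical wiring of the middleware, assuming an `ha_backend` object that implements `ensure_session` and `bind_session` as used above:

```python
# Hypothetical usage; MCPHAMiddleware and handle_mcp_message come from the
# sketch above, ha_backend is any compatible HA implementation.
middleware = MCPHAMiddleware(ha_backend)
handler = middleware.wrap(handle_mcp_message)
```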

## Rationale

* **Alternate designs considered**: Sticky sessions at load balancer, full Raft replication, central shared state.
* **Why chosen approach**: Optional patterns allow HA without protocol changes, preserve throughput, and provide flexibility.
* **Related work**: Community PR modelcontextprotocol#325; common HA patterns in distributed systems.
* **Community consensus**: PR discussion supports optional, non-normative guidance for HA.

## Backward Compatibility

No protocol changes are introduced. Existing clients and servers remain fully compatible. Adoption of HA patterns is optional and implementation-defined.

## Security Implications

No new security surfaces are introduced by this SEP. Implementers should consider standard security practices for distributed coordination, pub-sub, and session forwarding.

## Reference Implementation

* Prototype Python middleware shown above.
* A full reference implementation is not required while this SEP is in Draft status.

## Additional Optional Sections

### Performance Implications

* Optional HA patterns may introduce additional latency or coordination overhead, but throughput is preserved by avoiding heavy consensus.

### Testing Plan

* Implementers should validate session continuity during failover, replica restart, and load balancer routing.

### Alternatives Considered

* Sticky sessions at LB (less flexible, not always feasible)
* Full Raft replication (high latency, throughput penalty)
* Central shared store (adds infrastructure complexity)

### Open Questions

* Best practices for large clusters with thousands of concurrent streaming sessions.
* Integration guidance for Streamable HTTP once adoption increases.

### Acknowledgments

* Community contributors to PR modelcontextprotocol#325 for highlighting HA challenges in production MCP deployments.