Proposal: Optional High Availability Best Practices for MCP Deployments with Stateful Streaming (SSE) Connections

## TL;DR Summary

**Proposal:** Introduce optional High Availability best practices for MCP deployments to address **session continuity and failover in stateful streaming (SSE) connections**.  
- Focuses on **HA challenges in long-lived streaming sessions** behind load balancers.  
- Recommends **non-normative, optional patterns** (pub-sub, cluster coordination, middleware, session partitioning).  

---

## Background / Community Context

MCP deployments increasingly target **production environments** with multiple replicas and distributed workloads. Real-world deployments face challenges when **stateful streaming sessions** coexist with **stateless HTTP ingress**.  

The community has previously discussed handling MCP sessions behind load balancers (see GitHub PR #325). Contributors highlighted that MCP's stateful session establishment (e.g., via SSE) can conflict with stateless load balancing unless session affinity is maintained. The conversation concluded that session stickiness is a practical concern but that specifying it at the protocol level was not appropriate; guidance for gateways and high-availability deployments would be valuable instead. ([github.com](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/325))

Key points from the discussion:  

- **Session continuity**: Without sticky sessions, requests that belong to a session with a long-lived SSE connection may be routed to a replica other than the one holding the stream, which breaks the session.  
- **Implementation discretion**: Session management, sticky sessions, and shared session stores are left to implementers.  
- **Best practices guidance**: The community sees value in providing **optional HA patterns or gateway guidance** rather than normative protocol changes.

This discussion **directly supports proposing optional HA best practices** and reflects the community's existing awareness of multi-node deployment challenges.

---

## Goals

- Enable MCP servers to **scale horizontally** while preserving session continuity.  
- Avoid reliance on sticky sessions at the load balancer level.  
- Keep MCP **protocol semantics unchanged**.  
- Provide **implementation-agnostic guidance** applicable to diverse transports and environments.  
- Allow gradual adoption without disrupting existing deployments.

---

## Non-Goals

- Mandating specific HA mechanisms or middleware.  
- Introducing protocol-level changes to MCP messages.  
- Replacing existing transports or enforcing Streamable HTTP.  
- Evaluating or criticizing specific platforms or client ecosystems.

---

## Motivation: HA Necessity in Production Deployments

Production MCP deployments face several HA-related challenges:

- **Stateless ingress vs. stateful session/streaming egress**: load balancers can route connections to different replicas, causing session disruption.  
- **Single-node failure or restart** can interrupt ongoing streaming sessions.  
- **Session resumption across replicas** is non-trivial without additional coordination.  

These issues exist independently of the underlying transport or client support and justify optional HA patterns for reliable operation in multi-node deployments.

---

## Optional HA Patterns

### 1. Core HA Patterns (Optional)

#### 1.1 Event Bus / Pub-Sub

- Externalize session events to a **distributed pub-sub system**.  
- Multiple MCP server replicas can **subscribe and replay events** transparently.  
- Decouples session lifetime from any single server node.  
- Enables failover and session recovery without client awareness.
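
The replay behavior described above can be sketched as follows. This is a minimal in-memory illustration, not a recommended implementation: `InMemoryEventBus`, its `publish`/`subscribe` methods, and the event shape are all hypothetical stand-ins for whatever external pub-sub system a deployment uses. The `last_event_id` argument plays the role that the `Last-Event-ID` header plays in SSE reconnection.

```python
import asyncio
from collections import defaultdict


class InMemoryEventBus:
    """Hypothetical stand-in for an external pub-sub system with replay."""

    def __init__(self):
        self._log = defaultdict(list)          # session_id -> ordered events
        self._subscribers = defaultdict(list)  # session_id -> live queues

    def publish(self, session_id, payload):
        """Append an event to the session log and fan it out to subscribers."""
        event_id = len(self._log[session_id]) + 1
        event = {"id": event_id, "payload": payload}
        self._log[session_id].append(event)
        for queue in self._subscribers[session_id]:
            queue.put_nowait(event)
        return event_id

    def subscribe(self, session_id, last_event_id=0):
        """Return a queue preloaded with every event after last_event_id."""
        queue = asyncio.Queue()
        for event in self._log[session_id]:
            if event["id"] > last_event_id:
                queue.put_nowait(event)
        self._subscribers[session_id].append(queue)
        return queue


async def demo():
    bus = InMemoryEventBus()
    # Replica A delivers two events to the client, then fails.
    bus.publish("s1", "progress 1")
    bus.publish("s1", "progress 2")
    # Work continues and publishes while no replica holds the stream.
    bus.publish("s1", "progress 3")
    # Replica B takes over; the client reports it last saw event 2.
    queue = bus.subscribe("s1", last_event_id=2)
    bus.publish("s1", "done")
    received = []
    while not queue.empty():
        received.append(queue.get_nowait()["payload"])
    return received


replayed = asyncio.run(demo())
```

Because the log lives outside any single replica, the replica that resumes the stream needs only the session ID and the last event the client acknowledged. A real deployment would also need retention limits and cleanup for finished sessions, which this sketch omits.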

#### 1.2 Cluster Coordination & P2P Forwarding

- MCP nodes maintain **lightweight cluster state** via gossip, shared stores, or database-backed discovery (e.g., JGroups `JDBC_PING`).  
- Session messages can be **forwarded peer-to-peer** to the node currently handling the session.  
- Provides **best-effort replication and routing** for streaming events.  
- Avoids heavy consensus mechanisms (e.g., Raft) to preserve throughput.
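
A rough sketch of the forwarding idea, under simplifying assumptions: the `Node` class, the shared `owners` dictionary (standing in for gossip or a shared store), and the direct method call between nodes (standing in for a network hop) are all hypothetical. Delivery is best effort, so a message for a session with no known owner is reported as undelivered rather than queued.

```python
import asyncio


class Node:
    """Hypothetical MCP replica that forwards messages to the session owner."""

    def __init__(self, name, owners, peers):
        self.name = name
        self.owners = owners  # shared view: session_id -> owning node name
        self.peers = peers    # node name -> Node (stands in for network calls)
        self.delivered = []   # messages written to locally held streams
        peers[name] = self

    def open_stream(self, session_id):
        """Record that this node holds the client's streaming connection."""
        self.owners[session_id] = self.name

    def deliver_local(self, session_id, message):
        """Write a message to a stream held by this node."""
        self.delivered.append((session_id, message))

    async def route(self, session_id, message):
        """Deliver locally if this node owns the session, else forward once."""
        owner = self.owners.get(session_id)
        if owner == self.name:
            self.deliver_local(session_id, message)
            return True
        peer = self.peers.get(owner)
        if peer is None:
            return False  # owner unknown or gone; the caller may retry
        peer.deliver_local(session_id, message)  # single hop, never re-forwarded
        return True


async def demo():
    owners, peers = {}, {}
    a = Node("a", owners, peers)
    b = Node("b", owners, peers)
    a.open_stream("s1")
    forwarded = await b.route("s1", "tool_result")  # arrives at b, owned by a
    unknown = await b.route("s2", "tool_result")    # no node owns s2
    return forwarded, unknown, a.delivered, b.delivered


forwarded, unknown, a_log, b_log = asyncio.run(demo())
```

Limiting forwarding to a single hop keeps a stale ownership view from producing forwarding loops, at the cost of occasionally dropping a message that a retry would deliver.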

---

### 2. Implementation & Optimization Support (Optional)

#### 2.1 Middleware / SDK Abstraction

- Encapsulates HA logic (pub-sub, P2P forwarding).  
- Keeps protocol handlers and business logic unchanged.  
- Provides a transparent API for SDK consumers.  
- Optional per deployment, which allows gradual adoption.

#### 2.2 Session Partitioning / Affinity Hints

- Opaque session IDs may encode **internal partitioning or affinity hints**.  
- Reduces coordination overhead in large clusters.  
- Affinity is advisory; correctness must not depend on it.  
- Complements middleware, pub-sub, or P2P forwarding.
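
One way to picture an advisory hint is a session ID whose prefix names a partition while the rest stays random. The format below (`<hex partition>.<random token>`) and the helper names `new_session_id` and `partition_hint` are assumptions made for illustration; clients would still treat the ID as opaque. IDs without a readable hint fall back to a hash, so routing never depends on the hint being present.

```python
import secrets
import zlib


def new_session_id(partition):
    """Mint an ID whose prefix carries an advisory partition number."""
    return f"{partition:02x}.{secrets.token_urlsafe(16)}"


def partition_hint(session_id, num_partitions):
    """Return the advisory partition, falling back to a hash of the whole ID."""
    prefix, sep, _ = session_id.partition(".")
    if sep:
        try:
            return int(prefix, 16) % num_partitions
        except ValueError:
            pass  # prefix is not a hint; fall through to hashing
    return zlib.crc32(session_id.encode()) % num_partitions


session_id = new_session_id(5)
hinted = partition_hint(session_id, 8)        # taken from the prefix
fallback = partition_hint("legacy-id", 8)     # no hint, hashed instead
```

A gateway could use the returned partition to pick a preferred replica, then rely on pub-sub or forwarding when that replica is unavailable.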

---

## Illustrative Middleware-Oriented Model (Python, Non-Normative)

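```python
async def run_tool(payload):
    # Placeholder for the server's real tool dispatch (hypothetical).
    return {"echo": payload}


async def handle_mcp_message(message, send):
    # Plain business logic: no knowledge of HA, replicas, or session routing.
    if message["type"] == "tool_call":
        result = await run_tool(message["payload"])
        await send({
            "type": "tool_result",
            "payload": result
        })


class MCPHAMiddleware:
    def __init__(self, ha_backend):
        # ha_backend is any object providing ensure_session() and bind_session().
        self.ha = ha_backend

    def wrap(self, handler):
        async def wrapped(message, send):
            # Look up or create the session this message belongs to.
            session_id = self.ha.ensure_session(message)

            # bind_session is expected to yield a send function that passes
            # output through the HA layer before it reaches the connection.
            async with self.ha.bind_session(session_id, send) as ha_send:
                await handler(message, ha_send)

        return wrapped
```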

* Business logic remains unchanged.
* HA logic (pub-sub, gossip, session partitioning) is fully encapsulated.
* Works in both single-node and multi-node deployments.
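
For the single-node case, a backend satisfying the two methods the middleware calls might look like the sketch below. `InMemoryHABackend`, the `session_id` message key, and the `outbox` attribute are hypothetical names chosen for illustration; the proposal does not define a backend interface beyond `ensure_session` and `bind_session`.

```python
import asyncio
import uuid
from contextlib import asynccontextmanager


class InMemoryHABackend:
    """Hypothetical single-process backend for the middleware's two calls."""

    def __init__(self):
        self.outbox = {}  # session_id -> messages sent on that session

    def ensure_session(self, message):
        """Reuse the message's session ID (assumed key) or mint a new one."""
        session_id = message.get("session_id") or uuid.uuid4().hex
        self.outbox.setdefault(session_id, [])
        return session_id

    @asynccontextmanager
    async def bind_session(self, session_id, send):
        """Yield a send function that records each message before delivery."""
        async def ha_send(outgoing):
            self.outbox[session_id].append(outgoing)
            await send(outgoing)
        yield ha_send


async def demo():
    backend = InMemoryHABackend()
    delivered = []

    async def send(outgoing):
        delivered.append(outgoing)

    session_id = backend.ensure_session({"type": "tool_call", "session_id": "s1"})
    async with backend.bind_session(session_id, send) as ha_send:
        await ha_send({"type": "tool_result", "payload": 42})
    return backend.outbox, delivered


outbox, delivered = asyncio.run(demo())
```

A multi-node backend could keep the same two methods and replace the in-memory `outbox` with a publish to an external event bus, leaving the handler and middleware untouched.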

---

## Compatibility

* Fully backward compatible with existing MCP servers.
* HA support is optional and implementation-defined.
* Does not require clients to be aware of HA mechanisms.
* Can coexist with Streamable HTTP.

---

## Summary

Production MCP deployments must account for HA to remain reliable and scalable. This proposal:

* Highlights HA challenges.
* Recommends **optional, non-invasive best practices**.
* Maintains protocol compatibility and keeps changes to existing servers minimal.
* Supports gradual adoption as ecosystems evolve.
