Description
TL;DR Summary
Proposal: Introduce optional High Availability best practices for MCP deployments to address session continuity and failover in stateful streaming (SSE) connections.
- Focuses on HA challenges in long-lived streaming sessions behind load balancers.
- Recommends non-normative, optional patterns (pub-sub, cluster coordination, middleware, session partitioning).
Background / Community Context
MCP deployments increasingly target production environments with multiple replicas and distributed workloads. Real-world deployments face challenges when stateful streaming sessions coexist with stateless HTTP ingress.
The community has previously discussed handling MCP sessions behind load balancers (see GitHub PR #325). Contributors noted that MCP's stateful session establishment (e.g., via SSE) can conflict with stateless load balancing unless session affinity is maintained. The discussion concluded that session stickiness is a practical consideration but should not be specified at the protocol level; guidance for gateways and high-availability deployments would be valuable instead.
Key points from the discussion:
- Session continuity: Without sticky sessions, requests that belong to a session with a long-lived SSE connection may be routed to a different replica, breaking the session.
- Implementation discretion: Session management, sticky sessions, and shared session stores are left to implementers.
- Best practices guidance: The community sees value in providing optional HA patterns or gateway guidance rather than normative protocol changes.
This context provides direct support for proposing optional HA best practices, aligning with ongoing community awareness of multi-node deployment challenges.
Goals
- Enable MCP servers to scale horizontally while preserving session continuity.
- Avoid reliance on sticky sessions at the load balancer level.
- Keep MCP protocol semantics unchanged.
- Provide implementation-agnostic guidance applicable to diverse transports and environments.
- Allow gradual adoption without disrupting existing deployments.
Non-Goals
- Mandating specific HA mechanisms or middleware.
- Introducing protocol-level changes to MCP messages.
- Replacing existing transports or enforcing Streamable HTTP.
- Evaluating or criticizing specific platforms or client ecosystems.
Motivation: HA Necessity in Production Deployments
Production MCP deployments face several HA-related challenges:
- Stateless ingress vs. stateful session/streaming egress: load balancers can route connections to different replicas, causing session disruption.
- Single-node failure or restart can interrupt ongoing streaming sessions.
- Session resumption across replicas is non-trivial without additional coordination.
These issues exist independently of the underlying transport or client support and justify optional HA patterns for reliable operation in multi-node deployments.
Optional HA Patterns
1. Core HA Patterns (Optional)
1.1 Event Bus / Pub-Sub
- Externalize session events to a distributed pub-sub system.
- Multiple MCP server replicas can subscribe and replay events transparently.
- Decouples session lifetime from any single server node.
- Enables failover and session recovery without client awareness.
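As a rough illustration of the pub-sub pattern, the sketch below keeps a per-session event log and lets a replacement replica resume from the last event the client saw, similar in spirit to the `Last-Event-ID` header in SSE. `InMemoryEventBus`, `publish`, and `subscribe` are hypothetical names, and the in-process dictionaries are an assumption standing in for an external pub-sub system.

```python
import asyncio
from collections import defaultdict


class InMemoryEventBus:
    """In-process stand-in for an external pub-sub system (hypothetical)."""

    def __init__(self):
        self._log = defaultdict(list)          # session_id -> ordered events
        self._subscribers = defaultdict(list)  # session_id -> list of queues

    def publish(self, session_id, payload):
        """Append an event to the session log and fan it out to subscribers."""
        event = {"id": len(self._log[session_id]) + 1, "payload": payload}
        self._log[session_id].append(event)
        for queue in self._subscribers[session_id]:
            queue.put_nowait(event)
        return event["id"]

    def subscribe(self, session_id, last_event_id=0):
        """Return a queue preloaded with every event after last_event_id."""
        queue = asyncio.Queue()
        for event in self._log[session_id]:
            if event["id"] > last_event_id:
                queue.put_nowait(event)
        self._subscribers[session_id].append(queue)
        return queue


async def demo():
    bus = InMemoryEventBus()
    # Replica A holds the stream, delivers one of two events, then fails.
    replica_a = bus.subscribe("s1")
    bus.publish("s1", "progress 1")
    bus.publish("s1", "progress 2")
    last_seen = (await replica_a.get())["id"]
    # Replica B takes over and replays from the client's last seen event.
    replica_b = bus.subscribe("s1", last_event_id=last_seen)
    bus.publish("s1", "done")
    return [(await replica_b.get())["payload"] for _ in range(2)]


print(asyncio.run(demo()))  # ['progress 2', 'done']
```

Because the log lives outside any one replica, the session survives the loss of replica A; a production backend would also need retention limits and durable storage, which this sketch omits.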
1.2 Cluster Coordination & P2P Forwarding
- MCP nodes maintain lightweight cluster state via gossip, shared stores, or database-backed discovery (e.g., JGroups `JDBC_PING`).
- Session messages can be forwarded peer-to-peer to the node currently handling the session.
- Provides best-effort replication and routing for streaming events.
- Avoids heavy consensus mechanisms (e.g., Raft) to preserve throughput.
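The forwarding idea above can be sketched as follows. This is a minimal single-process model, not a networked implementation: `ClusterNode`, `attach`, and `route` are hypothetical names, the shared `owners` dictionary stands in for gossip or a shared store, and a direct method call stands in for a peer-to-peer RPC.

```python
class ClusterNode:
    """One MCP replica in a hypothetical cluster; all names are illustrative."""

    def __init__(self, name, owners, peers):
        self.name = name
        self.owners = owners   # shared map: session_id -> owning node name
        self.peers = peers     # shared map: node name -> ClusterNode
        self.delivered = []    # messages written to locally attached streams
        peers[name] = self

    def attach(self, session_id):
        """Record that this node holds the client's streaming connection."""
        self.owners[session_id] = self.name

    def route(self, session_id, message):
        """Deliver locally if this node owns the session, else forward once."""
        owner = self.owners.get(session_id)
        if owner == self.name:
            self.delivered.append((session_id, message))
            return "local"
        peer = self.peers.get(owner)
        if peer is None:
            return "dropped"  # best effort: owner unknown or no longer present
        peer.delivered.append((session_id, message))
        return "forwarded"


owners, peers = {}, {}
node_a = ClusterNode("a", owners, peers)
node_b = ClusterNode("b", owners, peers)
node_a.attach("s1")                    # the client's stream is open on node a
print(node_b.route("s1", "tool_result"))  # forwarded
print(node_a.delivered)                   # [('s1', 'tool_result')]
```

The `"dropped"` branch reflects the best-effort nature of the pattern: without consensus, a node may briefly hold stale ownership data, so callers should be prepared to retry or fall back to replay.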
2. Implementation & Optimization Support (Optional)
2.1 Middleware / SDK Abstraction
- Encapsulates HA logic (pub-sub, P2P forwarding).
- Keeps protocol handlers and business logic unchanged.
- Provides a transparent API for SDK consumers.
- Optional per deployment, allowing gradual adoption.
2.2 Session Partitioning / Affinity Hints
- Opaque session IDs may encode internal partitioning or affinity hints.
- Reduces coordination overhead in large clusters.
- Affinity is advisory; correctness must not depend on it.
- Complements middleware, pub-sub, or P2P forwarding.
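One possible way to embed an affinity hint in an opaque session ID is sketched below. The `<hex partition>.<random token>` layout and the function names are assumptions made for illustration; nothing here is defined by MCP. The fallback path shows the hint being treated as advisory: an ID without a usable hint is still routed.

```python
import hashlib
import secrets


def _hash_to_partition(text, partition_count):
    """Map arbitrary text to a stable partition index."""
    digest = hashlib.sha256(text.encode()).hexdigest()
    return int(digest, 16) % partition_count


def new_session_id(partition_count):
    """Mint an ID whose prefix carries an advisory partition hint."""
    token = secrets.token_hex(16)
    partition = _hash_to_partition(token, partition_count)
    return f"{partition:04x}.{token}"


def partition_hint(session_id, partition_count):
    """Return the advisory partition, or None if the hint is missing or invalid."""
    prefix, separator, _ = session_id.partition(".")
    if not separator:
        return None
    try:
        value = int(prefix, 16)
    except ValueError:
        return None
    return value if 0 <= value < partition_count else None


def pick_replica(session_id, replicas):
    """Prefer the hinted replica, but route correctly when no hint is usable."""
    hint = partition_hint(session_id, len(replicas))
    if hint is None:
        hint = _hash_to_partition(session_id, len(replicas))
    return replicas[hint]


replicas = ["replica-0", "replica-1", "replica-2", "replica-3"]
sid = new_session_id(len(replicas))
print(sid, "->", pick_replica(sid, replicas))
print("opaque-id ->", pick_replica("opaque-id", replicas))
```

Note that a plain modulo mapping reshuffles most sessions when the replica count changes; a deployment that resizes often may prefer consistent hashing, combined with one of the patterns above so that a misrouted request still succeeds.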
Illustrative Middleware-Oriented Model (Python, Non-Normative)

    async def handle_mcp_message(message, send):
        # Business logic only; run_tool is an application-defined coroutine.
        if message["type"] == "tool_call":
            result = await run_tool(message["payload"])
            await send({
                "type": "tool_result",
                "payload": result
            })


    class MCPHAMiddleware:
        def __init__(self, ha_backend):
            # ha_backend hides the HA mechanism (pub-sub, P2P forwarding, ...).
            self.ha = ha_backend

        def wrap(self, handler):
            async def wrapped(message, send):
                # Resolve the session, then hand the handler an HA-aware send.
                session_id = self.ha.ensure_session(message)
                async with self.ha.bind_session(session_id, send) as ha_send:
                    await handler(message, ha_send)
            return wrapped

- Business logic remains unchanged.
- HA logic (pub-sub, gossip, session partitioning) is fully encapsulated.
- Works in both single-node and multi-node deployments.
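To make the `ha_backend` interface in the model above more concrete, here is a minimal single-node sketch that provides `ensure_session` and `bind_session` with the shapes the middleware expects. `InMemoryHABackend`, the `session_id` message field, and the `outbox` attribute are assumptions for illustration; a multi-node backend would publish to an external system where this one appends to a list.

```python
import asyncio
import uuid
from contextlib import asynccontextmanager


class InMemoryHABackend:
    """Single-process stand-in for the ha_backend object (hypothetical)."""

    def __init__(self):
        self.outbox = {}  # session_id -> messages sent so far

    def ensure_session(self, message):
        """Reuse the message's session ID, or mint one for a new session."""
        session_id = message.get("session_id") or uuid.uuid4().hex
        self.outbox.setdefault(session_id, [])
        return session_id

    @asynccontextmanager
    async def bind_session(self, session_id, send):
        """Yield a send function that records each message before delivery."""
        async def ha_send(outgoing):
            # A multi-node backend would publish to pub-sub or a peer here.
            self.outbox[session_id].append(outgoing)
            await send(outgoing)
        yield ha_send


async def demo():
    backend = InMemoryHABackend()
    received = []

    async def send(outgoing):
        received.append(outgoing)

    # Same call sequence the middleware's wrapped handler performs.
    message = {"type": "tool_call", "session_id": "s1", "payload": {"x": 1}}
    session_id = backend.ensure_session(message)
    async with backend.bind_session(session_id, send) as ha_send:
        await ha_send({"type": "tool_result", "payload": "ok"})
    return received, backend.outbox[session_id]


print(asyncio.run(demo()))
```

Because the handler only ever sees `ha_send`, swapping this backend for a distributed one would not require changes to the business logic.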
Compatibility
- Fully backward compatible with existing MCP servers.
- HA support is optional and implementation-defined.
- Does not require clients to be aware of HA mechanisms.
- Can coexist with Streamable HTTP.
Summary
Production MCP deployments need HA considerations to maintain reliability and scalability. This proposal:
- Highlights HA challenges.
- Recommends optional, non-invasive best practices.
- Maintains protocol compatibility with minimal intrusion into existing deployments.
- Supports gradual adoption as ecosystems evolve.