This document describes the stream resumption and recovery system within Sim Studio's Copilot and Mothership interfaces. The system ensures that AI agent interactions, specifically long-running streaming responses, can survive page refreshes, network interruptions, or accidental tab closures. It achieves this through a combination of client-side sessionStorage persistence, server-side event buffering, and sequence-based replay using lastEventId tracking.
For the broader Copilot system, see Copilot Architecture. For details on how message state is preserved, see Message Management & Checkpoints.
The stream resumption system operates across the client UI, the Next.js API layer, and the server-side orchestrator.
Diagram: Stream Resumption Architecture
The server-side orchestrator buffers events as they are generated. If a client disconnects, it can reconnect through the stream endpoint and supply the ID of the last event it successfully processed. The server then replays the buffered events that follow that ID.
To survive a page refresh, the client must persist the current streamId and the sequence of the last received event.
The system uses sessionStorage (rather than localStorage) to ensure that stream state is specific to a browser tab and is cleared when the tab is closed.
| Key | Type | Description |
|---|---|---|
| streamId | string | Unique identifier for the active SSE stream. |
| lastEventId | number | The sequence ID of the most recent event processed by the client. |
| assistantId | string | The ID of the agent or sub-agent providing the response. |
| expectedGen | number | A generation counter to prevent stale stream attachments. |
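Below is a minimal TypeScript sketch of this persistence. It assumes that the four fields are serialized together as one JSON object under a single sessionStorage key. The key name copilot:pending-stream and the helper names (savePendingStream, loadPendingStream, recordEvent, clearPendingStream) are hypothetical. KeyValueStorage is the subset of the Web Storage API that window.sessionStorage already implements, so the in-memory MemoryStorage class can stand in for it outside a browser.

```typescript
// Persisted record; the fields mirror the table above.
interface PendingStream {
  streamId: string;
  lastEventId: number;
  assistantId: string;
  expectedGen: number;
}

// Subset of the Web Storage API; window.sessionStorage satisfies it.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical key name, chosen for illustration only.
const PENDING_STREAM_KEY = "copilot:pending-stream";

function savePendingStream(storage: KeyValueStorage, state: PendingStream): void {
  storage.setItem(PENDING_STREAM_KEY, JSON.stringify(state));
}

// Returns the stored record, or null if absent or malformed. Malformed
// entries are removed so they cannot trigger a bogus attach later.
function loadPendingStream(storage: KeyValueStorage): PendingStream | null {
  const raw = storage.getItem(PENDING_STREAM_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<PendingStream>;
    if (
      typeof parsed.streamId === "string" &&
      typeof parsed.lastEventId === "number" &&
      typeof parsed.assistantId === "string" &&
      typeof parsed.expectedGen === "number"
    ) {
      return parsed as PendingStream;
    }
  } catch {
    // Unparseable or non-object JSON: fall through to cleanup.
  }
  storage.removeItem(PENDING_STREAM_KEY);
  return null;
}

// Advance lastEventId after each processed event; never move it backwards.
function recordEvent(storage: KeyValueStorage, eventId: number): void {
  const current = loadPendingStream(storage);
  if (current !== null && eventId > current.lastEventId) {
    savePendingStream(storage, { ...current, lastEventId: eventId });
  }
}

// Called when the stream finishes so a later load starts fresh.
function clearPendingStream(storage: KeyValueStorage): void {
  storage.removeItem(PENDING_STREAM_KEY);
}

// In-memory stand-in for sessionStorage outside the browser.
class MemoryStorage implements KeyValueStorage {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
  removeItem(key: string): void {
    this.data.delete(key);
  }
}
```

Rejecting malformed entries on load matters because a half-written or outdated record would otherwise send the client to attach to a stream that no longer exists.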
When the useChat hook initializes, it checks for a pending stream in storage. If found, it attempts to "attach" to the existing stream instead of starting a new conversation.
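The sketch below shows one way a generation check like expectedGen can stop a slow attach from taking over the UI after the user has moved on. StreamAttacher, invalidate, and attachIfPending are hypothetical names, and the connect callback stands in for whatever opens the SSE connection. Only the general idea of comparing a captured generation against the current one comes from the description above.

```typescript
interface PendingStream {
  streamId: string;
  lastEventId: number;
  assistantId: string;
  expectedGen: number;
}

type AttachOutcome = "attached" | "fresh" | "stale";

// Hypothetical stand-in for the hook's internal state.
class StreamAttacher {
  private generation = 0;
  attachedStreamId: string | null = null;

  // Any user action that supersedes the current stream (sending a new
  // message, switching chats) bumps the generation.
  invalidate(): void {
    this.generation += 1;
  }

  async attachIfPending(
    pending: PendingStream | null,
    connect: (streamId: string, afterEventId: number) => Promise<void>,
  ): Promise<AttachOutcome> {
    // Nothing stored: the hook starts a normal, fresh conversation.
    if (pending === null) return "fresh";
    // Capture the generation this attach attempt belongs to.
    const expectedGen = ++this.generation;
    await connect(pending.streamId, pending.lastEventId);
    // If something bumped the generation while connecting, this attach is
    // stale and must not take over the UI.
    if (this.generation !== expectedGen) return "stale";
    this.attachedStreamId = pending.streamId;
    return "attached";
  }
}
```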
The server uses a buffering mechanism to store SSE events, allowing them to be replayed if a client reconnects.
The orchestrateCopilotStream function utilizes a StreamingContext to track the lifecycle of a stream. As events are generated by the AI providers or tool executors, they are passed through an event writer that buffers them.
1. runStreamLoop generates SSE events (e.g., content, tool_call).
2. createStreamEventWriter captures these events into the buffer.
3. Each buffered event is keyed by the streamId and an incrementing eventId.
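The following sketches the buffer-then-forward pattern. It makes two assumptions: the buffer is a process-local Map, and eventIds start at 1 for each stream. StreamEventBuffer and makeEventWriter are hypothetical stand-ins, not the real createStreamEventWriter.

```typescript
// Simplified event model; real events carry richer payloads.
interface StreamEvent {
  type: string; // e.g. "content", "tool_call"
  data: unknown;
}

interface BufferedEvent extends StreamEvent {
  streamId: string;
  eventId: number;
}

// Process-local buffer. A multi-node deployment would need a shared store
// (for example Redis) instead of an in-memory Map.
class StreamEventBuffer {
  private streams = new Map<string, BufferedEvent[]>();

  // Assigns the next sequential eventId for this stream and stores the event.
  append(streamId: string, event: StreamEvent): BufferedEvent {
    const events = this.streams.get(streamId) ?? [];
    const buffered: BufferedEvent = { ...event, streamId, eventId: events.length + 1 };
    events.push(buffered);
    this.streams.set(streamId, events);
    return buffered;
  }

  // Copy of everything buffered so far for a stream, in eventId order.
  snapshot(streamId: string): BufferedEvent[] {
    return [...(this.streams.get(streamId) ?? [])];
  }
}

// Buffer first, then forward to the live client if one is still attached.
function makeEventWriter(
  buffer: StreamEventBuffer,
  streamId: string,
  send: (event: BufferedEvent) => void,
): (event: StreamEvent) => void {
  return (event) => {
    const buffered = buffer.append(streamId, event);
    try {
      send(buffered);
    } catch {
      // The client is gone; the event stays buffered for a later replay.
    }
  };
}
```

Buffering before sending is the key ordering choice. If the live connection has dropped, the event is still recorded and can be replayed later.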
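When a client reconnects, the server:

1. Looks up the stream's metadata with getStreamMeta.
2. Reads the buffered events that follow the client's lastEventId with readStreamEvents.
3. Sends those events back to the client through a replay stream built by buildReplayStream.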
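The replay step can be sketched as a pure function over buffered events. The StreamMeta shape, its status values, and buildReplayFrames are assumptions made for illustration. The frame layout (id:, event:, and data: lines followed by a blank line) follows the Server-Sent Events format. Emitting id: is what allows a client to report the last ID it saw.

```typescript
interface BufferedEvent {
  eventId: number;
  type: string;
  data: unknown;
}

// Hypothetical metadata shape; getStreamMeta's real return type may differ.
interface StreamMeta {
  streamId: string;
  status: "active" | "complete" | "error";
}

type ReplayResult =
  | { ok: false; reason: string }
  | { ok: true; frames: string[]; live: boolean };

// Serialize one event as an SSE frame. JSON.stringify escapes newlines, so
// the payload always fits on a single data: line.
function toSseFrame(event: BufferedEvent): string {
  const payload = JSON.stringify(event.data ?? null);
  return `id: ${event.eventId}\nevent: ${event.type}\ndata: ${payload}\n\n`;
}

// Replay everything the client has not yet seen, in eventId order.
function buildReplayFrames(
  meta: StreamMeta | null,
  buffered: BufferedEvent[],
  lastEventId: number,
): ReplayResult {
  if (meta === null) return { ok: false, reason: "unknown stream" };
  const missed = buffered
    .filter((e) => e.eventId > lastEventId)
    .sort((a, b) => a.eventId - b.eventId);
  return {
    ok: true,
    frames: missed.map(toSseFrame),
    // Only a still-active stream should keep the connection open for new events.
    live: meta.status === "active",
  };
}
```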
Reconnection is not just about replaying text; it also involves recovering the state of tool executions, especially those that run on the client side.
The client implements an exponential backoff strategy when a stream disconnects unexpectedly.
| Parameter | Value |
|---|---|
| MAX_RECONNECT_ATTEMPTS | 10 |
| RECONNECT_BASE_DELAY_MS | 1,000 ms |
| RECONNECT_MAX_DELAY_MS | 30,000 ms |
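Here is a sketch of the delay schedule, built from the three constants in the table. Several details are assumptions: the doubling factor, the absence of jitter, and the choice to wait before the first retry. The document states only that the backoff is exponential. reconnectDelayMs and reconnectWithBackoff are hypothetical names. The sleep function is injectable so the loop can be exercised without real timers.

```typescript
const MAX_RECONNECT_ATTEMPTS = 10;
const RECONNECT_BASE_DELAY_MS = 1000;
const RECONNECT_MAX_DELAY_MS = 30000;

// Delay before 0-based attempt `attempt`, or null once attempts are exhausted.
// Doubling per attempt is an assumption; the source only says "exponential".
function reconnectDelayMs(attempt: number): number | null {
  if (attempt >= MAX_RECONNECT_ATTEMPTS) return null;
  return Math.min(RECONNECT_BASE_DELAY_MS * 2 ** attempt, RECONNECT_MAX_DELAY_MS);
}

// Retries `connect` until it reports success or the attempt budget runs out.
async function reconnectWithBackoff(
  connect: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<boolean> {
  for (let attempt = 0; ; attempt++) {
    const delay = reconnectDelayMs(attempt);
    if (delay === null) return false; // give up after MAX_RECONNECT_ATTEMPTS
    await sleep(delay);
    if (await connect()) return true;
  }
}
```

Under these assumptions, the delays run 1, 2, 4, 8, and 16 seconds. They then stay at the 30-second cap for the remaining five attempts.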
A critical challenge in stream recovery is avoiding the re-execution of tools that were already triggered. The extractToolCallIdsFromSnapshot function scans replayed events to identify tools that have already been dispatched.
This prevents the UI from re-running client-side tools (like open_resource or run_workflow) when catching up on a resumed stream.
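A minimal sketch of the deduplication follows. It assumes a simplified event union in which each tool_call carries a toolCallId. collectDispatchedToolCallIds plays the role described for extractToolCallIdsFromSnapshot. planClientToolRuns is a hypothetical helper that decides which live tool calls still need to run on the client.

```typescript
// Hypothetical replayed-event shape; real tool_call payloads may differ.
type ReplayedEvent =
  | { type: "content"; text: string }
  | { type: "tool_call"; toolCallId: string; name: string }
  | { type: "tool_result"; toolCallId: string };

// Collect the ids of every tool call already dispatched in the replayed history.
function collectDispatchedToolCallIds(events: ReplayedEvent[]): Set<string> {
  const ids = new Set<string>();
  for (const e of events) {
    if (e.type === "tool_call") ids.add(e.toolCallId);
  }
  return ids;
}

// Return the tool calls from the live tail that have not run yet, in order.
function planClientToolRuns(snapshot: ReplayedEvent[], live: ReplayedEvent[]): string[] {
  const seen = collectDispatchedToolCallIds(snapshot);
  const toRun: string[] = [];
  for (const e of live) {
    if (e.type === "tool_call" && !seen.has(e.toolCallId)) {
      // Mark as seen so an overlapping duplicate in the live stream is skipped too.
      seen.add(e.toolCallId);
      toRun.push(e.toolCallId);
    }
  }
  return toRun;
}
```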
In multi-node deployments, stream resumption requires distributed locking to ensure two different server instances don't attempt to orchestrate the same chat simultaneously.
The system uses Redis-based locks to manage "ownership" of a chat stream.
- acquirePendingChatStream attempts to set a Redis key copilot:chat-stream-lock:[chatId] with the current streamId.
- abortActiveStream publishes an abort signal to Redis, which the original orchestrating node polls for and respects.
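The ownership protocol can be sketched against a small async store interface. With Redis, setIfAbsent corresponds to SET with the NX and PX options, which sets the key only if it is absent and gives it an expiry. The abort key name, the TTL values, and all function names here are hypothetical. The abort signal is modeled as a polled key because the description says the owning node polls for it. MemoryLockStore is an in-memory stand-in for tests.

```typescript
// Minimal async store contract. With Redis: SET key value NX PX ttl, GET, DEL.
interface LockStore {
  setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean>;
  set(key: string, value: string, ttlMs: number): Promise<void>;
  get(key: string): Promise<string | null>;
  del(key: string): Promise<void>;
}

const lockKey = (chatId: string): string => `copilot:chat-stream-lock:${chatId}`;
// Hypothetical key name for the abort signal.
const abortKey = (streamId: string): string => `copilot:stream-abort:${streamId}`;

// True if this streamId now owns the chat; false if another stream holds it.
async function acquireChatStreamLock(store: LockStore, chatId: string, streamId: string, ttlMs = 60000): Promise<boolean> {
  return store.setIfAbsent(lockKey(chatId), streamId, ttlMs);
}

// Ask whichever stream currently owns the chat to stop; returns the owner's streamId.
async function requestAbort(store: LockStore, chatId: string): Promise<string | null> {
  const owner = await store.get(lockKey(chatId));
  if (owner !== null) await store.set(abortKey(owner), "1", 60000);
  return owner;
}

// The owning node polls this between chunks and stops when it returns true.
async function shouldAbort(store: LockStore, streamId: string): Promise<boolean> {
  return (await store.get(abortKey(streamId))) !== null;
}

// Release only if we still own the lock. GET then DEL is not atomic.
async function releaseChatStreamLock(store: LockStore, chatId: string, streamId: string): Promise<void> {
  if ((await store.get(lockKey(chatId))) === streamId) await store.del(lockKey(chatId));
}

// In-memory stand-in with expiry, for tests only.
class MemoryLockStore implements LockStore {
  private data = new Map<string, { value: string; expiresAt: number }>();
  private live(key: string): { value: string; expiresAt: number } | undefined {
    const entry = this.data.get(key);
    if (entry !== undefined && entry.expiresAt <= Date.now()) {
      this.data.delete(key);
      return undefined;
    }
    return entry;
  }
  async setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean> {
    if (this.live(key) !== undefined) return false;
    this.data.set(key, { value, expiresAt: Date.now() + ttlMs });
    return true;
  }
  async set(key: string, value: string, ttlMs: number): Promise<void> {
    this.data.set(key, { value, expiresAt: Date.now() + ttlMs });
  }
  async get(key: string): Promise<string | null> {
    return this.live(key)?.value ?? null;
  }
  async del(key: string): Promise<void> {
    this.data.delete(key);
  }
}
```

The expiry keeps a crashed node from holding a chat forever. The release check is a separate GET followed by DEL, which is not atomic. A production Redis version would typically perform the compare-and-delete in a single Lua script.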
Diagram: Code-Level Resumption Flow