Stream Resumption & Recovery

Purpose and Scope

This document describes the stream resumption and recovery system within Sim Studio's Copilot and Mothership interfaces. The system ensures that AI agent interactions—specifically long-running streaming responses—can survive page refreshes, network interruptions, or accidental tab closures. It achieves this through a combination of client-side sessionStorage persistence, server-side event buffering, and sequence-based replay using lastEventId tracking.

For the broader Copilot system, see Copilot Architecture. For details on how message state is preserved, see Message Management & Checkpoints.

Architecture Overview

The stream resumption system operates across the client UI, the Next.js API layer, and the server-side orchestrator.

System Components and Data Flow

Diagram: Stream Resumption Architecture

The server-side orchestrator buffers events as they are generated. If a client disconnects, it can request a reconnection via the stream endpoint, providing the ID of the last successfully processed event to trigger a replay.

Client-Side Persistence

To survive a page refresh, the client must persist the current streamId and the sequence ID of the last event it received.

Session Storage Schema

The system uses sessionStorage rather than localStorage so that stream state is scoped to a single browser tab and is cleared when that tab is closed.

Key	Type	Description
`streamId`	`string`	Unique identifier for the active SSE stream.
`lastEventId`	`number`	The sequence ID of the most recent event processed by the client.
`assistantId`	`string`	The ID of the agent or sub-agent providing the response.
`expectedGen`	`number`	A generation counter to prevent stale stream attachments.
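A minimal sketch of reading and writing this record, assuming it is stored as a single JSON value. The key name PENDING_STREAM_KEY and the helper functions are hypothetical. The KeyValueStorage interface is a subset of the Web Storage API, so window.sessionStorage can be passed in directly.

```typescript
// Shape of the persisted record; field names follow the table above.
interface PendingStreamState {
  streamId: string;
  lastEventId: number;
  assistantId: string;
  expectedGen: number;
}

// Subset of the Web Storage API (window.sessionStorage satisfies it).
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical key name; the real key is not documented here.
const PENDING_STREAM_KEY = "copilot:pending-stream";

function savePendingStream(storage: KeyValueStorage, state: PendingStreamState): void {
  storage.setItem(PENDING_STREAM_KEY, JSON.stringify(state));
}

function loadPendingStream(storage: KeyValueStorage): PendingStreamState | null {
  const raw = storage.getItem(PENDING_STREAM_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<PendingStreamState>;
    // Reject records that do not have the expected shape.
    if (
      typeof parsed.streamId !== "string" ||
      typeof parsed.lastEventId !== "number" ||
      typeof parsed.assistantId !== "string" ||
      typeof parsed.expectedGen !== "number"
    ) {
      return null;
    }
    return parsed as PendingStreamState;
  } catch {
    return null; // corrupted or non-object JSON
  }
}

function clearPendingStream(storage: KeyValueStorage): void {
  storage.removeItem(PENDING_STREAM_KEY);
}

// Advance the resume cursor after an event has been processed.
function recordEventProcessed(storage: KeyValueStorage, eventId: number): void {
  const state = loadPendingStream(storage);
  if (state && eventId > state.lastEventId) {
    savePendingStream(storage, { ...state, lastEventId: eventId });
  }
}
```

Advancing lastEventId only when the new ID is larger keeps the cursor monotonic. A late or duplicated event therefore cannot move the resume point backwards.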

Recovery Triggering

When the useChat hook initializes, it checks for a pending stream in storage. If found, it attempts to "attach" to the existing stream instead of starting a new conversation.
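The attach-on-mount check can be sketched as follows. This illustrates the idea under stated assumptions and is not the useChat implementation. StreamGeneration, attachIfPending, and the connect callback are hypothetical. The sketch also assumes that a generation counter, in the spirit of expectedGen, is used to drop events from an attach that a newer stream in the same tab has superseded.

```typescript
interface PendingStream {
  streamId: string;
  lastEventId: number;
}

type EventHandler = (eventId: number, payload: unknown) => void;

// Hypothetical generation counter: every new attach or send bumps it.
class StreamGeneration {
  private gen = 0;
  begin(): number {
    return ++this.gen;
  }
  // True only if nothing newer has started since `gen` was issued.
  isCurrent(gen: number): boolean {
    return gen === this.gen;
  }
}

// Attach to a pending stream if one exists; returns false when there is nothing to resume.
function attachIfPending(
  pending: PendingStream | null,
  generations: StreamGeneration,
  connect: (streamId: string, afterEventId: number, onEvent: EventHandler) => void,
  apply: EventHandler,
): boolean {
  if (pending === null) return false; // no pending stream: start fresh on the next send
  const gen = generations.begin();
  connect(pending.streamId, pending.lastEventId, (eventId, payload) => {
    // Ignore events if a newer stream has started since this attach began.
    if (generations.isCurrent(gen)) apply(eventId, payload);
  });
  return true;
}
```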

Server-Side Event Buffering

The server uses a buffering mechanism to store SSE events, allowing them to be replayed if a client reconnects.

Sequence and Event Tracking

The orchestrateCopilotStream function uses a StreamingContext to track the lifecycle of a stream. As the AI providers or tool executors generate events, they are passed through an event writer that buffers them.

Event Generation: The runStreamLoop generates SSE events (e.g., content, tool_call).
Buffering: The createStreamEventWriter captures these events into the buffer.
Persistence: Events are indexed by streamId and an incrementing eventId.
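The buffering steps can be sketched with an in-memory buffer. StreamEventBuffer and createBufferedWriter are hypothetical stand-ins for the real buffer and createStreamEventWriter. The in-memory Map is an assumption made only for illustration: replay in a multi-node deployment would need storage that every node can read.

```typescript
// Hypothetical event shape; real SSE payloads carry more fields.
interface StreamEvent {
  eventId: number;
  type: string; // e.g. "content", "tool_call"
  data: unknown;
}

// In-memory stand-in for the server-side event buffer.
class StreamEventBuffer {
  private readonly streams = new Map<string, StreamEvent[]>();

  // Appends an event and assigns the next eventId for this stream, starting at 1.
  append(streamId: string, type: string, data: unknown): StreamEvent {
    const events = this.streams.get(streamId) ?? [];
    const event: StreamEvent = { eventId: events.length + 1, type, data };
    events.push(event);
    this.streams.set(streamId, events);
    return event;
  }

  // Returns buffered events whose eventId is greater than afterEventId.
  read(streamId: string, afterEventId = 0): StreamEvent[] {
    return (this.streams.get(streamId) ?? []).filter((e) => e.eventId > afterEventId);
  }
}

// Buffers each event first, then forwards it to the live client connection.
function createBufferedWriter(
  buffer: StreamEventBuffer,
  streamId: string,
  send: (event: StreamEvent) => void,
): (type: string, data: unknown) => void {
  return (type, data) => {
    const event = buffer.append(streamId, type, data);
    send(event); // if the client has disconnected, the event remains available for replay
  };
}
```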

Replay Mechanism

When a client reconnects, the server:

Retrieves the stream's metadata via getStreamMeta.
Reads all buffered events for the stream using readStreamEvents.
Sends these events back to the client in a single batch via buildReplayStream, so the UI can catch up quickly.
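The replay batch can be sketched as a string of Server-Sent Events frames. The names BufferedEvent, toSseFrame, and buildReplayBatch are hypothetical. Filtering on the client's lastEventId is also an assumption drawn from the lastEventId tracking described earlier. If the server instead replays the full buffer, the client must skip IDs it has already processed. Each frame carries an `id:` field so the client can keep its cursor current.

```typescript
interface BufferedEvent {
  eventId: number;
  type: string;
  data: unknown;
}

// Formats one event as an SSE frame: id, event name, JSON data, blank-line terminator.
function toSseFrame(event: BufferedEvent): string {
  // JSON.stringify escapes newlines inside strings, so data fits on one line.
  return `id: ${event.eventId}\nevent: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// Hypothetical replay builder: everything after lastEventId, in order, as one batch.
function buildReplayBatch(events: BufferedEvent[], lastEventId: number): string {
  return events
    .filter((e) => e.eventId > lastEventId)
    .sort((a, b) => a.eventId - b.eventId)
    .map(toSseFrame)
    .join("");
}
```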

Reconnection Logic & Tool Recovery

Reconnection is not just about replaying text; it also involves recovering the state of tool executions, especially those that run on the client side.

Exponential Backoff

The client implements an exponential backoff strategy when a stream disconnects unexpectedly.

Parameter	Value
`MAX_RECONNECT_ATTEMPTS`	10
`RECONNECT_BASE_DELAY_MS`	1,000 ms
`RECONNECT_MAX_DELAY_MS`	30,000 ms

Sources:

apps/sim/app/workspace/[workspaceId]/home/hooks/use-chat.ts:99-101
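The delay calculation can be sketched using the parameters from the table. The doubling formula, the absence of jitter, and the 0-based attempt counter are assumptions: the source gives only the three constants. reconnectDelayMs is a hypothetical helper name.

```typescript
// Values from the table above.
const MAX_RECONNECT_ATTEMPTS = 10;
const RECONNECT_BASE_DELAY_MS = 1000;
const RECONNECT_MAX_DELAY_MS = 30_000;

// Delay before reconnect attempt `attempt` (0-based): doubling, capped at the maximum.
// Returns null once the attempt budget is exhausted and the client should give up.
function reconnectDelayMs(attempt: number): number | null {
  if (attempt >= MAX_RECONNECT_ATTEMPTS) return null;
  return Math.min(RECONNECT_BASE_DELAY_MS * 2 ** attempt, RECONNECT_MAX_DELAY_MS);
}
```

Under these assumptions, the ten attempts wait 1 s, 2 s, 4 s, 8 s, and 16 s, and then 30 s five times, for 181 s in total. A common refinement is to add random jitter so that many tabs do not retry in lockstep.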

Tool Call Deduplication

A critical challenge in stream recovery is avoiding the re-execution of tools that were already triggered. The extractToolCallIdsFromSnapshot function scans replayed events to identify tools that have already been dispatched.

This prevents the UI from re-running client-side tools (like open_resource or run_workflow) when catching up on a resumed stream.
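The deduplication idea can be sketched as a set of tool-call IDs collected from the replayed events. Execution is then skipped for any ID already in the set. collectDispatchedToolCallIds and dispatchNewToolCalls are hypothetical analogues of extractToolCallIdsFromSnapshot. The toolCallId and toolName field names are also assumed.

```typescript
// Hypothetical event shape; tool_call events are assumed to carry these fields.
interface ReplayedEvent {
  type: string;
  toolCallId?: string;
  toolName?: string;
}

// Collects the IDs of tool calls that were already dispatched in the replayed events.
function collectDispatchedToolCallIds(events: ReplayedEvent[]): Set<string> {
  const ids = new Set<string>();
  for (const e of events) {
    if (e.type === "tool_call" && e.toolCallId) ids.add(e.toolCallId);
  }
  return ids;
}

// Executes client-side tools only for calls not seen before; records each one it runs.
function dispatchNewToolCalls(
  events: ReplayedEvent[],
  alreadyDispatched: Set<string>,
  execute: (toolName: string, toolCallId: string) => void,
): void {
  for (const e of events) {
    if (e.type !== "tool_call" || !e.toolCallId || !e.toolName) continue;
    if (alreadyDispatched.has(e.toolCallId)) continue; // seen during replay: do not run again
    alreadyDispatched.add(e.toolCallId);
    execute(e.toolName, e.toolCallId);
  }
}
```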

Distributed Coordination

In multi-node deployments, stream resumption requires distributed locking to ensure two different server instances don't attempt to orchestrate the same chat simultaneously.

Redis Stream Locks

The system uses Redis-based locks to manage "ownership" of a chat stream.

Acquisition: acquirePendingChatStream attempts to set the Redis key copilot:chat-stream-lock:[chatId] to the current streamId.
Conflict handling: If another stream is active, the request waits for it to settle or for the lock to expire (TTL of 2 hours).
Abort: If a user sends a new message while a stream is active, abortActiveStream publishes an abort signal to Redis. The node orchestrating the original stream polls for this signal and honors it.
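The lock and abort flow can be sketched against a small key-value interface that mirrors the Redis commands SET key value NX EX ttl, GET, and DEL. Only the lock key format and the 2-hour TTL come from the description above. The helper names, the abort key format, and the in-memory MemoryLockStore (for local testing) are hypothetical. The wait-and-retry step of conflict handling is not shown.

```typescript
// Mirrors Redis SET key value NX EX ttl (setIfAbsent), GET, and DEL.
interface LockStore {
  setIfAbsent(key: string, value: string, ttlSeconds: number): boolean;
  get(key: string): string | null;
  del(key: string): void;
}

const LOCK_TTL_SECONDS = 2 * 60 * 60; // 2-hour TTL
const lockKey = (chatId: string): string => `copilot:chat-stream-lock:${chatId}`;
const abortKey = (streamId: string): string => `copilot:stream-abort:${streamId}`; // hypothetical key

function tryAcquireChatStream(store: LockStore, chatId: string, streamId: string): boolean {
  return store.setIfAbsent(lockKey(chatId), streamId, LOCK_TTL_SECONDS);
}

// Releases the lock only if this stream still owns it.
function releaseChatStream(store: LockStore, chatId: string, streamId: string): void {
  if (store.get(lockKey(chatId)) === streamId) store.del(lockKey(chatId));
}

// Flags the current owner for abort; returns the aborted streamId, if any.
function requestAbort(store: LockStore, chatId: string): string | null {
  const owner = store.get(lockKey(chatId));
  if (owner !== null) store.setIfAbsent(abortKey(owner), "1", LOCK_TTL_SECONDS);
  return owner;
}

// Polled by the orchestrating node between steps.
function shouldAbort(store: LockStore, streamId: string): boolean {
  return store.get(abortKey(streamId)) !== null;
}

// In-memory fake with TTL expiry, for local testing only.
class MemoryLockStore implements LockStore {
  private readonly data = new Map<string, { value: string; expiresAt: number }>();
  private readonly now: () => number;
  constructor(now: () => number = Date.now) {
    this.now = now;
  }
  private live(key: string): { value: string; expiresAt: number } | undefined {
    const entry = this.data.get(key);
    if (entry && entry.expiresAt <= this.now()) {
      this.data.delete(key); // expired, as Redis would drop it
      return undefined;
    }
    return entry;
  }
  setIfAbsent(key: string, value: string, ttlSeconds: number): boolean {
    if (this.live(key)) return false;
    this.data.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
    return true;
  }
  get(key: string): string | null {
    return this.live(key)?.value ?? null;
  }
  del(key: string): void {
    this.data.delete(key);
  }
}
```

In a real Redis deployment, the compare-and-delete in releaseChatStream should run atomically, typically as a Lua script. Otherwise, a node could delete a lock that expired and was re-acquired by another stream between the GET and the DEL.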

Reconnection Sequence (Code Perspective)

Diagram: Code-Level Resumption Flow