Skip to content

Have server ack app_heartbeat messages received from client#13810

Open
vdonato wants to merge 1 commit intodevelopfrom
vdonato/ack-heartbeats-from-server
Open

Have server ack app_heartbeat messages received from client#13810
vdonato wants to merge 1 commit intodevelopfrom
vdonato/ack-heartbeats-from-server

Conversation

@vdonato
Copy link
Collaborator

@vdonato vdonato commented Feb 4, 2026

Describe your changes

In hosted Streamlit app products like SiS and Community Cloud, we're currently very slow to detect when
a user's internet connection is running into problems. This is because we don't have a mechanism to quickly
detect when our websocket connection runs into trouble, and it's possible for minutes to pass after our
websocket connection actually drops before we finally see the network error dialog.

To make this process faster, we extend the existing SEND_APP_HEARTBEAT mechanism, which exists
to allow the host of a Streamlit app request that the OSS client send a heartbeat to the server for the sole
purpose of keeping its websocket connection alive in environments where it'll otherwise time out. With these
new changes, we also have the server send an ack that it received this message back to the client. If no
ack is received, we assume that we may have a connection error and attempt to reconnect our websocket
connection, which will cause us to either

  • hit the connection error dialog if the network problems are real, or
  • successfully reconnect with little consequence otherwise.

In the normal OSS world, we don't receive SEND_APP_HEARTBEAT messages, so none of this new code
is ever invoked.

Testing Plan

  • Updated python/typescript unit tests
  • Manual testing performed by:
    • Updated hostframe.html to add a new button to send an app heartbeat
    • Manually removed the code to ack the heartbeat from app_session.py to simulate a network issue
    • Verified that the client disconnects its websocket connection after the timeout.

@vdonato vdonato added security-assessment-completed Security assessment has been completed for PR change:feature PR contains new feature or enhancement implementation impact:users PR changes affect end users labels Feb 4, 2026
@snyk-io
Copy link
Contributor

snyk-io bot commented Feb 4, 2026

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2026

✅ PR preview is ready!

Name Link
📦 Wheel file https://core-previews.s3-us-west-2.amazonaws.com/pr-13810/streamlit-1.53.1-py3-none-any.whl
📦 @streamlit/component-v2-lib Download from artifacts
🕹️ Preview app pr-13810.streamlit.app (☁️ Deploy here if not accessible)

@vdonato vdonato marked this pull request as ready for review February 4, 2026 00:33
Copilot AI review requested due to automatic review settings February 4, 2026 00:33
@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2026

📈 Frontend coverage change detected

The frontend unit test (vitest) coverage has increased by 0.1500%

  • Current PR: 86.5900% (13740 lines, 1842 missed)
  • Latest develop: 86.4400% (13715 lines, 1859 missed)

🎉 Great job on improving test coverage!

📊 View detailed coverage comparison

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a server-side acknowledgment mechanism for app heartbeat messages to enable faster detection of connection issues in hosted Streamlit environments (SiS and Community Cloud). When the client sends an app heartbeat, the server now responds with a heartbeat_ack message. If the client doesn't receive this acknowledgment within a timeout (29 seconds), it attempts to reconnect, which either surfaces the connection error dialog immediately or successfully reconnects with minimal disruption.

Changes:

  • Added heartbeat_ack field to the ForwardMsg protobuf definition for server-to-client acknowledgment
  • Modified backend _handle_app_heartbeat_request to send acknowledgment ForwardMsg when heartbeat is received
  • Implemented heartbeat timeout tracking in ConnectionManager with automatic reconnection on timeout
  • Added reconnect() method to WebsocketConnection to trigger reconnection without permanent disconnection

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
proto/streamlit/proto/ForwardMsg.proto Added heartbeat_ack boolean field (26) to ForwardMsg for server acknowledgments
lib/streamlit/runtime/app_session.py Modified _handle_app_heartbeat_request to send heartbeat_ack ForwardMsg response
lib/tests/streamlit/runtime/app_session_test.py Added test verifying that heartbeat handler sends proper ack message
frontend/connection/src/constants.ts Added HEARTBEAT_ACK_TIMEOUT_MS constant (29 seconds) for ack timeout
frontend/connection/src/WebsocketConnection.tsx Added reconnect() method to close connection and transition to PINGING_SERVER state
frontend/connection/src/WebsocketConnection.test.tsx Added tests for reconnect behavior in different connection states
frontend/connection/src/ConnectionManager.ts Implemented heartbeat ack timeout tracking and reconnection logic
frontend/connection/src/ConnectionManager.test.ts Added comprehensive tests for heartbeat timeout, ack handling, and cleanup
frontend/app/src/App.tsx Integrated heartbeat ack handling by calling ConnectionManager methods
frontend/app/src/App.test.tsx Added tests verifying onHeartbeatSent and onHeartbeatAckReceived are called correctly
e2e_playwright/test_assets/hostframe.html Added "Send Heartbeat" button for manual testing

Comment on lines +4605 to +4615
it("calls onHeartbeatAckReceived when heartbeatAck ForwardMsg is received", () => {
prepareHostCommunicationManager()

const connectionManager = getMockConnectionManager(true)

act(() => {
sendForwardMessage("heartbeatAck", true)
})

expect(connectionManager.onHeartbeatAckReceived).toHaveBeenCalledTimes(1)
})
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should include a negative assertion to verify that onHeartbeatSent is NOT called when a heartbeatAck message is received. This would ensure that receiving an ack doesn't trigger the sending of a heartbeat, providing anti-regression coverage for the separation of concerns between these two operations.

Copilot uses AI. Check for mistakes.
Comment on lines +4593 to +4603
it("calls onHeartbeatSent when SEND_APP_HEARTBEAT is received", () => {
prepareHostCommunicationManager()

const connectionManager = getMockConnectionManager(true)

fireWindowPostMessage({
type: "SEND_APP_HEARTBEAT",
})

expect(connectionManager.onHeartbeatSent).toHaveBeenCalledTimes(1)
})
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should include a negative assertion to verify that onHeartbeatAckReceived is NOT called when a SEND_APP_HEARTBEAT message is received. This would ensure that the two different mechanisms (sending heartbeat vs. receiving ack) are properly separated and provide anti-regression coverage.

Copilot uses AI. Check for mistakes.
@sfc-gh-lmasuch sfc-gh-lmasuch added the ai-review If applied to PR or issue will run AI review workflow label Feb 4, 2026
@github-actions github-actions bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 4, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2026

Summary

This PR extends the existing SEND_APP_HEARTBEAT mechanism to include server acknowledgments. When the frontend sends an app_heartbeat BackMsg, the server now responds with a heartbeat_ack ForwardMsg. The frontend tracks these acks and triggers a websocket reconnect if an ack isn't received within 29 seconds (1 second shorter than the expected 30s heartbeat interval in hosted environments).

Main changes:

  • Added heartbeat_ack field (field 26) to ForwardMsg.proto
  • Backend: _handle_app_heartbeat_request() now enqueues a heartbeat_ack ForwardMsg
  • Frontend: ConnectionManager tracks heartbeat timeouts and triggers reconnects on timeout
  • Frontend: WebsocketConnection added reconnect() method that transitions FSM to PINGING_SERVER
  • Frontend: App.tsx handles heartbeatAck messages and notifies ConnectionManager

Code Quality

The code is well-structured and follows existing patterns in the codebase.

Strengths:

  • Clear separation of concerns: ConnectionManager handles timeout tracking, WebsocketConnection handles FSM transitions
  • Proper cleanup in disconnect() to prevent memory leaks (line 227 in ConnectionManager.ts)
  • Good defensive programming with isConnected() check before reconnect (line 250 in ConnectionManager.ts)
  • Well-documented constants with rationale (HEARTBEAT_ACK_TIMEOUT_MS in constants.ts:76-83)
  • Clean docstrings following numpydoc style in the Python code

Minor observations:

  • The awaitingHeartbeatAck flag (line 104 in ConnectionManager.ts) is technically redundant since heartbeatAckTimeoutId !== undefined serves the same purpose, but it adds clarity and is an acceptable defensive pattern.

Test Coverage

Test coverage is comprehensive and follows best practices.

Python tests (app_session_test.py:1279-1291):

  • Tests that heartbeat_ack ForwardMsg is enqueued with correct type
  • Includes negative assertion (WhichOneof("type") == "heartbeat_ack") to verify correct message type

Frontend tests:

  1. ConnectionManager.test.ts (new file, 194 lines):

    • Tests timeout starting when heartbeat is sent
    • Tests reconnect triggered on timeout
    • Tests timeout clearing when new heartbeat sent (proper debounce)
    • Tests timeout clearing on ack received
    • Tests no error when ack received without pending heartbeat
    • Tests timeout cleared on disconnect
    • Tests no reconnect attempt when already disconnected
    • Good use of vi.useFakeTimers() for deterministic timing tests
  2. WebsocketConnection.test.tsx (lines 1018-1032):

    • Tests reconnect() transitions to PINGING_SERVER when connected
    • Tests reconnect() does nothing when not connected (negative assertion)
  3. App.test.tsx (lines 4590-4606):

    • Tests onHeartbeatSent called on SEND_APP_HEARTBEAT message
    • Tests onHeartbeatAckReceived called on heartbeatAck ForwardMsg

E2E test infrastructure:

  • Updated hostframe.html with "Send Heartbeat" button for manual testing

Backwards Compatibility

Fully backwards compatible:

  • Proto field 26 is purely additive; old clients ignore unknown fields
  • In OSS Streamlit, SEND_APP_HEARTBEAT is never sent by hosts, so this code path is never executed
  • Old servers won't send heartbeat_ack, but this only matters in hosted environments that would update both client and server together
  • The reconnect() method uses existing FSM transitions (PINGING_SERVER), not new states

Security & Risk

No security concerns identified:

  • No new attack vectors introduced
  • Heartbeat mechanism doesn't expose sensitive information
  • No new authentication or authorization paths

Risk assessment:

  • Low risk: The feature is opt-in (only triggered by hosted environments sending SEND_APP_HEARTBEAT)
  • Graceful degradation: If server doesn't respond with ack, worst case is a reconnect attempt
  • Timeout value: 29s timeout assumes 30s heartbeat interval. If configured differently, could cause unnecessary reconnects, but this is documented and acceptable for hosted use cases

Recommendations

No blocking issues found. The implementation is clean, well-tested, and follows established patterns.

Optional consideration (non-blocking):

  1. The timeout constant HEARTBEAT_ACK_TIMEOUT_MS = 29 * 1000 assumes a 30s heartbeat interval. If this interval ever becomes configurable per-deployment, consider making the timeout configurable as well. This is documented in the code comments and acceptable for current use cases.

Verdict

APPROVED: This is a well-implemented feature with comprehensive test coverage, proper cleanup handling, and full backwards compatibility. The code follows existing patterns and best practices, and the PR description clearly explains the motivation and testing approach.


This is an automated AI review using opus-4.5-thinking. Please verify the feedback and use your judgment.

this.awaitingHeartbeatAck = true

this.heartbeatAckTimeoutId = setTimeout(() => {
if (this.awaitingHeartbeatAck) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: When we leave CONNECTED while a heartbeat is pending, this timeout can still fire later and log "connection may be unhealthy", which is misleading noise and keeps awaitingHeartbeatAck stale longer than necessary. I think we need to clear the awaitingHeartbeatAck state on certain transitions in setConnectionState.

).connectionState = ConnectionState.CONNECTED
})

afterEach(() => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Ensure that we actually clean up everything in this afterEach hook:

afterEach(() => {
  connectionManager.disconnect()
  vi.clearAllTimers()
  vi.clearAllMocks()
  vi.useRealTimers()
})

* Timeout for heartbeat acknowledgment in milliseconds.
* If we send a heartbeat and don't receive an ack within this time,
* we consider the connection unhealthy. This time is set to be 1 second
* shorter than the heartbeat interval of 30s that we expect to be commonly
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Is there any possibility of race conditions with this value?

Comment on lines +99 to +104
/**
* Tracks whether we're expecting a heartbeat ack from the server.
* This is set to true when a heartbeat is sent and false when an ack is
* received or when the timeout fires.
*/
private awaitingHeartbeatAck = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Could we simplify the state by only tracking heartbeatAckTimeoutId? The existence of heartbeatAckTimeoutId should be enough to determine if we're awaiting or not, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:feature PR contains new feature or enhancement implementation impact:users PR changes affect end users security-assessment-completed Security assessment has been completed for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants