Have server ack app_heartbeat messages received from client#13810
Have server ack app_heartbeat messages received from client#13810
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
✅ PR preview is ready!
|
📈 Frontend coverage change detectedThe frontend unit test (vitest) coverage has increased by 0.1500%
🎉 Great job on improving test coverage! |
There was a problem hiding this comment.
Pull request overview
This PR implements a server-side acknowledgment mechanism for app heartbeat messages to enable faster detection of connection issues in hosted Streamlit environments (SiS and Community Cloud). When the client sends an app heartbeat, the server now responds with a heartbeat_ack message. If the client doesn't receive this acknowledgment within a timeout (29 seconds), it attempts to reconnect, which either surfaces the connection error dialog immediately or successfully reconnects with minimal disruption.
Changes:
- Added
heartbeat_ackfield to theForwardMsgprotobuf definition for server-to-client acknowledgment - Modified backend
_handle_app_heartbeat_requestto send acknowledgment ForwardMsg when heartbeat is received - Implemented heartbeat timeout tracking in
ConnectionManagerwith automatic reconnection on timeout - Added
reconnect()method toWebsocketConnectionto trigger reconnection without permanent disconnection
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| proto/streamlit/proto/ForwardMsg.proto | Added heartbeat_ack boolean field (26) to ForwardMsg for server acknowledgments |
| lib/streamlit/runtime/app_session.py | Modified _handle_app_heartbeat_request to send heartbeat_ack ForwardMsg response |
| lib/tests/streamlit/runtime/app_session_test.py | Added test verifying that heartbeat handler sends proper ack message |
| frontend/connection/src/constants.ts | Added HEARTBEAT_ACK_TIMEOUT_MS constant (29 seconds) for ack timeout |
| frontend/connection/src/WebsocketConnection.tsx | Added reconnect() method to close connection and transition to PINGING_SERVER state |
| frontend/connection/src/WebsocketConnection.test.tsx | Added tests for reconnect behavior in different connection states |
| frontend/connection/src/ConnectionManager.ts | Implemented heartbeat ack timeout tracking and reconnection logic |
| frontend/connection/src/ConnectionManager.test.ts | Added comprehensive tests for heartbeat timeout, ack handling, and cleanup |
| frontend/app/src/App.tsx | Integrated heartbeat ack handling by calling ConnectionManager methods |
| frontend/app/src/App.test.tsx | Added tests verifying onHeartbeatSent and onHeartbeatAckReceived are called correctly |
| e2e_playwright/test_assets/hostframe.html | Added "Send Heartbeat" button for manual testing |
| it("calls onHeartbeatAckReceived when heartbeatAck ForwardMsg is received", () => { | ||
| prepareHostCommunicationManager() | ||
|
|
||
| const connectionManager = getMockConnectionManager(true) | ||
|
|
||
| act(() => { | ||
| sendForwardMessage("heartbeatAck", true) | ||
| }) | ||
|
|
||
| expect(connectionManager.onHeartbeatAckReceived).toHaveBeenCalledTimes(1) | ||
| }) |
There was a problem hiding this comment.
The test should include a negative assertion to verify that onHeartbeatSent is NOT called when a heartbeatAck message is received. This would ensure that receiving an ack doesn't trigger the sending of a heartbeat, providing anti-regression coverage for the separation of concerns between these two operations.
| it("calls onHeartbeatSent when SEND_APP_HEARTBEAT is received", () => { | ||
| prepareHostCommunicationManager() | ||
|
|
||
| const connectionManager = getMockConnectionManager(true) | ||
|
|
||
| fireWindowPostMessage({ | ||
| type: "SEND_APP_HEARTBEAT", | ||
| }) | ||
|
|
||
| expect(connectionManager.onHeartbeatSent).toHaveBeenCalledTimes(1) | ||
| }) |
There was a problem hiding this comment.
The test should include a negative assertion to verify that onHeartbeatAckReceived is NOT called when a SEND_APP_HEARTBEAT message is received. This would ensure that the two different mechanisms (sending heartbeat vs. receiving ack) are properly separated and provide anti-regression coverage.
SummaryThis PR extends the existing Main changes:
Code QualityThe code is well-structured and follows existing patterns in the codebase. Strengths:
Minor observations:
Test CoverageTest coverage is comprehensive and follows best practices. Python tests (
Frontend tests:
E2E test infrastructure:
Backwards CompatibilityFully backwards compatible:
Security & RiskNo security concerns identified:
Risk assessment:
RecommendationsNo blocking issues found. The implementation is clean, well-tested, and follows established patterns. Optional consideration (non-blocking):
VerdictAPPROVED: This is a well-implemented feature with comprehensive test coverage, proper cleanup handling, and full backwards compatibility. The code follows existing patterns and best practices, and the PR description clearly explains the motivation and testing approach. This is an automated AI review using |
| this.awaitingHeartbeatAck = true | ||
|
|
||
| this.heartbeatAckTimeoutId = setTimeout(() => { | ||
| if (this.awaitingHeartbeatAck) { |
There was a problem hiding this comment.
issue: When we leave CONNECTED while a heartbeat is pending, this timeout can still fire later and log "connection may be unhealthy", which is misleading noise and keeps awaitingHeartbeatAck stale longer than necessary. I think we need to clear the awaitingHeartbeatAck state on certain transitions in setConnectionState.
| ).connectionState = ConnectionState.CONNECTED | ||
| }) | ||
|
|
||
| afterEach(() => { |
There was a problem hiding this comment.
suggestion: Ensure that we actually clean up everything in this afterEach hook:
afterEach(() => {
connectionManager.disconnect()
vi.clearAllTimers()
vi.clearAllMocks()
vi.useRealTimers()
})| * Timeout for heartbeat acknowledgment in milliseconds. | ||
| * If we send a heartbeat and don't receive an ack within this time, | ||
| * we consider the connection unhealthy. This time is set to be 1 second | ||
| * shorter than the heartbeat interval of 30s that we expect to be commonly |
There was a problem hiding this comment.
question: Is there any possibility of race conditions with this value?
| /** | ||
| * Tracks whether we're expecting a heartbeat ack from the server. | ||
| * This is set to true when a heartbeat is sent and false when an ack is | ||
| * received or when the timeout fires. | ||
| */ | ||
| private awaitingHeartbeatAck = false |
There was a problem hiding this comment.
question: Could we simplify the state by only tracking heartbeatAckTimeoutId? The existence of heartbeatAckTimeoutId should be enough to determine if we're awaiting or not, right?
Describe your changes
In hosted Streamlit app products like SiS and Community Cloud, we're currently very slow to detect when
a user's internet connection is running into problems. This is because we don't have a mechanism to quickly
detect when our websocket connection runs into trouble, and it's possible for minutes to pass after our
websocket connection actually drops before we finally see the network error dialog.
To make this process faster, we extend the existing
SEND_APP_HEARTBEATmechanism, which existsto allow the host of a Streamlit app request that the OSS client send a heartbeat to the server for the sole
purpose of keeping its websocket connection alive in environments where it'll otherwise time out. With these
new changes, we also have the server send an ack that it received this message back to the client. If no
ack is received, we assume that we may have a connection error and attempt to reconnect our websocket
connection, which will cause us to either
In the normal OSS world, we don't receive
SEND_APP_HEARTBEATmessages, so none of this new codeis ever invoked.
Testing Plan
hostframe.htmlto add a new button to send an app heartbeatapp_session.pyto simulate a network issue