perf(coderd): reduce chat streaming latency with event-driven acquisition#23745
Merged
…tion
Previously, when a user sent a message, there was a 0-1000ms (avg 500ms) polling delay before processing began. SendMessage/CreateChat/EditMessage set status='pending' in the DB and returned, but nothing woke the processing loop; it was a blind 1-second ticker.
This change eliminates the polling delay by adding a wake channel that signals the run loop to call processOnce immediately. The ticker remains as a fallback safety net.
Additional latency reductions:
- Buffer the WebSocket write channel (64) in OneWayWebSocketEventSender to decouple producers from write speed.
- Reduce chat stream batch size from 256 to 32 so the client gets smaller JSON payloads to parse sooner.
…g races CreateChat now signals the wake channel immediately, causing the chatd daemon to process chats before TestListChats can assert ordering. Add require.Eventually loops to wait for chats to reach terminal status before asserting list ordering and pagination.
…learsQueue EditMessage calls signalWake() after committing, causing the chatd daemon to process the chat immediately. The DB assertion for pending status raced with this processing. Use require.Eventually + WaitUntilIdleForTest to wait for the processing to settle before asserting.
ibetitsmike
approved these changes
Mar 28, 2026
…iaPubsub CreateChat calls signalWake(), causing the daemon to process chats immediately. The race detector exposed a race where both parent and child chats were processed (and errored) before the test could set up the pubsub scenario. Wait for inflight processing to settle and reset chat statuses before proceeding with the test.
- Move WaitUntilIdleForTest out of require.Eventually goroutine to avoid WaitGroup Add/Wait race in TestEditMessage test
- Remove stale creation-time UpdatedAt ordering check in TestListChats/Success; the descending-sort invariant is already verified by the adjacent loop
Previously, when a user sent a message, there was a 0–1000ms (avg ~500ms) polling delay before processing began. SendMessage/CreateChat/EditMessage set status='pending' in the DB and returned, but nothing woke the processing loop; it was a blind 1-second ticker.

Changes
Event-driven acquisition (main change): Adds a wakeCh channel to the chatd Server. CreateChat, SendMessage, EditMessage, and PromoteQueued call signalWake() after committing their transactions, which wakes the run loop to call processOnce immediately. The 1-second ticker remains as a fallback safety net for edge cases (stale recovery, missed signals).

Buffer WebSocket write channel: Changes the OneWayWebSocketEventSender event channel from unbuffered to buffered (64), decoupling the event producer from WebSocket write speed. The existing 10s write timeout guards against stuck connections.

Implementation plan & analysis
The full latency analysis identified these sources of delay in the streaming pipeline:
FOR UPDATE lock + batch insert. Not addressed in this PR (medium risk, would overlap DB write with next provider TTFB).

Test stabilization notes
signalWake() causes the chatd daemon to process chats immediately after creation/send/edit, which exposed timing assumptions in several tests that expected chats to remain in pending status long enough to assert on. These tests were updated with require.Eventually + WaitUntilIdleForTest patterns to wait for processing to settle before asserting.

The race detector (test-go-race-pg) shows failures in TestCreateWorkspaceTool_EndToEnd and TestAwaitSubagentCompletion; these appear to be pre-existing races in the end-to-end chat flow that are now exercised more aggressively because processing starts immediately instead of after a 1s delay. Main branch CI (race detector) passes without these changes.