feat: add batched agent heartbeat writes#23699

Draft
sreya wants to merge 8 commits into main from heartbeat-batcher

Conversation

@sreya
Collaborator

@sreya sreya commented Mar 26, 2026

Reduces agent connection heartbeat DB writes from ~667/s to ~2/s at 10k agents by batching UpdateWorkspaceAgentConnectionByID calls. Disconnect writes bypass the batcher for immediate visibility.
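
As a rough illustration of the batching idea described above, here is a minimal sketch, not the code in this PR. It assumes a mutex-guarded map that keeps only the latest heartbeat per agent; the PR itself appears to use a channel with a direct-write fallback. The names `heartbeat`, `heartbeatBatcher`, `Add`, and `Flush` are hypothetical, and the `flush` callback stands in for a single batched DB write.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// heartbeat is a hypothetical stand-in for one agent's connection update.
type heartbeat struct {
	AgentID         string
	LastConnectedAt time.Time
}

// heartbeatBatcher keeps only the latest pending heartbeat per agent and
// hands everything to flush in a single call.
type heartbeatBatcher struct {
	mu      sync.Mutex
	pending map[string]heartbeat
	flush   func([]heartbeat) // stands in for one batched DB write (assumption)
}

func newHeartbeatBatcher(flush func([]heartbeat)) *heartbeatBatcher {
	return &heartbeatBatcher{pending: make(map[string]heartbeat), flush: flush}
}

// Add queues a heartbeat, replacing any older pending one for the same agent.
func (b *heartbeatBatcher) Add(hb heartbeat) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending[hb.AgentID] = hb
}

// Flush passes all pending heartbeats to flush in one call and returns how
// many rows were included. It does nothing when no heartbeats are pending.
func (b *heartbeatBatcher) Flush() int {
	b.mu.Lock()
	batch := make([]heartbeat, 0, len(b.pending))
	for _, hb := range b.pending {
		batch = append(batch, hb)
	}
	b.pending = make(map[string]heartbeat)
	b.mu.Unlock()

	if len(batch) > 0 {
		b.flush(batch)
	}
	return len(batch)
}

func main() {
	writes := 0
	b := newHeartbeatBatcher(func(batch []heartbeat) { writes++ })

	now := time.Now()
	for i := 0; i < 10000; i++ {
		b.Add(heartbeat{AgentID: fmt.Sprintf("agent-%d", i), LastConnectedAt: now})
	}
	rows := b.Flush()
	fmt.Printf("rows=%d writes=%d\n", rows, writes) // prints "rows=10000 writes=1"
}
```

In a real service `Flush` would run on a timer, and a disconnect would skip `Add` and write to the database directly so that it is visible immediately.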

@sreya sreya force-pushed the heartbeat-batcher branch 3 times, most recently from a3e7160 to 36ffd0e Compare March 26, 2026 23:03
@sreya sreya force-pushed the heartbeat-batcher branch from 36ffd0e to 44ea26c Compare March 26, 2026 23:06
sreya added 6 commits March 26, 2026 23:38
Rename files:
- batcher.go -> heartbeats.go
- batcher_test.go -> heartbeats_internal_test.go
- batcher_db_test.go -> heartbeats_test.go

Prefix exported symbols with Heartbeat to avoid collisions with
existing agentapi package exports.

Match the pattern used by BatchUpdateWorkspaceAgentMetadata: require
ActionUpdate on ResourceWorkspace.All() rather than bypassing auth.

Also fix unused-receiver lint warning in test.

Verify the batch query works for n > 1 agents in a single flush.
Use a unique later2 timestamp for agent2 to verify each agent's
values are committed independently. Switch from WithinDuration to
Equal for exact assertions.

Add an updated_at guard to the BatchUpdateWorkspaceAgentConnections
SQL query so that a batch flush never overwrites a newer value that
was written directly (or vice versa) when the channel-full fallback
is triggered.

Also switch the DB integration test to use dbtime.Now() for
timestamps so they are always after the agent's creation
updated_at.
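
The effect of the updated_at guard can be modeled in memory. This is a sketch under assumptions, not the SQL from this PR: `agentRow`, `applyGuarded`, and `ts` are hypothetical names, and the strict "newer than" comparison is an assumption about how the guard treats equal timestamps.

```go
package main

import (
	"fmt"
	"time"
)

// agentRow is a hypothetical, trimmed-down view of an agent's connection row.
type agentRow struct {
	LastConnectedAt time.Time
	UpdatedAt       time.Time
}

// ts builds a UTC timestamp from Unix seconds to keep the example short.
func ts(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

// applyGuarded returns the row that should be stored and whether the incoming
// write was applied. An incoming write is applied only when its UpdatedAt is
// strictly after the stored one (assumed comparison).
func applyGuarded(stored, incoming agentRow) (agentRow, bool) {
	if !incoming.UpdatedAt.After(stored.UpdatedAt) {
		return stored, false
	}
	return incoming, true
}

func main() {
	stored := agentRow{LastConnectedAt: ts(100), UpdatedAt: ts(100)}

	// A direct write (for example the channel-full fallback) lands first.
	stored, _ = applyGuarded(stored, agentRow{LastConnectedAt: ts(102), UpdatedAt: ts(102)})

	// A delayed batch flush then arrives carrying an older value.
	stored, applied := applyGuarded(stored, agentRow{LastConnectedAt: ts(101), UpdatedAt: ts(101)})

	fmt.Printf("applied=%v updated_at=%d\n", applied, stored.UpdatedAt.Unix())
	// prints "applied=false updated_at=102"
}
```

The same rule works in both directions: whichever write carries the older updated_at is dropped, regardless of whether it came from the batcher or from a direct write.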
@sreya
Collaborator Author

sreya commented Mar 27, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 88943169f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Use NULLIF in the SQL SET clause to convert Go zero-value sentinels
(epoch for timestamptz, nil UUID) back to NULL. Without this, an
invalid sql.NullTime for disconnected_at would be written as the
epoch (year 0001), corrupting connection metadata for agents that
have never disconnected.

Add test assertions verifying disconnected_at remains NULL when the
update does not set it.
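
The sentinel round trip that NULLIF undoes can be illustrated on the Go side. This is a hedged sketch, not the PR's code: `toSentinel`, `fromSentinel`, and `neverDisconnected` are hypothetical names. `toSentinel` models what happens when a nullable value is flattened into a plain `time.Time`, and `fromSentinel` mirrors what `NULLIF(value, sentinel)` does in the query.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// neverDisconnected models disconnected_at for an agent that has never
// disconnected: an invalid (NULL) sql.NullTime.
var neverDisconnected = sql.NullTime{}

// toSentinel flattens a nullable time into a plain time.Time. An invalid
// value becomes Go's zero time (January 1, year 1, UTC).
func toSentinel(nt sql.NullTime) time.Time {
	if !nt.Valid {
		return time.Time{}
	}
	return nt.Time
}

// fromSentinel mirrors NULLIF(value, sentinel): the zero time is mapped back
// to NULL, and every other value passes through unchanged.
func fromSentinel(t time.Time) sql.NullTime {
	if t.IsZero() {
		return sql.NullTime{}
	}
	return sql.NullTime{Time: t, Valid: true}
}

func main() {
	sentinel := toSentinel(neverDisconnected)
	fmt.Println(sentinel.Year())              // prints 1
	fmt.Println(fromSentinel(sentinel).Valid) // prints false

	at := sql.NullTime{Time: time.Date(2026, 3, 27, 0, 0, 0, 0, time.UTC), Valid: true}
	fmt.Println(fromSentinel(toSentinel(at)).Valid) // prints true
}
```

Without the reverse mapping, the year-0001 value would be stored as if it were a real disconnect time.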
@sreya
Collaborator Author

sreya commented Mar 27, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.
