feat(coderd): add chat debug service and summary aggregation#23916
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42c633cc7a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
42c633c to
acb43eb
Compare
25352e2 to
83e32a4
Compare
601aaa9 to
8eaae76
Compare
83e32a4 to
40cf3a2
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8eaae76b56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
40cf3a2 to
7a48e47
Compare
46bbcd9 to
7e29de7
Compare
7a48e47 to
3bd356b
Compare
ebcbba1 to
df50d55
Compare
3bd356b to
0f95b7f
Compare
|
@codex review |
0f95b7f to
e0fa471
Compare
df50d55 to
0bd01b6
Compare
|
@codex review |
mafredri
left a comment
There was a problem hiding this comment.
R33. Churn guard: PROCEED (F40, F41 addressed). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.
All 41 findings across 33 rounds are resolved. Zero open items. Netero's mechanical scan is clean. Nine independent reviewers confirmed the code is sound with no new findings. The sync.OnceFunc and max-builtin modernizations from F40/F41 are verified.
This is the third consecutive clean panel round (R28, R31, R33) with 9/9 reviewers finding nothing.
🤖 This review was automatically generated with Coder Agents.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f78b60c14d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
mafredri
left a comment
There was a problem hiding this comment.
R34. Churn guard: PROCEED (F40, F41 addressed). Netero: no findings. Panel: 9 reviewers (v0.3.2 format), 8 of 9 found nothing.
All 41 prior findings resolved. The author self-initiated two improvements between rounds: FOR UPDATE on the CreateStep INSERT guard (closing the F27 TOCTOU gap permanently) and extending clock injection to INSERT paths (generalizing F39). Both are positive signals of continued quality after review pressure was removed.
Two minor P3s from Mafu-san. DEREM-1 is inline on service.go:346.
P3 [DEREM-2] The PR description (auto-generated by mux) claims "duration" in the summary JSONB column, but AggregateRunSummary does not compute a duration key. Either remove "and duration" from the description or implement it. (Mafu-san P3)
Quote of the round, from Mafu-san: "Both commits landed between the clean R33 review and the current round, showing continued work quality after review pressure was removed."
🤖 This review was automatically generated with Coder Agents.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: afd17907c7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2af01179fd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
mafredri
left a comment
There was a problem hiding this comment.
R35: BLOCKED. Churn guard found 1 silent finding. No reviewers spawned.
DEREM-1 (comment accuracy) addressed: godoc now describes the CTE-based UPDATE locking mechanism.
DEREM-2 is silent: the PR description (auto-generated by mux) still claims "and duration" in the summary JSONB column, but AggregateRunSummary does not compute a duration key. This is a one-word edit to the PR description. Either remove "and duration" or add the field.
P3 [DEREM-2] PR description claims "duration" in summary but no duration key is computed. Silent since R34. (Mafu-san P3)
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
R36: BLOCKED. Churn guard found 1 silent finding. No reviewers spawned.
The PR description was edited to remove "and duration," which resolves the factual inaccuracy DEREM-2 identified. However, there was no response in the review thread acknowledging the finding. The block is procedural: the substance is fixed but the thread has no engagement.
P3 [DEREM-2] PR description edited to remove "duration" claim. A brief acknowledgment on the thread (e.g., "Fixed in description") would close this.
🤖 This review was automatically generated with Coder Agents.
|
Re: DEREM-2 (R35/R36) — PR description updated to remove the "and duration" claim. |
|
Re: DEREM-2 — PR description has been updated to drop the "and duration" claim; the summary JSONB column holds |
mafredri
left a comment
There was a problem hiding this comment.
R37. Churn guard: PROCEED (DEREM-2 addressed, author confirmed description edit). Netero: 1 P3 (DEREM-3 dead query). Panel: 9 reviewers, 7 of 9 found nothing.
All 43 prior findings resolved. DEREM-2 (duration in description) closed after the author confirmed the edit.
Three new findings, all P3 test coverage gaps:
P3 [DEREM-4] No test for malformed step usage JSON in AggregateRunSummary. The skip-and-log path when json.Unmarshal fails on step.Usage has zero coverage. (Bisky P3, summary_test.go file-level)
DEREM-3 and DEREM-5 are inline on their respective files.
Quote of the round, from Bisky: "Twenty-one AggregateRunSummary subtests for an 80-line function is thorough. The gem is genuine."
🤖 This review was automatically generated with Coder Agents.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3a7baa1a36
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
mafredri
left a comment
There was a problem hiding this comment.
R38. Churn guard: PROCEED (DEREM-3, DEREM-4, DEREM-5 addressed). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.
All 46 findings across 38 rounds are resolved. Zero open items. Netero clean across all 8 sections. Nine independent reviewers confirmed the code is sound.
The DEREM-3 fix (SQL comment corrected to reflect test-only purpose), DEREM-4 fix (MalformedUsageJSONSkipped test added), and DEREM-5 fix (threshold-change wakeup test with quartz trap) are all verified.
This is the fourth consecutive clean panel round (R28, R33, R34, R38) with 9/9 reviewers finding nothing.
🤖 This review was automatically generated with Coder Agents.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ba3c95ccc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
mafredri
left a comment
There was a problem hiding this comment.
R39. Churn guard: PROCEED (0 open). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.
All 46 findings across 39 rounds are resolved. Zero open items. This is the fifth consecutive clean panel round (R28, R33, R34, R38, R39) with 9/9 reviewers finding nothing. CI is green (15 passed, 8 skipped).
The PR is thoroughly reviewed and ready.
🤖 This review was automatically generated with Coder Agents.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7ce62e5da6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
mafredri
left a comment
There was a problem hiding this comment.
R40. Churn guard: PROCEED (0 open). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.
All 46 findings across 40 rounds are resolved. Zero open items. Sixth consecutive clean panel round. CI pending.
The new code since R39 (114 lines: SQL, service.go, service_test.go) adds classifyMissingRun for better error disambiguation and FinishedAtIsWriteOnce test for idempotent terminal timestamps. No reviewer found issues.
🤖 This review was automatically generated with Coder Agents.
Change-Id: I5b12df4d029dd8eb8b39e1fa6c8f86476a05ecbd Signed-off-by: Thomas Kosiewski <tk@coder.com>
TouchChatDebugStepAndRun defined touched_run as a data-modifying CTE but never referenced it in the main UPDATE. Per PostgreSQL semantics, RETURNING is the only way to communicate between a data-modifying CTE and its main query, and without that link the two updates run in unpredictable order. Concurrent heartbeats and FinalizeStaleChatDebugRows could therefore acquire the run and step row locks in opposite orders and deadlock. Add RETURNING id, chat_id to touched_run and change the step UPDATE to FROM touched_run joined on run_id (and chat_id). Consuming the RETURNING rows forces the run UPDATE to produce its output before the step UPDATE can process it, matching the lock order used by FinalizeStaleChatDebugRows. The join also constrains the step update to the run identified by run_id so a mismatched (run_id, step_id) pair cannot silently refresh an unrelated step. Change-Id: If6d9240f0ad31a40b5f62b521d467ff73313ae7c Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ized InsertChatDebugStep's locked_run CTE returns zero rows when the run is missing, when chat_id does not match, or when finished_at is already set. CreateStep conflated all three into errRunFinalized, masking caller-side data bugs (unknown run_id, mismatched chat_id) as a benign terminal state so delayed step writes against an invalid parent were silently dropped. Add an errRunNotFound sentinel and a classifyMissingRun helper that looks the run up by id after an ErrNoRows from the insert. If the run is absent or the chat_id does not match, CreateStep now surfaces errRunNotFound; the legitimate terminal case still returns errRunFinalized. A run found open under the caller's chat_id (which should not normally produce ErrNoRows) now surfaces a distinct error instead of being masked. Add TestService_CreateStep_MissingRunReportsNotFound and TestService_CreateStep_ChatIDMismatchReportsNotFound to lock in the new classification. Change-Id: I215605b3475218f8c13fc1a4f85d10de50926cf1 Signed-off-by: Thomas Kosiewski <tk@coder.com>
mafredri
left a comment
There was a problem hiding this comment.
R41. Churn guard: PROCEED (0 open). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.
All 46 findings across 41 rounds are resolved. Zero open items. Seventh consecutive clean panel round. CI green (15 passed, 8 skipped). Zero chatdebug changes since R40 (base rebase only).
🤖 This review was automatically generated with Coder Agents.

Summary
Add the debug service that orchestrates run/step creation, updates, and finalization against the database, and the summary aggregation layer that computes first-message labels and run-level statistics from completed debug steps.
This is PR 4/9 in the chat debug logging stack.
Changes
coderd/x/chatd/chatdebug/service.go):Servicestruct that owns enablement resolution (deployment → user → per-chat precedence), run/step CRUD against the database, stale-row finalization, and lifecycle cleanup for edit/archive paths.step_count,first_messagepreview,has_error, and total token usage into thechat_debug_runs.summaryJSONB column for cheap list rendering.coderd/x/chatd/chatdebug/service_test.go): unit tests for enablement precedence, run lifecycle, and summary computation.Stack overview
Generated with
mux• Model:anthropic:claude-opus-4-6• Thinking:xhigh