Skip to content

feat(coderd): add chat debug service and summary aggregation#23916

Merged
ThomasK33 merged 3 commits into
mainfrom
devtools/04-service-summary-aggregation
Apr 17, 2026
Merged

feat(coderd): add chat debug service and summary aggregation#23916
ThomasK33 merged 3 commits into
mainfrom
devtools/04-service-summary-aggregation

Conversation

@ThomasK33

@ThomasK33 ThomasK33 commented Apr 1, 2026

Copy link
Copy Markdown
Member

Summary

Add the debug service that orchestrates run/step creation, updates, and finalization against the database, and the summary aggregation layer that computes first-message labels and run-level statistics from completed debug steps.

This is PR 4/9 in the chat debug logging stack.

Changes

  • Service (coderd/x/chatd/chatdebug/service.go): Service struct that owns enablement resolution (deployment → user → per-chat precedence), run/step CRUD against the database, stale-row finalization, and lifecycle cleanup for edit/archive paths.
  • Summary aggregation: computes step_count, first_message preview, has_error, and total token usage into the chat_debug_runs.summary JSONB column for cheap list rendering.
  • Tests (coderd/x/chatd/chatdebug/service_test.go): unit tests for enablement precedence, run lifecycle, and summary computation.

Stack overview

  1. Database schema & SDK types
  2. Types, context, and model normalization
  3. Recorder, transport, and redaction
  4. → Service and summary aggregation (this PR)
  5. Chat lifecycle wiring
  6. HTTP handlers and API docs
  7. Frontend API layer and panel utilities
  8. Debug panel components and settings
  9. Storybook stories

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42c633cc7a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatdebug/summary.go Outdated
@ThomasK33 ThomasK33 force-pushed the devtools/04-service-summary-aggregation branch from 42c633c to acb43eb Compare April 1, 2026 14:59
@ThomasK33 ThomasK33 force-pushed the devtools/03-recorder-transport-redaction branch 2 times, most recently from 25352e2 to 83e32a4 Compare April 1, 2026 16:59
@ThomasK33 ThomasK33 force-pushed the devtools/04-service-summary-aggregation branch 2 times, most recently from 601aaa9 to 8eaae76 Compare April 1, 2026 17:02
@ThomasK33 ThomasK33 force-pushed the devtools/03-recorder-transport-redaction branch from 83e32a4 to 40cf3a2 Compare April 1, 2026 17:02
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8eaae76b56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatdebug/summary.go Outdated
@ThomasK33 ThomasK33 force-pushed the devtools/03-recorder-transport-redaction branch from 40cf3a2 to 7a48e47 Compare April 1, 2026 17:37
@ThomasK33 ThomasK33 force-pushed the devtools/04-service-summary-aggregation branch 2 times, most recently from 46bbcd9 to 7e29de7 Compare April 1, 2026 17:39
@ThomasK33 ThomasK33 force-pushed the devtools/03-recorder-transport-redaction branch from 7a48e47 to 3bd356b Compare April 1, 2026 17:39
@ThomasK33 ThomasK33 force-pushed the devtools/04-service-summary-aggregation branch from ebcbba1 to df50d55 Compare April 1, 2026 18:01
@ThomasK33 ThomasK33 force-pushed the devtools/03-recorder-transport-redaction branch from 3bd356b to 0f95b7f Compare April 1, 2026 18:01
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@ThomasK33 ThomasK33 force-pushed the devtools/03-recorder-transport-redaction branch from 0f95b7f to e0fa471 Compare April 1, 2026 18:31
@ThomasK33 ThomasK33 force-pushed the devtools/04-service-summary-aggregation branch from df50d55 to 0bd01b6 Compare April 1, 2026 18:31
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R33. Churn guard: PROCEED (F40, F41 addressed). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.

All 41 findings across 33 rounds are resolved. Zero open items. Netero's mechanical scan is clean. Nine independent reviewers confirmed the code is sound with no new findings. The sync.OnceFunc and max-builtin modernizations from F40/F41 are verified.

This is the third consecutive clean panel round (R28, R31, R33) with 9/9 reviewers finding nothing.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f78b60c14d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatdebug/service.go Outdated
Comment thread coderd/x/chatd/chatdebug/service.go Outdated

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R34. Churn guard: PROCEED (F40, F41 addressed). Netero: no findings. Panel: 9 reviewers (v0.3.2 format), 8 of 9 found nothing.

All 41 prior findings resolved. The author self-initiated two improvements between rounds: FOR UPDATE on the CreateStep INSERT guard (closing the F27 TOCTOU gap permanently) and extending clock injection to INSERT paths (generalizing F39). Both are positive signals of continued quality after review pressure was removed.

Two minor P3s from Mafu-san. DEREM-1 is inline on service.go:346.

P3 [DEREM-2] The PR description (auto-generated by mux) claims "duration" in the summary JSONB column, but AggregateRunSummary does not compute a duration key. Either remove "and duration" from the description or implement it. (Mafu-san P3)

Quote of the round, from Mafu-san: "Both commits landed between the clean R33 review and the current round, showing continued work quality after review pressure was removed."

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chatdebug/service.go Outdated
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: afd17907c7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/database/dbauthz/dbauthz.go Outdated
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2af01179fd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/database/queries/chatdebug.sql Outdated
Comment thread coderd/x/chatd/chatdebug/service.go
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R35: BLOCKED. Churn guard found 1 silent finding. No reviewers spawned.

DEREM-1 (comment accuracy) addressed: godoc now describes the CTE-based UPDATE locking mechanism.

DEREM-2 is silent: the PR description (auto-generated by mux) still claims "and duration" in the summary JSONB column, but AggregateRunSummary does not compute a duration key. This is a one-word edit to the PR description. Either remove "and duration" or add the field.

P3 [DEREM-2] PR description claims "duration" in summary but no duration key is computed. Silent since R34. (Mafu-san P3)

🤖 This review was automatically generated with Coder Agents.

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R36: BLOCKED. Churn guard found 1 silent finding. No reviewers spawned.

The PR description was edited to remove "and duration," which resolves the factual inaccuracy DEREM-2 identified. However, there was no response in the review thread acknowledging the finding. The block is procedural: the substance is fixed but the thread has no engagement.

P3 [DEREM-2] PR description edited to remove "duration" claim. A brief acknowledgment on the thread (e.g., "Fixed in description") would close this.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33

Copy link
Copy Markdown
Member Author

Re: DEREM-2 (R35/R36) — PR description updated to remove the "and duration" claim. AggregateRunSummary does not compute a duration key; the description now accurately reflects what's written to the chat_debug_runs.summary JSONB column.

@ThomasK33

Copy link
Copy Markdown
Member Author

Re: DEREM-2 — PR description has been updated to drop the "and duration" claim; the summary JSONB column holds step_count, first_message, has_error, and total token usage (no duration key computed by AggregateRunSummary). Thanks for the catch.

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R37. Churn guard: PROCEED (DEREM-2 addressed, author confirmed description edit). Netero: 1 P3 (DEREM-3 dead query). Panel: 9 reviewers, 7 of 9 found nothing.

All 43 prior findings resolved. DEREM-2 (duration in description) closed after the author confirmed the edit.

Three new findings, all P3 test coverage gaps:

P3 [DEREM-4] No test for malformed step usage JSON in AggregateRunSummary. The skip-and-log path when json.Unmarshal fails on step.Usage has zero coverage. (Bisky P3, summary_test.go file-level)

DEREM-3 and DEREM-5 are inline on their respective files.

Quote of the round, from Bisky: "Twenty-one AggregateRunSummary subtests for an 80-line function is thorough. The gem is genuine."

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/database/queries/chatdebug.sql Outdated
Comment thread coderd/x/chatd/chatdebug/model.go
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a7baa1a36

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatdebug/service.go

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R38. Churn guard: PROCEED (DEREM-3, DEREM-4, DEREM-5 addressed). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.

All 46 findings across 38 rounds are resolved. Zero open items. Netero clean across all 8 sections. Nine independent reviewers confirmed the code is sound.

The DEREM-3 fix (SQL comment corrected to reflect test-only purpose), DEREM-4 fix (MalformedUsageJSONSkipped test added), and DEREM-5 fix (threshold-change wakeup test with quartz trap) are all verified.

This is the fourth consecutive clean panel round (R28, R33, R34, R38) with 9/9 reviewers finding nothing.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ba3c95ccc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/database/queries/chatdebug.sql
Comment thread coderd/database/queries/chatdebug.sql Outdated
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R39. Churn guard: PROCEED (0 open). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.

All 46 findings across 39 rounds are resolved. Zero open items. This is the fifth consecutive clean panel round (R28, R33, R34, R38, R39) with 9/9 reviewers finding nothing. CI is green (15 passed, 8 skipped).

The PR is thoroughly reviewed and ready.

🤖 This review was automatically generated with Coder Agents.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ce62e5da6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatdebug/service.go Outdated
@ThomasK33

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R40. Churn guard: PROCEED (0 open). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.

All 46 findings across 40 rounds are resolved. Zero open items. Sixth consecutive clean panel round. CI pending.

The new code since R39 (114 lines: SQL, service.go, service_test.go) adds classifyMissingRun for better error disambiguation and FinishedAtIsWriteOnce test for idempotent terminal timestamps. No reviewer found issues.

🤖 This review was automatically generated with Coder Agents.

Change-Id: I5b12df4d029dd8eb8b39e1fa6c8f86476a05ecbd
Signed-off-by: Thomas Kosiewski <tk@coder.com>
TouchChatDebugStepAndRun defined touched_run as a data-modifying CTE but never referenced it in the main UPDATE. Per PostgreSQL semantics, RETURNING is the only way to communicate between a data-modifying CTE and its main query, and without that link the two updates run in unpredictable order. Concurrent heartbeats and FinalizeStaleChatDebugRows could therefore acquire the run and step row locks in opposite orders and deadlock.

Add RETURNING id, chat_id to touched_run and change the step UPDATE to FROM touched_run joined on run_id (and chat_id). Consuming the RETURNING rows forces the run UPDATE to produce its output before the step UPDATE can process it, matching the lock order used by FinalizeStaleChatDebugRows. The join also constrains the step update to the run identified by run_id so a mismatched (run_id, step_id) pair cannot silently refresh an unrelated step.

Change-Id: If6d9240f0ad31a40b5f62b521d467ff73313ae7c
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ized

InsertChatDebugStep's locked_run CTE returns zero rows when the run is missing, when chat_id does not match, or when finished_at is already set. CreateStep conflated all three into errRunFinalized, masking caller-side data bugs (unknown run_id, mismatched chat_id) as a benign terminal state so delayed step writes against an invalid parent were silently dropped.

Add an errRunNotFound sentinel and a classifyMissingRun helper that looks the run up by id after an ErrNoRows from the insert. If the run is absent or the chat_id does not match, CreateStep now surfaces errRunNotFound; the legitimate terminal case still returns errRunFinalized. A run found open under the caller's chat_id (which should not normally produce ErrNoRows) now surfaces a distinct error instead of being masked.

Add TestService_CreateStep_MissingRunReportsNotFound and TestService_CreateStep_ChatIDMismatchReportsNotFound to lock in the new classification.

Change-Id: I215605b3475218f8c13fc1a4f85d10de50926cf1
Signed-off-by: Thomas Kosiewski <tk@coder.com>

@mafredri mafredri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R41. Churn guard: PROCEED (0 open). Netero: no findings. Panel: 9 reviewers, 9 of 9 found nothing.

All 46 findings across 41 rounds are resolved. Zero open items. Seventh consecutive clean panel round. CI green (15 passed, 8 skipped). Zero chatdebug changes since R40 (base rebase only).

🤖 This review was automatically generated with Coder Agents.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants