feat: add chat debug log tables, queries, and SDK types#23913
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e1d1919623
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
a0089c6 to
7aba550
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7aba55075e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
2dcf606 to
9870581
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9870581eab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
9870581 to
001ff08
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 001ff08620
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
001ff08 to
af3df62
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af3df62819
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
af3df62 to
be6a736
Compare
|
@codex review |
1 similar comment
|
@codex review |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
1 similar comment
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7bfdce2bc7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Delightful! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
mafredri
left a comment
There was a problem hiding this comment.
Round 13 (Churn Guard + Netero + full 8-reviewer panel). All 4 prior findings closed: status typed constants connected to struct fields, integration test added for FinalizeStaleChatDebugRows, query renamed to ByChatID, ChatDebugRun forward-declaration accepted with doc comment and consuming PR references.
Netero first pass: zero findings. Mechanical floor clean.
Severity tally: 0 P0, 0 P1, 0 P2, 3 P3 (new). This PR has come a long way from 5 P2s in R1.
The P3s cluster around naming ambiguity in the three-tier debug logging hierarchy: GetChatDebugLoggingEnabled means "admin allows user opt-in" but reads as "debug logging is enabled" -- especially confusing alongside the deployment flag CODER_CHAT_DEBUG_LOGGING_ENABLED which does force-enable. The SQL key agents_chat_debug_logging_allow_users already names the concept correctly. Since this is PR 1/9 with no API consumers yet, renaming is free.
Knuckle and Kurapika found no issues. Ging-Go found no issues. Authorization audit is clean across all 15 dbauthz methods. Schema is safe (nullable ADD COLUMN, no locks on existing tables, correct CASCADE topology).
"These conversion tests are genuine. They construct specific inputs, convert, and assert every output field. Real gems." -- Bisky
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 14. Churn Guard: BLOCKED. 2 of 3 findings from R13 are silent (no code change, no author response). No reviewers spawned.
Addressed: N1 (admin settings naming) -- renamed to GetChatDebugLoggingAllowUsers / AllowUsers. Good.
Silent (needs response or fix):
-
chatdebug.sql:178 -- Comment says "remain in 'running' state permanently" but the debug status is
in_progress, notrunning.runningis aChatStatusvalue, notChatDebugStatus. One-word fix: s/running/in_progress/. -
TestFinalizeStaleChatDebugRows -- Only tests
'completed'as a terminal exclusion.'error'is untested. Remove'error'from the SQL NOT IN clause and the test still passes. Add a run withstatus = 'error'and verify it is not finalized.
Further review is blocked until the author responds to or fixes these two items.
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 15. Churn Guard: PROCEED (both R14 blockers fixed -- comment corrected, 'error' test case added). Netero: zero findings. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Chopper wildcard).
Severity tally: 0 P0, 0 P1, 1 P2, 2 P3.
Knuckle's schema review is comprehensive and clean: every index matches a query, CASCADE topology is correct, Go-SQL boundary types are consistent, authorization is sound across all 15 dbauthz methods. Mafuuu verified all prior findings resolved.
P2 (non-diff, querier_test.go) The InsertChatDebugStep INSERT...SELECT guard enforces that a step's chat_id matches its parent run's chat_id at the SQL level. The dbauthz comment documents this as defense-in-depth. No integration test proves it works. Feed InsertChatDebugStep a valid run_id belonging to chat A with chat_id set to chat B. Assert sql.ErrNoRows. Same class of gap applies to UpdateChatDebugRun and UpdateChatDebugStep which both have AND chat_id = @chat_id::uuid in their WHERE clauses. If a future migration drops the composite FK or changes the INSERT to a plain INSERT, nothing in the test suite catches the regression. (Bisky P2)
P3 (non-diff, db2sdk.go:1657) rawJSONObject returns {"error": "malformed debug payload", "raw": ...} on unmarshal failure. The sibling chatDebugAttempts has a test. rawJSONObject has six callsites and zero test coverage for its error branch. Feed json.RawMessage("not-an-object") through ChatDebugRunSummary.Summary and assert the error sentinel. (Bisky P3)
P3 (non-diff, querier_test.go:11436) TestFinalizeStaleChatDebugRows doesn't isolate the cascade CTE clause. The 24h threshold catches everything by age, so removing OR run_id IN (SELECT id FROM finalized_runs) doesn't break the test. To isolate: create a step whose updated_at is newer than the threshold but whose run IS stale. (Chopper P3)
"The querier tests are real. Both tests would break if the SQL logic were wrong. Real gems." -- Bisky
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 16. Churn Guard: PROCEED (all 3 R15 findings addressed with code: cross-chat SQL guards tested, rawJSONObject malformed path tested, cascade CTE isolated). Netero: 1 P3 (em-dash). 4-reviewer panel (Mafuuu, Bisky, Knuckle + Razor wildcard).
Severity tally: 0 P0, 0 P1, 0 P2, 2 P3.
Mafuuu, Knuckle, and Razor found zero new issues. Mafuuu applied all four review modes (contract tracing, lifecycle completeness, semantic honesty, adversarial imagination) and confirmed everything is clean. Knuckle verified schema, indexes, CASCADE topology, Go-SQL boundary, and authorization -- no findings.
Two mechanical P3s:
-
Em-dash (U+2014) at
querier_test.go:11662in// updated_at --- proving the cascade CTE clause is required.Replace with--or rephrase. (Netero P3) -
TestChatDebugStepcomment says "Verify all scalar fields" but skipsUpdatedAt. The field is set in the fixture but never asserted. Addrequire.Equal(t, finishedAt, sdk.UpdatedAt). (Bisky P3)
"Three integration tests, seven converter tests, fourteen dbauthz tests. The cascade step isolates the
run_id IN (SELECT id FROM finalized_runs)clause from the age predicate. Without that clause the step survives. Real." -- Bisky
🤖 This review was automatically generated with Coder Agents.
|
@codex review |
mafredri
left a comment
There was a problem hiding this comment.
Round 17. Churn Guard: PROCEED (both R16 P3s addressed -- em-dash replaced, UpdatedAt assertion added). Netero: zero findings. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Chopper wildcard).
Severity tally: 0 P0, 0 P1, 0 P2, 2 P3.
Mafuuu and Knuckle both found zero issues. Mafuuu applied all four review modes and verified every invariant. Knuckle confirmed schema, indexes, CASCADE topology, Go-SQL boundary, and authorization are all clean.
Two P3 test coverage suggestions remain:
-
COALESCE partial-update preservation (chatdebug.sql:44/118): The write-once-finalize COALESCE pattern is documented and acknowledged, but no test inserts a fully-populated run, updates only Status+FinishedAt, and asserts every other field survived. If COALESCE were removed from any column, fields would silently null out with no test failure. (Bisky P2, downgraded to P3 -- invariant is acknowledged and documented, gap is regression protection)
-
NULL message ID survival (chatdebug.sql:151):
DeleteChatDebugDataAfterMessageIDcorrectly skips rows with NULL message columns (SQL three-valued logic), but every test case usesValid: true. A case with all-NULL message IDs asserting the run survives would lock in the invariant. (Chopper P3)
Additionally noted: DeleteChatDebugDataByChatID and the siteconfig/user_config queries have no integration tests, consistent with codebase patterns for simple queries.
"These tests prove behavior, not just execution." -- Bisky
🤖 This review was automatically generated with Coder Agents.
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
mafredri
left a comment
There was a problem hiding this comment.
Round 18. Churn Guard: BLOCKED. 2 of 4 R17 findings addressed (COALESCE preservation test added, NULL message ID survival test added -- both good). 2 findings are silent:
-
DeleteChatDebugDataByChatID has no integration test. The dbauthz mock test exists. No real-DB test confirms the DELETE cascades to steps and doesn't touch other chats. Noted in R17 body as consistent with codebase patterns -- a brief response acknowledging or a simple two-chat test would close this.
-
Siteconfig/user_configs debug logging queries (
GetChatDebugLoggingAllowUsers,UpsertChatDebugLoggingAllowUsers,GetUserChatDebugLoggingEnabled,UpsertUserChatDebugLoggingEnabled) have no real-DB tests. Also noted in R17 body as consistent with codebase patterns (GetChatDesktopEnabledhas none either). A brief response acknowledging would close this.
Both are P3 notes consistent with existing patterns, not correctness issues. A response of "consistent with codebase pattern, not adding" would close both.
Further review is blocked until the author responds to or fixes these two items.
🤖 This review was automatically generated with Coder Agents.
|
Addressing the two outstanding R18 findings:
|
mafredri
left a comment
There was a problem hiding this comment.
Round 19. Churn Guard: BLOCKED (second consecutive). Same 2 silent findings from R17/R18, no code change, no author response.
- DeleteChatDebugDataByChatID -- no integration test. Silent 2 rounds.
- Siteconfig/user_configs debug logging queries -- no real-DB tests. Silent 2 rounds.
Both are P3 notes consistent with existing codebase patterns. A one-line response ("consistent with codebase pattern, not adding" or "tracked in #24137") would unblock.
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 20. Churn Guard: PROCEED (both contested findings accepted -- codebase precedent for simple queries holds). Netero: 1 nit. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Meruem wildcard).
Severity tally: 0 P0, 0 P1, 0 P2, 1 P3, 1 Nit.
Knuckle found zero issues after a thorough schema, query, Go-SQL boundary, and authorization review. All 4 reviewers accepted the contested findings. The PR is clean.
One new P3: TestChatDebugRunCOALESCEPreservation exists for runs but there is no equivalent for steps. UpdateChatDebugStep has 9 COALESCE columns. If a COALESCE is removed from any step column, the field silently nulls out on partial update with no test failure. Write TestChatDebugStepCOALESCEPreservation mirroring the run test. (Bisky P3)
Mafuuu and Meruem both noted the terminal status TEXT-without-CHECK constraint as ongoing accepted debt. Meruem provided a concrete fix: status TEXT NOT NULL CHECK (status IN ('in_progress', 'completed', 'error', 'interrupted')). This has been raised since R1 and the author has chosen Go typed constants without a DB constraint. The consequence is bounded (adding a new terminal status requires updating two SQL NOT IN lists), but the CHECK constraint would eliminate the class of bug.
Nit querier_test.go:11905: double-hyphen used as em-dash in prose. Replace with a period or semicolon. (Netero)
"These tests prove behavior, not just execution." -- Bisky
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 21. Churn Guard: PROCEED (both R20 findings addressed -- step COALESCE test added, double-hyphen fixed). Netero: zero findings. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Razor wildcard).
Severity tally: 0 P0, 0 P1, 0 P2, 1 P3, 2 P4.
Knuckle delivered a comprehensive schema, query, Go-SQL boundary, and authorization review with zero findings. Razor verified every documented claim in the PR (CTE snapshot behavior, INSERT...SELECT ownership guard, COALESCE write-once semantics, NULL message ID survival, malformed payload preservation) against the code and tests. All claims hold.
One P3: the admin debug-logging request type reuses debug_logging_enabled where the response uses allow_users. The asymmetry is confusing for the admin use case. Two P4 notes: query ordering untested, terminal status literals accepted as documented debt.
"279 assertions across 1266 new test lines. The cascade-only step that can't survive without the CTE clause -- that's a beautiful setup. Real gems." -- Bisky
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 22. Churn Guard: PROCEED (P3 addressed with request type split, both P4s acknowledged). Netero: zero findings. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Chopper wildcard).
Severity tally: 0 P0, 0 P1, 0 P2, 2 P3, 1 P4, 1 Nit.
Mafuuu found zero issues after applying all four review modes. Bisky praised the test suite extensively: "The cascade-only step that can't survive without the CTE clause -- that's a beautiful setup. If they prune it as 'redundant,' the test breaks." Knuckle closed with: "Keeping this schema clean through 8 more PRs is harder than getting it clean in the first place. Good work."
Two P3s from fresh eyes:
-
Chopper:
rawJSONObject/chatDebugAttemptspreserve the raw bytes on parse failure but discard thejson.Unmarshalerror itself. For a debug tool, the parse error is the most diagnostic piece. -
Knuckle: the terminal status NOT IN list should have a Go test asserting alignment with the
ChatDebugStatusconstants. This is a new, actionable suggestion (a test, not a migration) for the long-running status-sync concern.
"This is how you test a partial-update pattern. Genuine and well-designed." -- Bisky
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 23. Churn Guard: PROCEED (both R22 findings addressed -- parse_error key added, ChatDebugTerminalStatuses() function added). Netero: zero findings. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Kite wildcard).
Severity tally: 0 P0, 0 P1, 0 P2, 1 P3, 1 Nit.
Knuckle and Kite both found zero issues. Mafuuu found one novel P3: rawJSONObject returns nil for JSON literal "null" (the 4-byte string, not SQL NULL or empty bytes). The DB column is JSONB NOT NULL DEFAULT '{}', but NOT NULL only rejects SQL NULL -- it accepts JSONB null. A Go caller passing json.RawMessage("null") bypasses the INSERT COALESCE, stores null::jsonb, and the converter returns nil for a non-optional SDK field. TypeScript crashes on run.summary.someKey.
Bisky re-raised DeleteChatDebugDataByChatID integration test coverage, which was closed by unanimous panel consensus in R20 ("consistent with codebase patterns"). Suppressed as a re-raise of a closed finding.
"945 new lines of test code, real Postgres integration, specific assertions. These are real gems." -- Bisky
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 24. Churn Guard: PROCEED (R23 P3 addressed with nil guard, nit addressed with migration comment). Netero: zero findings. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Kite wildcard).
Severity tally: 0 P0, 0 P1, 0 P2, 2 P3.
Mafuuu, Knuckle, and Kite all found zero issues. Knuckle: "The schema, queries, Go-SQL boundary, and authorization layer are clean. Every prior finding in the inventory has been addressed or accepted. I found no new debt to lend against."
Two novel P3s from Bisky, both about the nil guards added in R24:
-
The JSON-literal-null guards in
rawJSONObject(line 1680) andchatDebugAttempts(line 1654) have no test feedingjson.RawMessage("null"). If the guards are removed, API responses silently change from{}tonull. -
TestFinalizeStaleChatDebugRowshas no fresh in-progress run to prove the age predicate protects active sessions. All in-progress runs are stale. Ifupdated_at < @updated_beforeis removed from the runs CTE, the test still passes.
Bisky also re-raised settings queries integration test coverage (closed by panel consensus R20). Suppressed.
"Almost all of it is real. Almost." -- Bisky
🤖 This review was automatically generated with Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Round 25. Churn Guard: PROCEED (both R24 findings addressed -- JSON null guard tests added, age predicate negative control added). Netero: zero findings. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Razor wildcard).
Zero findings across all reviewers. This PR is clean.
37 findings raised across 25 rounds. 33 resolved with code changes. 4 accepted as permanent (P4 debt or codebase-consistent patterns). Every finding in the inventory has a final disposition. No open items remain.
Razor verified all documented claims against the code: auth checks, doc comments, terminal status alignment, COALESCE preservation, SQL guards, CTE cascade, and JSON null guards. All pass.
"The code's claims match its reality across all four review modes." -- Razor
🤖 This review was automatically generated with Coder Agents.
|
@codex review |
Change-Id: I33bd31fa22dbf66c955f64741b70a17f95e1a22b Signed-off-by: Thomas Kosiewski <tk@coder.com>
|
Round 26. Churn Guard: PROCEED (0 open findings from R25). Netero: zero findings. 4-reviewer panel (Mafuuu, Bisky, Knuckle + Chopper wildcard). Severity tally: 1 P1, 0 P2. The rebase onto #23827 (organization scoping, migration 000467) added The tests pass against the in-memory store (which doesn't enforce FKs), hiding the fact that the SQL-level invariants these tests were designed to prove (COALESCE preservation, CTE cascade finalization, UNION-based deletion, cross-chat INSERT...SELECT guards, NULL three-valued logic, age predicate) are all unverified against real Postgres. The fix is mechanical: capture the org return value and pass Mafuuu and Chopper found no issues in the debug logging code itself -- it's unchanged since R25.
|

Summary
Add database persistence for chat debug logging: migration for
chat_debug_runsandchat_debug_stepstables, SQLC queries,dbauthzwrappers, db-to-SDK converters, site and user config toggles, deployment flag, and SDK types for the chat debug API.This is PR 1/9 in the chat debug logging stack.
Changes
000466: createschat_debug_runsandchat_debug_stepstables with JSONB payload columns, composite foreign keys, and indexes for stale-row finalization and message-based cleanup.coderd/database/queries/chatdebug.sql): insert/update/list/delete operations for runs and steps, plus cleanup queries for edit truncation and archive paths.siteconfig.sql):GetChatDebugLoggingEnabled/UpsertChatDebugLoggingEnabledfor the runtime admin setting that allows users to opt into debug logging.users.sql):GetUserChatDebugLoggingEnabled/UpsertUserChatDebugLoggingEnabledfor per-user opt-in.--chat-debug-logging-enabled/CODER_CHAT_DEBUG_LOGGING_ENABLEDto force debug logging on globally, bypassing runtime toggles.dbauthzwrappers: authorization checks ensuring users can only access debug data for chats they own.db2sdkconverters + tests:ChatDebugRun,ChatDebugRunSummary,ChatDebugStepconversion helpers with JSONB deserialization and nil-coalescing.codersdk/chats.go):ChatDebugRun,ChatDebugRunSummary,ChatDebugStep, status/kind/operation enums, and admin/user settings DTOs.Stack overview