
clickhouse new syncs and verify-data #1304

Open
BilalG1 wants to merge 32 commits into dev from clickhouse-sync-verify-data-integrity

Conversation

Collaborator

@BilalG1 BilalG1 commented Apr 1, 2026

Summary by CodeRabbit

  • New Features

    • Broader external DB sync: teams, members, permissions, invitations, email outbox, session replays, refresh tokens, and connected accounts now synchronize and track changes.
    • Sequencing support and automatic change-flagging added to many records for reliable incremental sync.
  • Improvements

    • New indexes and concurrent index creation for faster sync queries.
    • More robust sync pipelines, verification tooling, telemetry, dashboard sequencer stats, and expanded end-to-end sync tests.

BilalG1 added 21 commits March 16, 2026 10:53
…invites' into clickhouse-sync-email-outbox-table

# Conflicts:
#	apps/backend/scripts/clickhouse-migrations.ts
#	apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts
#	apps/backend/src/lib/external-db-sync.ts
#	apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-basics.test.ts
#	apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts

vercel bot commented Apr 1, 2026

The latest updates on your projects:

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| stack-auth-hosted-components | Ready | Preview, Comment | Apr 7, 2026 6:41pm |
| stack-backend | Ready | Preview, Comment | Apr 7, 2026 6:41pm |
| stack-dashboard | Ready | Preview, Comment | Apr 7, 2026 6:41pm |
| stack-demo | Ready | Preview, Comment | Apr 7, 2026 6:41pm |
| stack-docs | Ready | Preview, Comment | Apr 7, 2026 6:41pm |

Contributor

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Adds sequencing columns and sync plumbing for many entities, extends ClickHouse mappings/migrations, integrates external-db-sync into numerous write/delete flows, expands the sequencer/backfill and status endpoints, adds ClickHouse verification tooling, and provides broad E2E tests and utilities for the new sync targets.

Changes

Cohort / File(s) Summary
Database Schema & Migrations
.gitignore, apps/backend/prisma/migrations/..., apps/backend/prisma/schema.prisma
Added sequenceId (nullable BigInt) and shouldUpdateSequenceId (Boolean default true) fields to multiple models; added concurrent indexes and ignored a generated TS file.
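As a rough sketch, the two new columns on each synced model likely look something like the following Prisma fragment (the model, ID shape, and index are illustrative assumptions, not the PR's exact schema):

```prisma
model Team {
  tenancyId              String
  teamId                 String  @default(uuid()) @db.Uuid
  // Assigned by the sequencer; NULL until the row is first claimed.
  sequenceId             BigInt?
  // Set to true on every tracked write; the sequencer resets it to false.
  shouldUpdateSequenceId Boolean @default(true)

  @@id([tenancyId, teamId])
  @@index([shouldUpdateSequenceId])
}
```

An index on `shouldUpdateSequenceId` keeps the sequencer's "find flagged rows" scan cheap, which matches the concurrent-index additions mentioned above.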
External DB Sync Core
apps/backend/src/lib/external-db-sync.ts
Expanded ExternalDbSyncTarget union; added helpers withExternalDbSyncUpdate, markProjectUserForExternalDbSync, many bulk deletion recorders, enhanced recordExternalDbSyncDeletion, and introduced CLICKHOUSE_COLUMN_NORMALIZERS and related normalization/refactors.
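Based on the description above, `withExternalDbSyncUpdate` is presumably a small wrapper that stamps the change flag onto a Prisma data payload; a minimal sketch (the name comes from the PR, the implementation here is an assumption):

```typescript
// Hypothetical sketch: merge the change flag into any create/update payload so
// the sequencer's next poll picks the row up. Not the PR's actual code.
type SyncTracked<T> = T & { shouldUpdateSequenceId: true };

function withExternalDbSyncUpdate<T extends Record<string, unknown>>(data: T): SyncTracked<T> {
  return { ...data, shouldUpdateSequenceId: true };
}

// Usage shape: prisma.team.update({ where, data: withExternalDbSyncUpdate({ displayName: "Acme" }) })
const payload = withExternalDbSyncUpdate({ displayName: "Acme" });
console.log(payload); // { displayName: 'Acme', shouldUpdateSequenceId: true }
```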
ClickHouse Migrations & Mappings
apps/backend/scripts/clickhouse-migrations.ts, packages/stack-shared/src/config/db-sync-mappings.ts
Refactored migration pipeline to staged concurrent execution; added many new ClickHouse/Postgres mappings and DDL for new synced tables; added table-agnostic policies and grants.
Sequencer & Status
apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts, .../status/route.ts
Backfill/enqueue and status endpoints extended to handle the new tables, added cascade marking (Team → VerificationCode), telemetry and expanded stats reporting.
API Handlers — Write/Deletion Integrations
apps/backend/src/app/api/latest/**/*.tsx, apps/backend/src/app/api/latest/**/route.tsx
Wrapped many create/update payloads with withExternalDbSyncUpdate(...); added pre-delete recording calls (recordExternalDbSyncDeletion and specialized recorders) across auth/session, teams, memberships, users, permissions, OAuth, emails, session replays, and verification code flows.
Email Pipeline
apps/backend/src/lib/email-queue-step.tsx, apps/backend/src/app/api/latest/emails/outbox/crud.tsx
EmailOutbox parsing updated to include sequenceId/shouldUpdateSequenceId; all EmailOutbox mutations now set shouldUpdateSequenceId = TRUE.
ClickHouse Verification & Integrity Tools
apps/backend/scripts/verify-data-integrity/*.ts
Added verifyClickhouseSync() with row-by-row normalized comparisons and CLI integration (--skip-clickhouse flag, env detection).
Cron / Runtime Changes
apps/backend/scripts/run-cron-jobs.ts
Changed cron loop sleep from randomized up-to-120s to fixed 1s interval.
E2E Tests & Utilities
apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-*.ts, apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts
Added extensive CRUD and sync E2E tests for new sync targets, new generalized waitForExternalDbRow helper and many waitForSynced* utilities; updated analytics grants expectations.
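The generalized `waitForExternalDbRow` helper mentioned above presumably polls until the synced row becomes visible; a self-contained sketch (signature and defaults are assumptions, not the E2E utility's exact API):

```typescript
// Hypothetical polling helper: retry a fetch until it yields a row or the
// timeout elapses. Sync is asynchronous, so E2E tests must wait, not assert once.
async function waitForExternalDbRow<T>(
  fetchRow: () => Promise<T | undefined>,
  options: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<T> {
  const { timeoutMs = 30_000, intervalMs = 250 } = options;
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const row = await fetchRow();
    if (row !== undefined) return row;
    if (Date.now() >= deadline) throw new Error("Timed out waiting for external DB row");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Example: resolves once the (simulated) sync has landed the row.
let attempts = 0;
waitForExternalDbRow(async () => (++attempts >= 3 ? { id: "user-1" } : undefined), { intervalMs: 10 })
  .then((row) => console.log(row.id, "after", attempts, "attempts"));
```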
Frontend Dashboard
apps/dashboard/.../external-db-sync/page-client.tsx
Expanded ExternalDbSyncStatus type and UI to render sequencer stats for the expanded set of tables.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant Prisma
    participant Sequencer
    participant ExternalSync
    participant ClickHouse

    Client->>API: Create/Update entity (e.g., Team)
    API->>Prisma: upsert data wrapped with withExternalDbSyncUpdate()
    Prisma-->>API: row persisted with shouldUpdateSequenceId = true

    Note over Sequencer,Prisma: Backfill / Poll
    Sequencer->>Prisma: SELECT ... FOR UPDATE SKIP LOCKED (shouldUpdateSequenceId = true)
    Prisma-->>Sequencer: rows
    Sequencer->>Prisma: UPDATE sequenceId = nextval(...), shouldUpdateSequenceId = false
    Sequencer-->>Sequencer: enqueue tenant sync

    Sequencer->>ExternalSync: push mapping upserts/deletes
    ExternalSync->>ClickHouse: upsert into analytics tables
    ClickHouse-->>ExternalSync: ack
```
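The claim-and-stamp step in the diagram above can be sketched as a single statement (table, sequence, and batch size are illustrative; the real queries live in the sequencer route):

```sql
-- Illustrative: claim up to 1000 flagged rows without blocking concurrent
-- sequencer runs, stamp them with the next sequence values, and report which
-- tenants need a sync enqueued. Table and sequence names are assumptions.
WITH rows_to_update AS (
  SELECT "id"
  FROM "Team"
  WHERE "shouldUpdateSequenceId" = TRUE
  ORDER BY "tenancyId"
  LIMIT 1000
  FOR UPDATE SKIP LOCKED
)
UPDATE "Team" AS t
SET "sequenceId" = nextval('"TeamSequence"'),
    "shouldUpdateSequenceId" = FALSE
FROM rows_to_update AS r
WHERE t."id" = r."id"
RETURNING t."tenancyId";
```

`FOR UPDATE SKIP LOCKED` lets overlapping sequencer runs each claim disjoint batches instead of blocking on one another.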
```mermaid
sequenceDiagram
    participant Client
    participant API
    participant Prisma
    participant DeletedRow
    participant Sequencer
    participant ExternalSync

    Client->>API: Delete entity (e.g., Team)
    API->>Prisma: recordExternalDbSyncTeamPermissionDeletionsForTeam()
    Prisma->>DeletedRow: INSERT deleted permission rows
    API->>Prisma: recordExternalDbSyncTeamMemberDeletionsForTeam()
    Prisma->>DeletedRow: INSERT deleted member rows
    API->>Prisma: recordExternalDbSyncDeletion() for Team
    Prisma->>DeletedRow: INSERT team deletion record
    API->>Prisma: DELETE Team
    Prisma-->>API: deletion committed

    Sequencer->>Prisma: process DeletedRow entries, enqueue tenants
    Sequencer->>ExternalSync: apply deletes to external DB (Postgres/ClickHouse)
    ExternalSync-->>Sequencer: ack
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Suggested reviewers

  • N2D4
  • nams1570

Poem

🐰 Hops and sequences, a tidy parade,

Rows marked and numbered, no sync left unswayed.
Teams, tokens, invites — all learned the new tune,
From Postgres to ClickHouse they sparkle by noon.
I nibble a carrot and watch the logs bloom.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is essentially empty except for the CONTRIBUTING.md reminder, lacking any explanation of the changes, motivation, testing, or impact analysis expected for a pull request of this scope and complexity. | Add a comprehensive description explaining the new ClickHouse sync mappings being added, the verify-data-integrity functionality, database schema changes, and how the changes were tested. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'clickhouse new syncs and verify-data' clearly describes the main changes: adding new ClickHouse sync mappings and data verification functionality, which aligns with the extensive additions to db-sync-mappings, verify-data-integrity scripts, and ClickHouse migration infrastructure. |


Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR adds a ClickHouse data-integrity verifier script that fetches rows from both Postgres and ClickHouse for every sync mapping and compares them row-by-row. It also fixes two sync-tracking gaps: refresh token updates in tokens.tsx were not wrapped with withExternalDbSyncUpdate, and refresh tokens cascade-deleted during user deletion did not go through the explicit Prisma queries that the sync infrastructure relies on.

Key changes:

  • clickhouse-sync-verifier.ts (new): Paginates Postgres rows via the same fetch queries used by the sync engine, fetches the corresponding ClickHouse rows from the default.* views, sorts both sides, normalises type differences (booleans → 0/1, dates → epoch ms, JSON → strings), and asserts exact row-count and field-value parity.
  • index.ts: Integrates the verifier into the existing integrity-check loop, guarded by STACK_CLICKHOUSE_URL availability and a --skip-clickhouse CLI flag.
  • tokens.tsx: Adds withExternalDbSyncUpdate wrapper to the projectUserRefreshToken.update call so lastActiveAt changes are tracked for sync.
  • crud.tsx: Adds an explicit deleteMany for a user's refresh tokens inside the deletion transaction (after recordExternalDbSyncRefreshTokenDeletionsForUser) so the cascade from projectUser.delete can no longer bypass sync tracking.
  • external-db-sync.ts: Exports CLICKHOUSE_COLUMN_NORMALIZERS for reuse by the verifier.
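The normalization described above (booleans → 0/1, dates → epoch ms, JSON → strings) might look roughly like this on the Postgres side; the function name and exact rules are assumptions based on the summary, not the verifier's code:

```typescript
// Hypothetical per-value normalizer: coerce Prisma/Postgres values into the
// shapes ClickHouse's JSONEachRow output uses, so rows can be deep-compared.
function normalizePostgresValue(value: unknown): unknown {
  if (typeof value === "boolean") return value ? 1 : 0;       // booleans → 0/1
  if (value instanceof Date) return value.getTime();           // dates → epoch ms
  if (typeof value === "bigint") return value.toString();      // bigints → strings
  if (value !== null && typeof value === "object") {
    return JSON.stringify(value);                              // JSON → string
  }
  return value;                                                // strings/numbers/null pass through
}

console.log(normalizePostgresValue(true));                     // 1
console.log(normalizePostgresValue(new Date(1700000000000)));  // 1700000000000
console.log(normalizePostgresValue({ role: "admin" }));        // {"role":"admin"}
```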

Confidence Score: 4/5

Safe to merge with minor follow-ups; production fixes (tokens.tsx, crud.tsx) are correct and the verifier script logic is sound.

All remaining findings are P2. The most actionable one is the implicit coupling between BATCH_LIMIT and the SQL LIMIT: if the SQL limit in db-sync-mappings.ts is ever reduced below 1000, the verifier will silently stop paginating early and miss discrepancies. The JSON string-equality comparison could also produce false-positive failures if key ordering diverges between the sync writer and the verifier reader. Neither is a current production bug, but both weaken the verifier's reliability as a safety net.

apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts — BATCH_LIMIT coupling and JSON comparison robustness.
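The BATCH_LIMIT failure mode can be demonstrated with a small self-contained simulation (constants and helpers here are illustrative stand-ins, not the verifier's code):

```typescript
// Pagination stops when a page comes back smaller than BATCH_LIMIT. If the
// SQL LIMIT drops below BATCH_LIMIT, every full page already looks "short"
// and the loop exits after the first page, silently skipping rows.
const BATCH_LIMIT = 1000;

function fetchPage(rows: number[], cursor: number, sqlLimit: number): number[] {
  return rows.slice(cursor, cursor + sqlLimit); // stands in for the SQL query's LIMIT
}

function fetchAll(rows: number[], sqlLimit: number): number[] {
  const out: number[] = [];
  let cursor = 0;
  while (true) {
    const batch = fetchPage(rows, cursor, sqlLimit);
    out.push(...batch);
    cursor += batch.length;
    if (batch.length < BATCH_LIMIT) break; // the coupled termination check
  }
  return out;
}

const rows = Array.from({ length: 1200 }, (_, i) => i);
console.log(fetchAll(rows, 1000).length); // 1200 — correct while the limits match
console.log(fetchAll(rows, 500).length);  // 500 — truncated after one page
```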

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts | New verification script that fetches Postgres and ClickHouse rows for every mapping and compares them row-by-row; contains a fragile BATCH_LIMIT constant that must match the SQL LIMIT, and uses a plain Record for SORT_KEYS where Map is preferred per project rules. |
| apps/backend/scripts/verify-data-integrity/index.ts | Wires in ClickHouse sync verification with a --skip-clickhouse flag and STACK_CLICKHOUSE_URL guard; changes look correct and consistent with the rest of the script. |
| apps/backend/src/app/api/latest/users/crud.tsx | Adds explicit deleteMany for refresh tokens before user deletion so the cascade does not bypass sync tracking; correctly ordered after recordExternalDbSyncRefreshTokenDeletionsForUser within the same transaction. |
| apps/backend/src/lib/external-db-sync.ts | Exports CLICKHOUSE_COLUMN_NORMALIZERS so the verifier script can reuse it; minimal, non-breaking change. |
| apps/backend/src/lib/tokens.tsx | Wraps the refresh-token lastActiveAt update with withExternalDbSyncUpdate so those field changes are tracked for ClickHouse sync; correct and minimal fix. |
| .gitignore | Adds implementation.generated.ts to .gitignore; straightforward housekeeping. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Script as verify-data-integrity/index.ts
    participant Verifier as clickhouse-sync-verifier.ts
    participant PG as Postgres (via Prisma)
    participant CH as ClickHouse (default.* views)

    Script->>Verifier: verifyClickhouseSync({ tenancy, projectId, branchId, recurse })
    loop for each mapping in DEFAULT_DB_SYNC_MAPPINGS
        Verifier->>Verifier: skip if no clickhouse fetchQuery
        loop paginate (cursor = lastSequenceId, BATCH_LIMIT=1000)
            Verifier->>PG: $queryRawUnsafe(fetchQuery, tenancyId, lastSequenceId)
            PG-->>Verifier: batch of rows (LIMIT 1000)
            Verifier->>Verifier: filter sync_is_deleted=true rows
            Verifier->>Verifier: strip sync_* columns
            Verifier->>Verifier: advance lastSequenceId to max in batch
        end
        Verifier->>CH: SELECT * FROM default.{table} WHERE project_id AND branch_id
        CH-->>Verifier: all non-deleted rows (FINAL applied in view)
        Verifier->>Verifier: assert pgRows.length == chRows.length
        Verifier->>Verifier: sort both by [project_id, branch_id, ...sortKeys]
        loop for each row pair
            Verifier->>Verifier: normalizeRow (booleans, dates, JSON, bigints)
            Verifier->>Verifier: deepEqual → throw StackAssertionError on mismatch
        end
    end
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts
Line: 165-200

Comment:
**BATCH_LIMIT must stay in sync with SQL LIMIT**

`BATCH_LIMIT = 1000` is used to detect the last page of pagination via `if (batch.length < BATCH_LIMIT) break`. The SQL queries in `db-sync-mappings.ts` all use `LIMIT 1000`, so this works today. However, if the SQL LIMIT is ever reduced below `BATCH_LIMIT` (e.g., to 500), a full batch of 500 rows would satisfy `500 < 1000 → true` and stop pagination prematurely, silently skipping remaining rows. This would cause the verifier to miss discrepancies — the very thing it's supposed to catch.

Consider adding a comment to document this tight coupling, or better yet, export the limit constant from `db-sync-mappings.ts` and import it here so both sides stay in sync automatically:

```ts
// IMPORTANT: BATCH_LIMIT must equal the LIMIT clause in internalDbFetchQueries.clickhouse (currently 1000 in db-sync-mappings.ts).
// If the SQL LIMIT is reduced below this value, pagination will terminate early and silently miss rows.
const BATCH_LIMIT = 1000;
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts
Line: 12-25

Comment:
**Use Map instead of plain object for dynamic key access**

`SORT_KEYS` is accessed with dynamic keys (`mappingName` from `Object.entries(DEFAULT_DB_SYNC_MAPPINGS)`). Per the project's custom rule, plain objects with dynamic string keys should be replaced with `Map<string, string[]>` to avoid prototype pollution (e.g., if a future mapping name were `__proto__` or `constructor`).

```ts
const SORT_KEYS = new Map<string, string[]>([
  ["users", ["id"]],
  ["contact_channels", ["id"]],
  ["teams", ["id"]],
  ["team_member_profiles", ["team_id", "user_id"]],
  ["team_permissions", ["team_id", "user_id", "permission_id"]],
  ["team_invitations", ["id"]],
  ["email_outboxes", ["id"]],
  ["session_replays", ["id"]],
  ["project_permissions", ["user_id", "permission_id"]],
  ["notification_preferences", ["id"]],
  ["refresh_tokens", ["id"]],
  ["connected_accounts", ["id"]],
]);
```

Then access as `SORT_KEYS.get(mappingName)` instead of `SORT_KEYS[mappingName]`.

**Rule Used:** Use Map<A, B> instead of plain objects when using ... ([source](https://app.greptile.com/review/custom-context?memory=cd0e08f7-0df2-43c8-8c71-97091bba4120))

**Learnt From**
[stack-auth/stack-auth#769](https://github.com/stack-auth/stack-auth/pull/769)
[stack-auth/stack-auth#835](https://github.com/stack-auth/stack-auth/pull/835)
[stack-auth/stack-auth#839](https://github.com/stack-auth/stack-auth/pull/839)
[stack-auth/stack-auth#472](https://github.com/stack-auth/stack-auth/pull/472)
[stack-auth/stack-auth#720](https://github.com/stack-auth/stack-auth/pull/720)
[stack-auth/stack-auth#700](https://github.com/stack-auth/stack-auth/pull/700)
[stack-auth/stack-auth#813](https://github.com/stack-auth/stack-auth/pull/813)

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts
Line: 63-77

Comment:
**JSON comparison after normalization uses string equality**

After `normalizePostgresValue` and `normalizeClickhouseValue` are applied, `json` columns become raw strings (Postgres: `JSON.stringify(parsedJsonb)`; ClickHouse: the stored string, returned as-is by `JSONEachRow`). These are then compared in `deepEqual` via `===` (string equality).

Postgres's `jsonb` type stores keys in alphabetical order, so `JSON.stringify` of a Postgres-returned object consistently produces alphabetically sorted keys. As long as the sync code also writes ClickHouse by `JSON.stringify`-ing the same Postgres-sourced object, the strings will match. However, if any code path writes to ClickHouse with a different key ordering or whitespace (e.g., extra spaces, non-alphabetical keys), this will produce false-positive failures.

To be safe, consider parsing both JSON strings and performing a deep structural comparison rather than relying on string equality:

```ts
if (columnType === "json") {
  try {
    return JSON.stringify(JSON.parse(value as string));
  } catch {
    return value;
  }
}
```

Apply this on the ClickHouse side to re-normalize the stored string before comparison.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "clickhouse verify data integrity"

@BilalG1 BilalG1 requested a review from N2D4 April 3, 2026 22:39
@BilalG1 BilalG1 assigned N2D4 and unassigned BilalG1 Apr 3, 2026
@github-actions github-actions bot assigned BilalG1 and unassigned N2D4 Apr 7, 2026
Contributor

@N2D4 N2D4 left a comment


in the future we could also have a data integrity check that compares REST API output to clickhouse data, which would ensure that the sync SQL query itself is correct, but that's for later (it will require us to think a decent amount about what values it should actually have)

@BilalG1 BilalG1 changed the base branch from clickhouse-sync-improve-test-speed to dev April 7, 2026 17:18
@BilalG1 BilalG1 changed the title clickhouse verify data integrity clickhouse new syncs and verify-data Apr 7, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
apps/backend/src/app/api/latest/auth/sessions/current/route.tsx (1)

34-48: ⚠️ Potential issue | 🟠 Major

Potential race condition: deletion recorded before actual delete succeeds.

The recordExternalDbSyncDeletion is called before deleteMany. If deleteMany fails (e.g., due to constraint violation or transient error), a deletion record will exist for a row that wasn't actually deleted, causing sync inconsistency.

Consider wrapping both operations in a transaction, or recording the deletion only after confirming the delete succeeded.

🔧 Proposed fix using transaction
```diff
     try {
       const prisma = await getPrismaClientForTenancy(tenancy);

-      await recordExternalDbSyncDeletion(globalPrismaClient, {
-        tableName: "ProjectUserRefreshToken",
-        tenancyId: tenancy.id,
-        refreshTokenId,
-      });
-
       const result = await globalPrismaClient.projectUserRefreshToken.deleteMany({
         where: {
           tenancyId: tenancy.id,
           id: refreshTokenId,
         },
       });
       // If no records were deleted, throw the same error as before
       if (result.count === 0) {
         throw new KnownErrors.RefreshTokenNotFoundOrExpired();
       }
+
+      await recordExternalDbSyncDeletion(globalPrismaClient, {
+        tableName: "ProjectUserRefreshToken",
+        tenancyId: tenancy.id,
+        refreshTokenId,
+      });
     } catch (e) {
```
     } catch (e) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/auth/sessions/current/route.tsx` around lines
34 - 48, The current flow calls recordExternalDbSyncDeletion before the actual
delete (globalPrismaClient.projectUserRefreshToken.deleteMany), risking a stale
deletion record if the delete fails; change the logic so the delete is performed
first and only on success record the external deletion, or wrap both actions in
a single transaction to ensure atomicity — locate the code around
getPrismaClientForTenancy, recordExternalDbSyncDeletion, and
globalPrismaClient.projectUserRefreshToken.deleteMany and either (A) perform
deleteMany, check result.count>0, then call recordExternalDbSyncDeletion, or (B)
execute both recordExternalDbSyncDeletion and deleteMany inside a transactional
block (using globalPrismaClient.$transaction or equivalent) so both succeed or
both roll back.
apps/backend/src/app/api/latest/auth/sessions/crud.tsx (1)

75-86: ⚠️ Potential issue | 🟡 Minor

Same ordering concern: deletion recorded before actual delete.

Similar to sessions/current/route.tsx, the deletion is recorded before the actual deleteMany. While the prior existence check (lines 59-68) reduces the risk, a concurrent deletion between the check and delete could still cause sync inconsistency.

For consistency with the other file's fix, consider recording the deletion after confirming success.

🔧 Proposed fix
```diff
-    await recordExternalDbSyncDeletion(globalPrismaClient, {
-      tableName: "ProjectUserRefreshToken",
-      tenancyId: auth.tenancy.id,
-      refreshTokenId: params.id,
-    });
-
     await globalPrismaClient.projectUserRefreshToken.deleteMany({
       where: {
         tenancyId: auth.tenancy.id,
         id: params.id,
       },
     });
+
+    await recordExternalDbSyncDeletion(globalPrismaClient, {
+      tableName: "ProjectUserRefreshToken",
+      tenancyId: auth.tenancy.id,
+      refreshTokenId: params.id,
+    });
   },
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/auth/sessions/crud.tsx` around lines 75 - 86,
The deletion is recorded via recordExternalDbSyncDeletion before the actual
delete, risking inconsistent syncs on concurrent operations; change the order so
you first perform globalPrismaClient.projectUserRefreshToken.deleteMany (capture
its result, e.g., const result = await
globalPrismaClient.projectUserRefreshToken.deleteMany({...})) and only if
result.count > 0 call recordExternalDbSyncDeletion with the same payload
(tableName "ProjectUserRefreshToken", tenancyId auth.tenancy.id, refreshTokenId
params.id) so the external sync is recorded only when the DB deletion actually
succeeded.
apps/backend/src/oauth/model.tsx (1)

182-191: ⚠️ Potential issue | 🟡 Minor

Inconsistent withExternalDbSyncUpdate usage: create branch not wrapped.

The update payload is wrapped with withExternalDbSyncUpdate, but the create payload is not. The notification-preference/crud.tsx upsert wraps both branches with withExternalDbSyncUpdate, creating an inconsistency.

Clarify whether new refresh token records should also trigger external DB sync marking. If yes, wrap the create branch for consistency with other upsert patterns in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/oauth/model.tsx` around lines 182 - 191, The upsert call
currently applies withExternalDbSyncUpdate to the update branch but not to the
create branch, causing inconsistency with other upserts (e.g.,
notification-preference/crud.tsx); to fix, wrap the create payload in
withExternalDbSyncUpdate as well so both branches mark external DB sync (i.e.,
change the create object passed to the upsert to be withExternalDbSyncUpdate({
refreshToken: token.refreshToken, tenancyId: tenancy.id, projectUserId: user.id
}) while keeping the update branch unchanged), locating this change around the
upsert call that references withExternalDbSyncUpdate in the same module.
apps/backend/src/lib/permissions.tsx (1)

433-439: ⚠️ Potential issue | 🟠 Major

Missing external DB sync deletion recording for team permissions.

When deleting a permission definition with team scope, deleteMany is called directly without recording deletions via recordExternalDbSyncDeletion. This is inconsistent with the project scope handling (lines 441-460) which properly records each deletion before calling deleteMany.

Team permission deletions won't be synced to external databases with the current implementation.

🐛 Proposed fix to add deletion recording for team scope
```diff
   // Remove all direct permissions for this permission ID
   if (options.scope === "team") {
+    const teamPermissions = await sourceOfTruthTx.teamMemberDirectPermission.findMany({
+      where: {
+        tenancyId: options.tenancy.id,
+        permissionId: options.permissionId,
+      },
+      select: { id: true },
+    });
+    for (const perm of teamPermissions) {
+      await recordExternalDbSyncDeletion(sourceOfTruthTx, {
+        tableName: "TeamMemberDirectPermission",
+        tenancyId: options.tenancy.id,
+        permissionDbId: perm.id,
+      });
+    }
     await sourceOfTruthTx.teamMemberDirectPermission.deleteMany({
       where: {
         tenancyId: options.tenancy.id,
         permissionId: options.permissionId,
       },
     });
   } else {
```
   } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/lib/permissions.tsx` around lines 433 - 439, The team-scoped
branch currently calls sourceOfTruthTx.teamMemberDirectPermission.deleteMany
without recording deletions; mirror the project-scoped flow by first querying
the matching teamMemberDirectPermission rows (by tenancyId and permissionId) and
for each row call recordExternalDbSyncDeletion with the row id and appropriate
metadata, then perform the deleteMany; update the options.scope === "team" block
to use sourceOfTruthTx.teamMemberDirectPermission.findMany (or equivalent) to
collect ids, call recordExternalDbSyncDeletion for each, and only afterwards
call sourceOfTruthTx.teamMemberDirectPermission.deleteMany to remove them.
🧹 Nitpick comments (5)
apps/backend/scripts/run-cron-jobs.ts (1)

29-33: Consider adding backoff on error to avoid rapid retries during outages.

The fixed 1-second interval applies regardless of whether the previous call succeeded or failed. If an endpoint experiences persistent failures (e.g., database down, network issues), this will cause rapid-fire retries every second, potentially overwhelming the system and flooding logs.

Consider increasing the wait time after errors:

♻️ Proposed fix with error backoff
```diff
         const runResult = await Result.fromPromise(run(endpoint));
         if (runResult.status === "error") {
           captureError("run-cron-jobs", runResult.error);
+          await wait(10_000); // Wait longer on error to avoid rapid retries
+        } else {
+          await wait(1000);
         }
-        await wait(1000);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/run-cron-jobs.ts` around lines 29 - 33, The loop always
pauses a fixed 1s causing rapid retries on failures; modify the logic around
runResult (from Result.fromPromise(run(endpoint))) so that when runResult.status
=== "error" you apply a backoff before the next iteration (e.g., exponential or
capped linear backoff) and call captureError("run-cron-jobs", runResult.error)
as today, but increase wait() based on a retry counter or timestamp; reset the
backoff counter when run(endpoint) succeeds so successful runs continue at the
normal interval, and enforce a configurable max backoff to avoid unbounded
delays.
apps/backend/src/app/api/latest/auth/sessions/current/route.tsx (1)

35-36: Unused prisma variable.

The prisma client is obtained but never used. Either remove this line or use prisma instead of globalPrismaClient for the delete operation if tenancy-scoped client is intended.

♻️ Proposed fix
```diff
     try {
-      const prisma = await getPrismaClientForTenancy(tenancy);
-
       await recordExternalDbSyncDeletion(globalPrismaClient, {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/auth/sessions/current/route.tsx` around lines
35 - 36, The prisma client returned by getPrismaClientForTenancy(tenancy) is
created but unused; either remove that call or switch the delete call to use the
tenancy-scoped client. Update the code so the session deletion uses prisma (the
variable returned from getPrismaClientForTenancy) instead of globalPrismaClient,
or remove the prisma assignment if tenancy-scoped behavior is not needed; ensure
references to globalPrismaClient are replaced with prisma in the current route
handler (where the delete operation occurs) if you choose the tenancy-scoped
fix.
apps/backend/src/app/api/latest/session-replays/batch/route.tsx (1)

202-205: Redundant shouldUpdateSequenceId update after chunk creation.

The prisma.sessionReplay.upsert at lines 114-130 already sets shouldUpdateSequenceId: true in both the create and update branches. Since chunk creation doesn't modify the SessionReplay row's synced fields, this additional update is unnecessary and adds an extra database round-trip on every non-deduped upload.

🔧 Suggested removal
```diff
-    await prisma.sessionReplay.update({
-      where: { tenancyId_id: { tenancyId, id: replayId } },
-      data: { shouldUpdateSequenceId: true },
-    });
-
     return {
       statusCode: 200,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/session-replays/batch/route.tsx` around lines
202 - 205, The explicit prisma.sessionReplay.update call that sets
shouldUpdateSequenceId: true after chunk creation is redundant because the
earlier prisma.sessionReplay.upsert (in the same flow) already sets
shouldUpdateSequenceId: true in both create and update branches; remove the
extra prisma.sessionReplay.update that uses where: { tenancyId_id: { tenancyId,
id: replayId } } and data: { shouldUpdateSequenceId: true } to avoid the
unnecessary additional DB round-trip after chunk creation.
apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts (1)

189-214: Minor: CTE selects only id but orders by tenancyId.

The rows_to_update CTE orders by tenancyId but only selects id. This works correctly for row locking, but including tenancyId in the SELECT would be more explicit about the ordering's purpose and match other queries in this file.

♻️ Suggested change for consistency
```diff
       WITH rows_to_update AS (
-        SELECT "id"
+        SELECT "id", "tenancyId"
         FROM "TeamMemberDirectPermission"
         WHERE "shouldUpdateSequenceId" = TRUE
         ORDER BY "tenancyId"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts`
around lines 189 - 214, The CTE rows_to_update currently SELECTs only "id" but
orders by "tenancyId"; update the rows_to_update CTE in the
globalPrismaClient.$queryRaw call so it also SELECTs "tenancyId" (e.g., SELECT
"id", "tenancyId") to make the ORDER BY explicit and consistent with other
queries; ensure the rest of the query (the updated_rows UPDATE and the final
SELECT DISTINCT "tenancyId") continues to work unchanged and that the returned
teamPermissionTenants variable and subsequent enqueueExternalDbSyncBatch call
still map t => t.tenancyId correctly.
apps/backend/src/app/api/latest/internal/external-db-sync/status/route.ts (1)

252-300: Consider documenting the mapping coverage.

buildMappingInternalStats currently only computes internal stats for users and email_outboxes mappings. The other sequenced tables (teams, team_members, etc.) have sequencer stats exposed but no corresponding mapping entries. If this is intentional (e.g., these tables are synced through different mechanisms or planned for future mapping expansion), a brief comment would help clarify.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/internal/external-db-sync/status/route.ts`
around lines 252 - 300, In buildMappingInternalStats, add a brief explanatory
comment above the mappingInternalStats construction stating why only "users" and
"email_outboxes" are included (e.g., other sequenced tables like
teams/team_members are intentionally not mapped here because they are synced
differently or planned for future mapping), or alternatively add the missing
mapping entries if omission was accidental; reference the function
buildMappingInternalStats, the mappingInternalStats map, and
deletedRowsByTable/SequenceStats to locate where to place the comment or add
mappings so reviewers understand intended coverage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts`:
- Around line 12-24: SORT_KEYS currently uses non-unique keys for some mappings
causing row misalignment; update the SORT_KEYS object so keys match the actual
unique columns returned by the sync queries: set "team_permissions" and
"project_permissions" to ["id"], set "notification_preferences" to a composite
unique key (e.g. ["user_id","notification_type"]) and set "connected_accounts"
to a composite unique key (e.g. ["provider","provider_account_id","user_id"]);
modify the SORT_KEYS constant accordingly so the verifier sorts and compares
rows using the true unique identifier columns for each mapping.

In `@apps/backend/src/lib/external-db-sync.ts`:
- Around line 1141-1147: The schema mapping for email_outboxes mistakenly
normalizes rendered_is_transactional instead of the ClickHouse-aliased
is_transactional; update the email_outboxes mapping in external-db-sync.ts to
use is_transactional (and keep the nullable_boolean normalizer) so the
nullable-boolean normalizer runs and Postgres true/false are compared correctly
against ClickHouse 1/0 (check references to rendered_is_transactional,
is_transactional, and the nullable_boolean normalizer to locate and update the
mapping).

In `@apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts`:
- Around line 165-170: The catch block treating Postgres error '42P01'
(undefined table) as failure should instead treat it as success when the caller
expects the table to be absent; update the handler around client.query so that
when err.code === '42P01' you return true if shouldExist === false (and still
return/propagate false or rethrow for other cases), referencing the client.query
call and the shouldExist flag to locate and fix the logic.

In `@packages/stack-shared/src/config/db-sync-mappings.ts`:
- Around line 931-935: The deleted-rows SELECT is aliasing the permission column
as "id" which breaks row-shape consistency expected by pushRowsToExternalDb;
change the alias ("DeletedRow"."primaryKey"->>'permissionId')::text AS "id" to
AS "permission_id" (and update the other similar occurrences referenced around
lines 1614-1617) so the deleted row column names match the live branch (e.g.,
keep "team_id", "user_id", "permission_id", "created_at") and ensure any
consumers (pushRowsToExternalDb) receive uniform column names/order.
- Around line 347-360: The tombstone rows for deleted contact_channels/teams use
NULL::text for non-null ClickHouse String columns (see the SELECT emitting NULL
for "type" and "value" and similar NULL for "display_name"), causing inserts to
fail; update the mappings in db-sync-mappings.ts (the delete-row SELECT that
sets "sync_is_deleted" true and fields "type","value", and "display_name") to
emit a non-null tombstone string (e.g., empty string or a constant like
'DELETED') instead of NULL::text for those fields, and apply the same change to
the other delete-mapping block referenced (around the other occurrence at lines
543-555) so ClickHouse non-null String columns receive valid values.

---

Outside diff comments:
In `@apps/backend/src/app/api/latest/auth/sessions/crud.tsx`:
- Around line 75-86: The deletion is recorded via recordExternalDbSyncDeletion
before the actual delete, risking inconsistent syncs on concurrent operations;
change the order so you first perform
globalPrismaClient.projectUserRefreshToken.deleteMany (capture its result, e.g.,
const result = await
globalPrismaClient.projectUserRefreshToken.deleteMany({...})) and only if
result.count > 0 call recordExternalDbSyncDeletion with the same payload
(tableName "ProjectUserRefreshToken", tenancyId auth.tenancy.id, refreshTokenId
params.id) so the external sync is recorded only when the DB deletion actually
succeeded.

In `@apps/backend/src/app/api/latest/auth/sessions/current/route.tsx`:
- Around line 34-48: The current flow calls recordExternalDbSyncDeletion before
the actual delete (globalPrismaClient.projectUserRefreshToken.deleteMany),
risking a stale deletion record if the delete fails; change the logic so the
delete is performed first and only on success record the external deletion, or
wrap both actions in a single transaction to ensure atomicity — locate the code
around getPrismaClientForTenancy, recordExternalDbSyncDeletion, and
globalPrismaClient.projectUserRefreshToken.deleteMany and either (A) perform
deleteMany, check result.count>0, then call recordExternalDbSyncDeletion, or (B)
execute both recordExternalDbSyncDeletion and deleteMany inside a transactional
block (using globalPrismaClient.$transaction or equivalent) so both succeed or
both roll back.

In `@apps/backend/src/lib/permissions.tsx`:
- Around line 433-439: The team-scoped branch currently calls
sourceOfTruthTx.teamMemberDirectPermission.deleteMany without recording
deletions; mirror the project-scoped flow by first querying the matching
teamMemberDirectPermission rows (by tenancyId and permissionId) and for each row
call recordExternalDbSyncDeletion with the row id and appropriate metadata, then
perform the deleteMany; update the options.scope === "team" block to use
sourceOfTruthTx.teamMemberDirectPermission.findMany (or equivalent) to collect
ids, call recordExternalDbSyncDeletion for each, and only afterwards call
sourceOfTruthTx.teamMemberDirectPermission.deleteMany to remove them.

In `@apps/backend/src/oauth/model.tsx`:
- Around line 182-191: The upsert call currently applies
withExternalDbSyncUpdate to the update branch but not to the create branch,
causing inconsistency with other upserts (e.g.,
notification-preference/crud.tsx); to fix, wrap the create payload in
withExternalDbSyncUpdate as well so both branches mark external DB sync (i.e.,
change the create object passed to the upsert to be withExternalDbSyncUpdate({
refreshToken: token.refreshToken, tenancyId: tenancy.id, projectUserId: user.id
}) while keeping the update branch unchanged), locating this change around the
upsert call that references withExternalDbSyncUpdate in the same module.

---

Nitpick comments:
In `@apps/backend/scripts/run-cron-jobs.ts`:
- Around line 29-33: The loop always pauses a fixed 1s causing rapid retries on
failures; modify the logic around runResult (from
Result.fromPromise(run(endpoint))) so that when runResult.status === "error" you
apply a backoff before the next iteration (e.g., exponential or capped linear
backoff) and call captureError("run-cron-jobs", runResult.error) as today, but
increase wait() based on a retry counter or timestamp; reset the backoff counter
when run(endpoint) succeeds so successful runs continue at the normal interval,
and enforce a configurable max backoff to avoid unbounded delays.

In `@apps/backend/src/app/api/latest/auth/sessions/current/route.tsx`:
- Around line 35-36: The prisma client returned by
getPrismaClientForTenancy(tenancy) is created but unused; either remove that
call or switch the delete call to use the tenancy-scoped client. Update the code
so the session deletion uses prisma (the variable returned from
getPrismaClientForTenancy) instead of globalPrismaClient, or remove the prisma
assignment if tenancy-scoped behavior is not needed; ensure references to
globalPrismaClient are replaced with prisma in the current route handler (where
the delete operation occurs) if you choose the tenancy-scoped fix.

In
`@apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts`:
- Around line 189-214: The CTE rows_to_update currently SELECTs only "id" but
orders by "tenancyId"; update the rows_to_update CTE in the
globalPrismaClient.$queryRaw call so it also SELECTs "tenancyId" (e.g., SELECT
"id", "tenancyId") to make the ORDER BY explicit and consistent with other
queries; ensure the rest of the query (the updated_rows UPDATE and the final
SELECT DISTINCT "tenancyId") continues to work unchanged and that the returned
teamPermissionTenants variable and subsequent enqueueExternalDbSyncBatch call
still map t => t.tenancyId correctly.

In `@apps/backend/src/app/api/latest/internal/external-db-sync/status/route.ts`:
- Around line 252-300: In buildMappingInternalStats, add a brief explanatory
comment above the mappingInternalStats construction stating why only "users" and
"email_outboxes" are included (e.g., other sequenced tables like
teams/team_members are intentionally not mapped here because they are synced
differently or planned for future mapping), or alternatively add the missing
mapping entries if omission was accidental; reference the function
buildMappingInternalStats, the mappingInternalStats map, and
deletedRowsByTable/SequenceStats to locate where to place the comment or add
mappings so reviewers understand intended coverage.

In `@apps/backend/src/app/api/latest/session-replays/batch/route.tsx`:
- Around line 202-205: The explicit prisma.sessionReplay.update call that sets
shouldUpdateSequenceId: true after chunk creation is redundant because the
earlier prisma.sessionReplay.upsert (in the same flow) already sets
shouldUpdateSequenceId: true in both create and update branches; remove the
extra prisma.sessionReplay.update that uses where: { tenancyId_id: { tenancyId,
id: replayId } } and data: { shouldUpdateSequenceId: true } to avoid the
unnecessary additional DB round-trip after chunk creation.
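The capped backoff described in the run-cron-jobs nitpick above can be sketched as follows. This is a hypothetical illustration, not the actual implementation: `run`, `endpoint`, the 1s base interval, and the 60s cap are assumptions taken from the review comment.

```typescript
// Hypothetical backoff helper for the run-cron-jobs loop. A failure doubles
// the wait (1s, 2s, 4s, ...) up to a configurable cap; a success resets the
// counter so the loop returns to its normal interval.
const BASE_DELAY_MS = 1_000;  // assumed normal loop interval
const MAX_DELAY_MS = 60_000;  // assumed configurable cap

function nextDelayMs(consecutiveFailures: number): number {
  // consecutiveFailures === 0 means the last run succeeded.
  return Math.min(BASE_DELAY_MS * 2 ** consecutiveFailures, MAX_DELAY_MS);
}

// Sketch of how the loop would use it (run/endpoint are placeholders):
// let failures = 0;
// while (true) {
//   const result = await Result.fromPromise(run(endpoint));
//   failures = result.status === "error" ? failures + 1 : 0;
//   await wait(nextDelayMs(failures));
// }
```

Because the counter resets on success, a single transient failure only costs one extra second, while a persistently failing endpoint quickly settles at the cap instead of hammering the server every second.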

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8967ebb2-a916-4506-a2ac-58df8d42c52a

📥 Commits

Reviewing files that changed from the base of the PR and between e8428f5 and a54ca54.

📒 Files selected for processing (38)
  • .gitignore
  • apps/backend/prisma/migrations/20260316000000_add_team_team_member_sequence_columns/migration.sql
  • apps/backend/prisma/migrations/20260316000001_add_email_outbox_sequence_columns/migration.sql
  • apps/backend/prisma/migrations/20260316000002_add_session_replay_sequence_columns/migration.sql
  • apps/backend/prisma/migrations/20260317000000_add_team_permission_invitation_sequence_columns/migration.sql
  • apps/backend/prisma/migrations/20260317000001_add_project_permission_notification_preference_sequence_columns/migration.sql
  • apps/backend/prisma/migrations/20260318000000_add_sequence_id_to_refresh_tokens_and_oauth_accounts/migration.sql
  • apps/backend/prisma/migrations/20260318000001_add_sequence_indexes_concurrently/migration.sql
  • apps/backend/prisma/schema.prisma
  • apps/backend/scripts/clickhouse-migrations.ts
  • apps/backend/scripts/run-cron-jobs.ts
  • apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts
  • apps/backend/scripts/verify-data-integrity/index.ts
  • apps/backend/src/app/api/latest/auth/password/update/route.tsx
  • apps/backend/src/app/api/latest/auth/sessions/crud.tsx
  • apps/backend/src/app/api/latest/auth/sessions/current/route.tsx
  • apps/backend/src/app/api/latest/emails/notification-preference/crud.tsx
  • apps/backend/src/app/api/latest/emails/outbox/crud.tsx
  • apps/backend/src/app/api/latest/emails/unsubscribe-link/route.tsx
  • apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts
  • apps/backend/src/app/api/latest/internal/external-db-sync/status/route.ts
  • apps/backend/src/app/api/latest/oauth-providers/crud.tsx
  • apps/backend/src/app/api/latest/session-replays/batch/route.tsx
  • apps/backend/src/app/api/latest/team-member-profiles/crud.tsx
  • apps/backend/src/app/api/latest/team-memberships/crud.tsx
  • apps/backend/src/app/api/latest/teams/crud.tsx
  • apps/backend/src/app/api/latest/users/crud.tsx
  • apps/backend/src/lib/email-queue-step.tsx
  • apps/backend/src/lib/external-db-sync.ts
  • apps/backend/src/lib/permissions.tsx
  • apps/backend/src/lib/tokens.tsx
  • apps/backend/src/oauth/model.tsx
  • apps/backend/src/route-handlers/verification-code-handler.tsx
  • apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/external-db-sync/page-client.tsx
  • apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-basics.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts
  • packages/stack-shared/src/config/db-sync-mappings.ts

Comment on lines +12 to +24
const SORT_KEYS = {
users: ["id"],
contact_channels: ["id"],
teams: ["id"],
team_member_profiles: ["team_id", "user_id"],
team_permissions: ["team_id", "user_id", "permission_id"],
team_invitations: ["id"],
email_outboxes: ["id"],
project_permissions: ["user_id", "permission_id"],
notification_preferences: ["id"],
refresh_tokens: ["id"],
connected_accounts: ["id"],
} satisfies Record<keyof typeof DEFAULT_DB_SYNC_MAPPINGS, string[]>;

⚠️ Potential issue | 🟠 Major

Fix SORT_KEYS for mappings that don't sort by id/permission_id.

team_permissions/project_permissions rows returned by the sync query use id, while notification_preferences and connected_accounts do not have an id column at all. Right now those mappings collapse to non-unique sort keys, so the verifier can compare the wrong rows and report false mismatches.

🛠️ Suggested sort keys
 const SORT_KEYS = {
   users: ["id"],
   contact_channels: ["id"],
   teams: ["id"],
   team_member_profiles: ["team_id", "user_id"],
-  team_permissions: ["team_id", "user_id", "permission_id"],
+  team_permissions: ["team_id", "user_id", "id"],
   team_invitations: ["id"],
   email_outboxes: ["id"],
-  project_permissions: ["user_id", "permission_id"],
-  notification_preferences: ["id"],
+  project_permissions: ["user_id", "id"],
+  notification_preferences: ["user_id", "notification_category_id"],
   refresh_tokens: ["id"],
-  connected_accounts: ["id"],
+  connected_accounts: ["user_id", "provider", "provider_account_id"],
 } satisfies Record<keyof typeof DEFAULT_DB_SYNC_MAPPINGS, string[]>;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
const SORT_KEYS = {
users: ["id"],
contact_channels: ["id"],
teams: ["id"],
team_member_profiles: ["team_id", "user_id"],
team_permissions: ["team_id", "user_id", "permission_id"],
team_invitations: ["id"],
email_outboxes: ["id"],
project_permissions: ["user_id", "permission_id"],
notification_preferences: ["id"],
refresh_tokens: ["id"],
connected_accounts: ["id"],
} satisfies Record<keyof typeof DEFAULT_DB_SYNC_MAPPINGS, string[]>;

After:
const SORT_KEYS = {
users: ["id"],
contact_channels: ["id"],
teams: ["id"],
team_member_profiles: ["team_id", "user_id"],
team_permissions: ["team_id", "user_id", "id"],
team_invitations: ["id"],
email_outboxes: ["id"],
project_permissions: ["user_id", "id"],
notification_preferences: ["user_id", "notification_category_id"],
refresh_tokens: ["id"],
connected_accounts: ["user_id", "provider", "provider_account_id"],
} satisfies Record<keyof typeof DEFAULT_DB_SYNC_MAPPINGS, string[]>;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts`
around lines 12 - 24, SORT_KEYS currently uses non-unique keys for some mappings
causing row misalignment; update the SORT_KEYS object so keys match the actual
unique columns returned by the sync queries: set "team_permissions" and
"project_permissions" to ["id"], set "notification_preferences" to a composite
unique key (e.g. ["user_id","notification_type"]) and set "connected_accounts"
to a composite unique key (e.g. ["provider","provider_account_id","user_id"]);
modify the SORT_KEYS constant accordingly so the verifier sorts and compares
rows using the true unique identifier columns for each mapping.

Comment on lines 165 to 170
      try {
-       res = await client.query(`SELECT * FROM "users" WHERE "primary_email" = $1`, [email]);
+       res = await client.query(query, params);
      } catch (err: any) {
        if (err && err.code === '42P01') {
          return false;
        }

⚠️ Potential issue | 🟡 Minor

Treat 42P01 as success when waiting for deletions.

If the table was never created, shouldExist: false is already satisfied. Returning false here makes deletion waits time out in that case.

🛠️ Minimal fix
-        if (err && err.code === '42P01') {
-          return false;
+        if (err && err.code === '42P01') {
+          return !opts.shouldExist;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts` around
lines 165 - 170, The catch block treating Postgres error '42P01' (undefined
table) as failure should instead treat it as success when the caller expects the
table to be absent; update the handler around client.query so that when err.code
=== '42P01' you return true if shouldExist === false (and still return/propagate
false or rethrow for other cases), referencing the client.query call and the
shouldExist flag to locate and fix the logic.

Comment on lines +347 to +360
SELECT
"Tenancy"."projectId" AS "project_id",
"Tenancy"."branchId" AS "branch_id",
("DeletedRow"."primaryKey"->>'id')::uuid AS "id",
("DeletedRow"."primaryKey"->>'projectUserId')::uuid AS "user_id",
NULL::text AS "type",
NULL::text AS "value",
false AS "is_primary",
false AS "is_verified",
false AS "used_for_auth",
"DeletedRow"."deletedAt"::timestamp without time zone AS "created_at",
"DeletedRow"."sequenceId" AS "sync_sequence_id",
"DeletedRow"."tenancyId" AS "tenancyId",
true AS "sync_is_deleted"

⚠️ Potential issue | 🔴 Critical

Use non-null tombstone values for non-null ClickHouse columns.

contact_channels deletes emit NULL for type/value, and teams deletes emit NULL for display_name, but the raw ClickHouse tables define those as non-null String columns. The first delete batch for either mapping will fail to insert the tombstone instead of marking the row deleted.

🛠️ Minimal fix
-            NULL::text AS "type",
-            NULL::text AS "value",
+            ''::text AS "type",
+            ''::text AS "value",
...
-            NULL::text AS "display_name",
+            ''::text AS "display_name",

Also applies to: 543-555

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stack-shared/src/config/db-sync-mappings.ts` around lines 347 - 360,
The tombstone rows for deleted contact_channels/teams use NULL::text for
non-null ClickHouse String columns (see the SELECT emitting NULL for "type" and
"value" and similar NULL for "display_name"), causing inserts to fail; update
the mappings in db-sync-mappings.ts (the delete-row SELECT that sets
"sync_is_deleted" true and fields "type","value", and "display_name") to emit a
non-null tombstone string (e.g., empty string or a constant like 'DELETED')
instead of NULL::text for those fields, and apply the same change to the other
delete-mapping block referenced (around the other occurrence at lines 543-555)
so ClickHouse non-null String columns receive valid values.

Comment on lines +931 to +935
SELECT
("DeletedRow"."primaryKey"->>'teamId')::uuid AS "team_id",
("DeletedRow"."primaryKey"->>'projectUserId')::uuid AS "user_id",
"DeletedRow"."primaryKey"->>'permissionId' AS "id",
"DeletedRow"."deletedAt"::timestamp without time zone AS "created_at",

⚠️ Potential issue | 🔴 Critical

Keep permission_id named consistently in deleted rows.

The live branches expose this column as permission_id, but the deleted branches rename it to id. pushRowsToExternalDb requires every row in a batch to have the same column names/order, so any mixed live+deleted batch here will trip the row-shape assertion and stop syncing.

🛠️ Minimal fix
-          "DeletedRow"."primaryKey"->>'permissionId' AS "id",
+          "DeletedRow"."primaryKey"->>'permissionId' AS "permission_id",
...
-          "DeletedRow"."primaryKey"->>'permissionId' AS "id",
+          "DeletedRow"."primaryKey"->>'permissionId' AS "permission_id",

Also applies to: 1614-1617

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stack-shared/src/config/db-sync-mappings.ts` around lines 931 - 935,
The deleted-rows SELECT is aliasing the permission column as "id" which breaks
row-shape consistency expected by pushRowsToExternalDb; change the alias
("DeletedRow"."primaryKey"->>'permissionId')::text AS "id" to AS "permission_id"
(and update the other similar occurrences referenced around lines 1614-1617) so
the deleted row column names match the live branch (e.g., keep "team_id",
"user_id", "permission_id", "created_at") and ensure any consumers
(pushRowsToExternalDb) receive uniform column names/order.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
apps/backend/src/lib/external-db-sync.ts (2)

1187-1189: Assert the derived target table instead of using !.

The invariant here is "parsed table name is a non-empty string", not just "not undefined". Reusing assertNonEmptyString makes that assumption explicit and keeps this path aligned with the rest of the file's validation style.

🛠️ Minimal fix
-  const targetTable = tableName.includes('.') ? tableName.split('.').pop()! : tableName;
+  const targetTable = tableName.includes(".") ? tableName.split(".").pop() : tableName;
+  assertNonEmptyString(targetTable, `ClickHouse target table derived from ${JSON.stringify(tableName)}`);
   const normalizers = CLICKHOUSE_COLUMN_NORMALIZERS[targetTable] ?? {};

As per coding guidelines "Validate all assumptions through type system (preferred), assertions, or tests; ideally two out of three".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/lib/external-db-sync.ts` around lines 1187 - 1189, The code
derives targetTable with tableName.split('.').pop()! which uses a non-null
assertion; replace that with an explicit assertion by calling
assertNonEmptyString on the derived value to ensure the invariant "parsed table
name is a non-empty string"; locate the snippet that creates targetTable and
change it to compute the value then call assertNonEmptyString(targetTable,
'targetTable') (or equivalent) before using it with
CLICKHOUSE_COLUMN_NORMALIZERS to match the file's validation style.

1107-1167: Make the normalizer table keyed by the real target-table union.

The write path looks up CLICKHOUSE_COLUMN_NORMALIZERS with mapping.targetTable, while the verifier currently looks it up with the mapping key. Keeping this as Record<string, ...> hides that coupling and lets misspelled table names silently opt out of normalization. Typing it as Partial<Record<MappingTargetTable, ...>> would force both call sites onto the same key shape, including apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts:160.

As per coding guidelines "Validate all assumptions through type system (preferred), assertions, or tests; ideally two out of three".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/lib/external-db-sync.ts` around lines 1107 - 1167, The
CLICKHOUSE_COLUMN_NORMALIZERS map is typed too loosely as Record<string,...>, so
misspelled or mismatched mapping keys can bypass normalization; change its type
to Partial<Record<MappingTargetTable,
Record<string,'json'|'boolean'|'nullable_boolean'|'bigint'>>> (use the
MappingTargetTable union) and update any callers to use the canonical
mapping.targetTable key when looking up normalizers (not the mapping object key)
so both the write path and the verifier use the same key shape; run the
TypeScript build to fix any call-site errors and adjust code to satisfy the
stricter type (e.g., change verifier lookup to
CLICKHOUSE_COLUMN_NORMALIZERS[mapping.targetTable]).
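The stricter typing proposed in this nitpick can be sketched as follows. The table and column names here are illustrative stand-ins, not the real values from external-db-sync.ts, and `MappingTargetTable` is assumed to be a string-literal union of the mapping target tables.

```typescript
// Sketch: keying the normalizer map by a string-literal union instead of
// plain string, so a misspelled table name is a compile-time error rather
// than a silent lookup miss. Names below are hypothetical examples.
type MappingTargetTable = "users" | "email_outboxes";
type ColumnType = "json" | "boolean" | "nullable_boolean" | "bigint";

const CLICKHOUSE_COLUMN_NORMALIZERS: Partial<
  Record<MappingTargetTable, Record<string, ColumnType>>
> = {
  // A typo like "email_outbox" here would now fail to type-check.
  email_outboxes: { is_transactional: "nullable_boolean" },
};

function normalizersFor(targetTable: MappingTargetTable): Record<string, ColumnType> {
  return CLICKHOUSE_COLUMN_NORMALIZERS[targetTable] ?? {};
}
```

Because both the write path and the verifier would have to pass a `MappingTargetTable`, the compiler forces them onto the same key shape; tables without normalizers simply fall back to the empty record.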
apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts (1)

152-153: Missing fetch queries shouldn't silently skip a mapping.

A missing internalDbFetchQueries.clickhouse currently turns into a quiet return, which makes the verifier look green even though that mapping was never checked.

🛠️ Fail loudly instead
       const fetchQuery = mapping.internalDbFetchQueries.clickhouse;
-      if (!fetchQuery) return;
+      if (fetchQuery == null) {
+        throw new StackAssertionError(`Missing ClickHouse fetch query for mapping ${mappingName}.`);
+      }

As per coding guidelines "Fail early, fail loud; fail fast with an error instead of silently continuing" and "Prefer explicit null/undefinedness checks over boolean checks (e.g., foo == null instead of !foo)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts`
around lines 152 - 153, The code silently skips mappings when
mapping.internalDbFetchQueries.clickhouse is missing; change the check to an
explicit null/undefined check (use fetchQuery == null) and throw a descriptive
Error instead of returning so the verifier fails loudly (include mapping
identifier in the message). Locate the fetchQuery assignment and the conditional
around mapping.internalDbFetchQueries.clickhouse, replace the early return with
throwing an Error that names the mapping (e.g., mapping.name or mapping.id) and
mentions the missing internalDbFetchQueries.clickhouse entry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts`:
- Around line 40-42: The code currently stringifies JSON columns in the
columnType === "json" branch which prevents stripNullsAndEmpties() and
deepEqual() from normalizing structural differences; instead, return a
structured JS value so the scrub can operate: if the incoming value is a string
try JSON.parse and return the resulting object, if it's already an object return
it unchanged, and only fallback to stringifying on parse error; ensure the same
logic is applied wherever columnType === "json" is handled (the other occurrence
noted) so stripNullsAndEmpties() and deepEqual() run on actual objects rather
than flattened strings.
- Around line 28-35: The compareRows function currently collapses missing sort
keys to "" which masks schema drift; instead, in compareRows(validate that for
each key in sortKeys both a and b actually have the key (use
Object.prototype.hasOwnProperty.call or key in ...) and if a key is missing on
either row throw a descriptive Error naming the missing key and which row (a or
b) so the verifier fails fast; after validating presence, continue with the
string comparison logic (convert values to strings and compare) as before.
- Around line 62-75: The verifier's normalizeClickhouseValue currently only
handles JSON and dates; update it to also handle the types declared in
CLICKHOUSE_COLUMN_NORMALIZERS by mirroring the write-path normalization in
external-db-sync.ts (lines ~1234-1245): for "boolean" convert ClickHouse
boolean-like values ("0"/"1", "false"/"true", 0/1) to true/false; for
"nullable_boolean" treat "null" or null/undefined as null and otherwise convert
boolean-like values to true/false; for "bigint" normalize numeric or string
integer values to the same numeric representation used by the write path (e.g.,
Number(value) or BigInt if the write path uses BigInt). Implement these branches
inside normalizeClickhouseValue (use the columnType parameter) so
ClickHouse-side normalization matches the Postgres-side normalization logic.

---

Nitpick comments:
In `@apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts`:
- Around line 152-153: The code silently skips mappings when
mapping.internalDbFetchQueries.clickhouse is missing; change the check to an
explicit null/undefined check (use fetchQuery == null) and throw a descriptive
Error instead of returning so the verifier fails loudly (include mapping
identifier in the message). Locate the fetchQuery assignment and the conditional
around mapping.internalDbFetchQueries.clickhouse, replace the early return with
throwing an Error that names the mapping (e.g., mapping.name or mapping.id) and
mentions the missing internalDbFetchQueries.clickhouse entry.

In `@apps/backend/src/lib/external-db-sync.ts`:
- Around line 1187-1189: The code derives targetTable with
tableName.split('.').pop()! which uses a non-null assertion; replace that with
an explicit assertion by calling assertNonEmptyString on the derived value to
ensure the invariant "parsed table name is a non-empty string"; locate the
snippet that creates targetTable and change it to compute the value then call
assertNonEmptyString(targetTable, 'targetTable') (or equivalent) before using it
with CLICKHOUSE_COLUMN_NORMALIZERS to match the file's validation style.
- Around line 1107-1167: The CLICKHOUSE_COLUMN_NORMALIZERS map is typed too
loosely as Record<string,...>, so misspelled or mismatched mapping keys can
bypass normalization; change its type to Partial<Record<MappingTargetTable,
Record<string,'json'|'boolean'|'nullable_boolean'|'bigint'>>> (use the
MappingTargetTable union) and update any callers to use the canonical
mapping.targetTable key when looking up normalizers (not the mapping object key)
so both the write path and the verifier use the same key shape; run the
TypeScript build to fix any call-site errors and adjust code to satisfy the
stricter type (e.g., change verifier lookup to
CLICKHOUSE_COLUMN_NORMALIZERS[mapping.targetTable]).
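Both nitpicks can be sketched together. The `MappingTargetTable` members, column names, and the `assertNonEmptyString` signature below are invented placeholders for illustration, not the repository's real definitions.

```typescript
// Invented union and normalizer kinds for illustration.
type MappingTargetTable = "team_invitations" | "email_outboxes" | "refresh_tokens";
type NormalizerKind = "json" | "boolean" | "nullable_boolean" | "bigint";

// Partial<Record<...>> rejects misspelled table keys at compile time while
// letting tables without normalizers be omitted entirely.
const CLICKHOUSE_COLUMN_NORMALIZERS: Partial<
  Record<MappingTargetTable, Record<string, NormalizerKind>>
> = {
  team_invitations: { expires_at_millis: "bigint" },
  email_outboxes: { extra_data: "json", is_sent: "boolean" },
};

function assertNonEmptyString(value: unknown, name: string): asserts value is string {
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`Expected ${name} to be a non-empty string`);
  }
}

// Replaces tableName.split('.').pop()! with a loud runtime assertion.
function deriveTargetTable(tableName: string): MappingTargetTable {
  const targetTable = tableName.split(".").pop();
  assertNonEmptyString(targetTable, "targetTable");
  return targetTable as MappingTargetTable;
}
```

Lookups then key off the canonical `mapping.targetTable` value (e.g. `CLICKHOUSE_COLUMN_NORMALIZERS[deriveTargetTable(name)]`), so the write path and the verifier share one key shape.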
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 92420d89-684a-486b-835b-e8090d52b59c

📥 Commits

Reviewing files that changed from the base of the PR and between a54ca54 and 15717fb.

📒 Files selected for processing (2)
  • apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts
  • apps/backend/src/lib/external-db-sync.ts

Comment on lines +28 to +35
function compareRows(a: Record<string, unknown>, b: Record<string, unknown>, sortKeys: string[]): number {
for (const key of sortKeys) {
const aVal = String(a[key] ?? "");
const bVal = String(b[key] ?? "");
if (aVal < bVal) return -1;
if (aVal > bVal) return 1;
}
return 0;

⚠️ Potential issue | 🟠 Major

Don't collapse missing sort keys to "".

If a SORT_KEYS entry drifts from the actual row shape, a[key] ?? "" / b[key] ?? "" makes every missing value compare equal, so the verifier can line up unrelated rows and report bogus diffs instead of failing at the bad key definition.

🛠️ Minimal fix
 function compareRows(a: Record<string, unknown>, b: Record<string, unknown>, sortKeys: string[]): number {
   for (const key of sortKeys) {
-    const aVal = String(a[key] ?? "");
-    const bVal = String(b[key] ?? "");
+    if (!(key in a) || !(key in b)) {
+      throw new StackAssertionError(`Missing sort key ${key} while ordering verifier rows.`);
+    }
+    const aVal = String(a[key]);
+    const bVal = String(b[key]);
     if (aVal < bVal) return -1;
     if (aVal > bVal) return 1;
   }
   return 0;
 }

As per coding guidelines "Fail early, fail loud; fail fast with an error instead of silently continuing".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts`
around lines 28 - 35: the compareRows function currently collapses missing sort
keys to "", which masks schema drift. Validate that, for each key in sortKeys,
both a and b actually have the key (use Object.prototype.hasOwnProperty.call or
`key in ...`); if a key is missing on either row, throw a descriptive Error
naming the missing key and which row (a or b) so the verifier fails fast. After
validating presence, continue with the string comparison logic (convert values
to strings and compare) as before.
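The fix above can be sketched as a self-contained function; the row shapes and error message are illustrative, and StackAssertionError is replaced with a plain Error so the snippet stands alone.

```typescript
// Sketch of the fail-fast comparator described above.
function compareRows(
  a: Record<string, unknown>,
  b: Record<string, unknown>,
  sortKeys: string[],
): number {
  for (const key of sortKeys) {
    // Fail loudly on a missing sort key instead of comparing empty strings.
    if (!(key in a) || !(key in b)) {
      throw new Error(`Missing sort key "${key}" while ordering verifier rows.`);
    }
    const aVal = String(a[key]);
    const bVal = String(b[key]);
    if (aVal < bVal) return -1;
    if (aVal > bVal) return 1;
  }
  return 0;
}
```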

Comment on lines +40 to +42
if (columnType === "json") {
// Postgres returns parsed JS values for jsonb columns; always stringify for consistent comparison
return JSON.stringify(value);

⚠️ Potential issue | 🟠 Major

Stringifying JSON here defeats the structural scrub.

deepEqual() only strips nulls and empty objects from structured values. After both sides are flattened to strings, JSON columns like {a:null} vs {} still mismatch even though stripNullsAndEmpties() is supposed to normalize that ClickHouse behavior.

Also applies to: 64-67

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts`
around lines 40 - 42: the code currently stringifies JSON columns in the
columnType === "json" branch, which prevents stripNullsAndEmpties() and
deepEqual() from normalizing structural differences. Instead, return a
structured JS value so the scrub can operate: if the incoming value is a
string, try JSON.parse and return the resulting object; if it is already an
object, return it unchanged; only fall back to the raw string on parse error.
Apply the same logic wherever columnType === "json" is handled (the other
occurrence noted) so stripNullsAndEmpties() and deepEqual() run on actual
objects rather than flattened strings.
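A minimal sketch of the parse-instead-of-stringify behavior, assuming the scrub (stripNullsAndEmpties/deepEqual) operates on plain JS values; the helper name is hypothetical.

```typescript
// Hypothetical helper: return structured JS values for "json" columns so a
// deep-equality scrub can strip nulls and empty objects before comparing.
function normalizeJsonColumn(value: unknown): unknown {
  if (value === null || value === undefined) return null;
  if (typeof value === "string") {
    try {
      // ClickHouse side: JSON arrives as a string; parse it back to a value.
      // Note that the literal string "null" also parses to null here.
      return JSON.parse(value);
    } catch {
      // Not valid JSON: fall back to the raw string rather than guessing.
      return value;
    }
  }
  // Postgres side: jsonb columns already arrive as parsed JS values.
  return value;
}
```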

Comment on lines +62 to +75
function normalizeClickhouseValue(value: unknown, columnType: string | undefined): unknown {
if (value === null || value === undefined) return null;
if (columnType === "json") {
// ClickHouse stores null JSON as the literal string "null"
if (value === "null") return null;
return typeof value === "string" ? value : JSON.stringify(value);
}
// For dates (ClickHouse returns as string like "2024-01-01 00:00:00.000" in UTC)
if (typeof value === "string" && /^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}/.test(value)) {
// Append 'Z' to force UTC interpretation if no timezone indicator present
const dateStr = value.includes("Z") || value.includes("+") ? value : value.replace(" ", "T") + "Z";
return new Date(dateStr).getTime();
}
return value;

⚠️ Potential issue | 🟠 Major

The ClickHouse-side normalizer only handles JSON here.

CLICKHOUSE_COLUMN_NORMALIZERS also declares boolean, nullable_boolean, and bigint, but normalizeClickhouseValue() ignores all three. That leaves mappings such as team_invitations, email_outboxes, and refresh_tokens comparing normalized Postgres values against raw ClickHouse values. The write path in apps/backend/src/lib/external-db-sync.ts:1234-1245 already normalizes these types, so the verifier should mirror that.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/scripts/verify-data-integrity/clickhouse-sync-verifier.ts`
around lines 62 - 75, The verifier's normalizeClickhouseValue currently only
handles JSON and dates; update it to also handle the types declared in
CLICKHOUSE_COLUMN_NORMALIZERS by mirroring the write-path normalization in
external-db-sync.ts (lines ~1234-1245): for "boolean" convert ClickHouse
boolean-like values ("0"/"1", "false"/"true", 0/1) to true/false; for
"nullable_boolean" treat "null" or null/undefined as null and otherwise convert
boolean-like values to true/false; for "bigint" normalize numeric or string
integer values to the same numeric representation used by the write path (e.g.,
Number(value) or BigInt if the write path uses BigInt). Implement these branches
inside normalizeClickhouseValue (use the columnType parameter) so
ClickHouse-side normalization matches the Postgres-side normalization logic.
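The branches the prompt describes can be sketched as follows. The conversion rules, and the choice of Number over BigInt, are assumptions to be checked against the actual write path in external-db-sync.ts, not a copy of it.

```typescript
type NormalizerKind = "json" | "boolean" | "nullable_boolean" | "bigint";

// Assumed boolean-like inputs from ClickHouse: 0/1, "0"/"1", "false"/"true".
function toBoolean(value: unknown): boolean {
  return value === true || value === 1 || value === "1" || value === "true";
}

function normalizeClickhouseValue(
  value: unknown,
  columnType: NormalizerKind | undefined,
): unknown {
  switch (columnType) {
    case "boolean":
      return toBoolean(value);
    case "nullable_boolean":
      // The literal string "null" (and null/undefined) maps to null.
      if (value === null || value === undefined || value === "null") return null;
      return toBoolean(value);
    case "bigint":
      // Assumes the write path compares as Number; use BigInt if it does not.
      return typeof value === "string" ? Number(value) : value;
    default:
      return value;
  }
}
```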

