clickhouse user sync #1159

Open
BilalG1 wants to merge 3 commits into external-db-sync from external-db-sync-clickhouse-default

Conversation

@BilalG1
Contributor

@BilalG1 BilalG1 commented Feb 4, 2026

No description provided.

@vercel
vercel bot commented Feb 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
stack-backend | Ready | Preview, Comment | Feb 5, 2026 2:38am
stack-dashboard | Ready | Preview, Comment | Feb 5, 2026 2:38am
stack-demo | Ready | Preview, Comment | Feb 5, 2026 2:38am
stack-docs | Ready | Preview, Comment | Feb 5, 2026 2:38am

@coderabbitai
Contributor

coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@greptile-apps
Contributor

greptile-apps bot commented Feb 4, 2026

Greptile Overview

Greptile Summary

This PR extends the external database sync system to support ClickHouse user synchronization alongside the existing PostgreSQL support. The implementation adds comprehensive infrastructure for syncing user data to ClickHouse for analytics purposes.

Key Changes

  • Added ClickHouse user table schema with ReplacingMergeTree engine for deduplication and deletion handling via is_deleted flag
  • Implemented sync metadata tracking table (_stack_sync_metadata) to maintain per-tenancy, per-mapping sync state
  • Created row-level security policies isolating data by project_id and branch_id for multi-tenant safety
  • Added ClickHouse-specific sync functions with boolean normalization (converting boolean values to UInt8 0/1)
  • Extended configuration schema to support "clickhouse" database type and auto-inject ClickHouse config by default
  • Implemented status monitoring endpoint for tracking sync progress, backlog, and user table statistics
  • Added E2E test validating user sync with 2-minute timeout and polling mechanism
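The deduplication and deletion handling described in the first bullet can be modeled in a few lines. The sketch below is not the PR's code: it is an in-memory illustration of what a deduplicated read over a ReplacingMergeTree-style table should return, assuming the sequence ID acts as the version column. The `UserRow` shape and all field names are hypothetical.

```typescript
// Hypothetical row shape; the actual columns of the PR's users table are not shown in this summary.
type UserRow = {
  projectId: string;
  branchId: string;
  id: string;
  sequenceId: number; // assumed to play the role of the version column
  isDeleted: 0 | 1; // UInt8 tombstone flag, as described in the summary
  displayName: string | null;
};

// For each (projectId, branchId, id) key, keep only the row with the highest
// version, then drop rows whose latest version is a tombstone.
function collapseLatest(rows: UserRow[]): UserRow[] {
  const latest = new Map<string, UserRow>();
  for (const row of rows) {
    const key = JSON.stringify([row.projectId, row.branchId, row.id]);
    const seen = latest.get(key);
    if (seen === undefined || row.sequenceId >= seen.sequenceId) {
      latest.set(key, row);
    }
  }
  return Array.from(latest.values()).filter((row) => row.isDeleted === 0);
}

// Two versions of user u1, and user u2 created and later deleted.
const sampleRows: UserRow[] = [
  { projectId: "p1", branchId: "main", id: "u1", sequenceId: 1, isDeleted: 0, displayName: "Ada" },
  { projectId: "p1", branchId: "main", id: "u1", sequenceId: 3, isDeleted: 0, displayName: "Ada L." },
  { projectId: "p1", branchId: "main", id: "u2", sequenceId: 2, isDeleted: 0, displayName: "Bob" },
  { projectId: "p1", branchId: "main", id: "u2", sequenceId: 4, isDeleted: 1, displayName: "Bob" },
];

const visibleRows = collapseLatest(sampleRows);
console.log(visibleRows.map((row) => row.displayName));
```

In ClickHouse itself, ReplacingMergeTree collapses duplicates during background merges, so a query can still see older versions until a merge happens. Reads that need the collapsed result generally have to request it explicitly, for example with `FINAL`.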

Architecture

The sync operates in batches of 1000 rows, pulling data from the internal PostgreSQL using $queryRawUnsafe with parameterized queries. The ClickHouse implementation mirrors the PostgreSQL pattern but includes ClickHouse-specific normalization for boolean fields. Both implementations handle user deletions by syncing deleted rows from the DeletedRow table.
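As a rough illustration of the batching pattern described above, the following sketch pages through rows by sequence ID and records the cursor after each batch. The `BatchSource` and `BatchSink` interfaces and the `syncInBatches` function are hypothetical stand-ins, not the PR's actual API. The only detail taken from the summary is the batch size of 1000.

```typescript
// Hypothetical row type: only the sequence ID matters for paging.
interface SyncRow {
  sequenceId: number;
  [column: string]: unknown;
}

// Assumed contract: returns rows with sequenceId > afterSequenceId,
// in ascending order, at most `limit` rows.
interface BatchSource {
  fetchAfter(afterSequenceId: number, limit: number): Promise<SyncRow[]>;
}

// Assumed contract: writes rows to the external database and persists the cursor.
interface BatchSink {
  push(rows: SyncRow[]): Promise<void>;
  saveCursor(lastSequenceId: number): Promise<void>;
}

const BATCH_SIZE = 1000; // batch size stated in the summary

async function syncInBatches(
  source: BatchSource,
  sink: BatchSink,
  startAfter: number,
  batchSize: number = BATCH_SIZE,
): Promise<number> {
  let cursor = startAfter;
  while (true) {
    const rows = await source.fetchAfter(cursor, batchSize);
    if (rows.length === 0) break;
    await sink.push(rows);
    // Advance the cursor to the highest sequence ID seen in this batch.
    cursor = rows.reduce((max, row) => Math.max(max, row.sequenceId), cursor);
    // Persist the cursor only after the rows were pushed, so a crash
    // between the two steps re-sends a batch instead of skipping it.
    await sink.saveCursor(cursor);
    // A short batch means the source is drained for now.
    if (rows.length < batchSize) break;
  }
  return cursor;
}
```

Saving the cursor after the push means a failure can replay a batch, so this ordering is only safe if the destination tolerates duplicate writes, for example through versioned deduplication or upserts.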

The status route provides comprehensive monitoring including metadata tracking, backlog calculation (internal max sequence ID - last synced sequence ID), and user table statistics.
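The backlog formula in parentheses above is a simple subtraction. A minimal sketch follows, with two assumptions that are not confirmed by the PR: a missing value (nothing synced yet, or an empty source table) is treated as 0, and the result is clamped so it never goes negative. The `computeBacklog` name is hypothetical.

```typescript
// Backlog = internal max sequence ID - last synced sequence ID.
// null stands for "no value yet" and is treated as 0 (an assumption).
function computeBacklog(
  internalMaxSequenceId: number | null,
  lastSyncedSequenceId: number | null,
): number {
  const max = internalMaxSequenceId ?? 0;
  const synced = lastSyncedSequenceId ?? 0;
  // Clamp at 0 so a stale read of the internal maximum never reports a negative backlog.
  return Math.max(0, max - synced);
}

console.log(computeBacklog(1500, 1200)); // 300
console.log(computeBacklog(1500, null)); // 1500
console.log(computeBacklog(null, null)); // 0
```

If sequence IDs have gaps, this difference is an upper bound on the number of pending rows rather than an exact count.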

Confidence Score: 4/5

  • Safe to merge with minor style improvements possible
  • The implementation is solid, with proper parameterization, error handling, and comprehensive tests. All ClickHouse queries are parameterized, which prevents injection. The sync logic properly handles batching, throttling, and metadata tracking. The score is 4 rather than 5 because the boolean field normalization in pushRowsToClickhouse is hardcoded and may not scale to future mappings.
  • apps/backend/src/lib/external-db-sync.ts needs attention for the hardcoded field normalization
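One possible way to address the hardcoded normalization concern is to let each mapping declare its own boolean columns. The sketch below is an illustration of that idea only, not the PR's `pushRowsToClickhouse` implementation. The helper names and the column names `is_anonymous` and `primary_email_verified` are hypothetical.

```typescript
type Row = Record<string, unknown>;

// Converts a JavaScript boolean to a UInt8-style 0/1; null and undefined become null.
function toUInt8(value: unknown): 0 | 1 | null {
  if (value === null || value === undefined) return null;
  if (typeof value === "boolean") return value ? 1 : 0;
  if (value === 0 || value === 1) return value;
  throw new Error("Cannot normalize non-boolean value: " + String(value));
}

// Normalizes only the columns listed for the mapping, leaving other columns untouched.
function normalizeBooleanFields(row: Row, booleanFields: readonly string[]): Row {
  const out: Row = { ...row };
  for (const field of booleanFields) {
    if (field in out) {
      out[field] = toUInt8(out[field]);
    }
  }
  return out;
}

// Hypothetical per-mapping declaration of boolean columns.
const usersBooleanFields = ["is_anonymous", "primary_email_verified"] as const;

const normalizedRow = normalizeBooleanFields(
  { id: "u1", is_anonymous: false, primary_email_verified: true, display_name: "Ada" },
  usersBooleanFields,
);
console.log(normalizedRow);
```

With this shape, adding a new mapping would mean declaring its boolean columns next to its schema instead of editing the push function.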

Important Files Changed

Filename | Overview
apps/backend/scripts/clickhouse-migrations.ts | Added users table schema, view, sync metadata table, and row-level security policies for ClickHouse
packages/stack-shared/src/config/db-sync-mappings.ts | Added ClickHouse-specific schema and fetch queries for user sync with deletion tracking
apps/backend/src/lib/external-db-sync.ts | Implemented ClickHouse sync logic with metadata tracking, boolean normalization, and batch processing

Sequence Diagram

sequenceDiagram
    participant Cron as Cron Job
    participant Sequencer as Sequencer API
    participant Poller as Poller API
    participant SyncEngine as External DB Sync Engine
    participant PG as Internal PostgreSQL
    participant CH as ClickHouse
    participant Status as Status API

    Note over Cron,CH: User Sync Flow

    Cron->>Sequencer: POST /external-db-sync/sequencer
    Sequencer->>PG: Update ProjectUser.sequenceId
    Sequencer->>PG: Update DeletedRow.sequenceId
    
    Cron->>Poller: POST /external-db-sync/poller
    Poller->>SyncEngine: syncExternalDatabases(tenancy)
    
    alt ClickHouse Database
        SyncEngine->>CH: getClickhouseLastSyncedSequenceId()
        CH-->>SyncEngine: lastSequenceId
        
        loop Batch Processing
            SyncEngine->>PG: SELECT users WHERE sequence_id > lastSequenceId
            PG-->>SyncEngine: rows (max 1000)
            SyncEngine->>SyncEngine: normalizeClickhouseBoolean()
            SyncEngine->>CH: INSERT INTO analytics_internal.users
            SyncEngine->>CH: INSERT INTO _stack_sync_metadata
        end
    end
    
    alt Postgres Database
        SyncEngine->>PG: SELECT last_synced_sequence_id FROM _stack_sync_metadata
        PG-->>SyncEngine: lastSequenceId
        
        loop Batch Processing
            SyncEngine->>PG: SELECT users WHERE sequence_id > lastSequenceId
            PG-->>SyncEngine: rows (max 1000)
            SyncEngine->>PG: UPSERT into external DB
        end
    end
    
    Note over Status,CH: Status Monitoring
    
    Status->>CH: Query _stack_sync_metadata
    CH-->>Status: metadata rows
    Status->>CH: Query users table stats
    CH-->>Status: user counts & timestamps
    Status-->>Cron: Sync status with backlog info

Contributor

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment
