Syncs

Using CloudQuery Platform? See Platform Syncs for how the platform manages write modes, incremental state, and table views automatically.

When you run cloudquery sync <config>, the CloudQuery CLI fetches data from all the source integrations matched by the config and delivers it to the matched destination integrations. This might mean fetching data from AWS, GCP and Azure and delivering it to PostgreSQL, or it could mean fetching data from AWS, Cloudflare and GitLab and delivering it to BigQuery, S3 and MySQL. It all depends on the configuration provided, and there is a near-endless array of possible combinations that grows every time a new source or destination is created. (Configuration is described in the Configuration section.)
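As a rough illustration of what such a config can look like, the sketch below pairs one source with one destination. The table name, the `vX.Y.Z` version placeholders, and the `POSTGRESQL_CONNECTION_STRING` environment variable are assumptions made for this example, not values taken from this page. Check each integration's documentation for the exact fields it accepts.

```yaml
# Illustrative config: sync one AWS table into PostgreSQL.
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "vX.Y.Z"              # placeholder: pin a real release here
  tables: ["aws_s3_buckets"]     # assumed table, chosen only for illustration
  destinations: ["postgresql"]   # must match the destination name below
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: "vX.Y.Z"              # placeholder: pin a real release here
  spec:
    # Assumed environment variable holding the connection string.
    connection_string: "${POSTGRESQL_CONNECTION_STRING}"
```

Saved as a single file, a config like this would be passed to cloudquery sync as the <config> argument.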

Browse all available source and destination integrations on the CloudQuery Hub.

CloudQuery streams data to the destination as it arrives. As soon as data is received for a source integration resource, it is delivered to the destination integration. Destination integrations may batch writes for performance reasons, but generally data will be delivered to the destination as the sync progresses.

Table Sync Modes

Table syncs come in two flavors: full and incremental. A single cloudquery sync command invocation can combine both types, and which type is used for a particular table depends on the table definition.

Full Table Syncs

This is the normal mode of operation for most tables. For tables in this mode, the CLI fetches a snapshot of all data from the corresponding APIs on every sync. What happens next depends on the destination write mode: the data is appended (write_mode: append), overwritten while stale rows from previous syncs are kept (write_mode: overwrite), or overwritten with rows that are no longer present in the source deleted at the end of the sync (write_mode: overwrite-delete-stale). Learn more about schema migrations and how CloudQuery handles schema changes.

Incremental Table Syncs

Some APIs lend themselves to being synced incrementally. Rather than fetching all past data on every sync, an incremental table fetches only the data that has changed since the last sync. To do this, it stores a piece of metadata, known as a cursor, in a state backend. The cursor marks where the last sync ended, so that the next sync can resume from that point. Because the changed data is often a small subset of the overall dataset, incremental syncs can be far more efficient than full syncs, especially for large tables.

Incremental tables are always clearly marked as “incremental” in integration table documentation, along with an indication of which columns are used for the value of the cursor. Because they use state, incremental tables require a state backend configured via backend_options in the source spec:

kind: source
spec:
  # ...
  backend_options:
    table_name: cq_state_aws
    connection: "@@plugins.postgresql.connection"

For more details, see Managing Incremental Tables.

Destination Modes

Destinations have three configuration modes that affect how syncs write data. These are set in the destination spec, not per-sync.

Write Mode

Controls what happens to existing data when new data arrives. Defaults to overwrite-delete-stale.

  • overwrite-delete-stale - new data replaces existing data, and rows from previous syncs that are no longer present in the source are deleted after the sync completes.
  • overwrite - new data replaces existing data, but stale rows from previous syncs are kept.
  • append - new rows are added alongside existing data from previous syncs. No data is deleted or replaced.
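A minimal sketch of where the write mode is set, assuming a destination named postgresql (the name is illustrative) and eliding the other fields of the spec:

```yaml
kind: destination
spec:
  name: postgresql       # illustrative destination name
  # ...
  write_mode: "append"   # keep rows from every sync; nothing is deleted or replaced
```

Omitting write_mode leaves the destination on the overwrite-delete-stale default described above.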

Migrate Mode

Controls how schema changes (new columns, type changes) are handled when a source integration updates its table definitions. Defaults to safe.

  • safe - only backward-compatible schema changes are applied (adding new columns, for example). Changes that would require dropping and recreating a table are skipped to avoid data loss.
  • forced - all schema changes are applied, including destructive ones like dropping and recreating tables. This is useful when you’re using transformers that modify the schema and need the destination to match.
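As a hedged example, opting in to destructive migrations might look like the fragment below. The destination name is illustrative and the remaining fields are elided.

```yaml
kind: destination
spec:
  name: postgresql         # illustrative destination name
  # ...
  migrate_mode: "forced"   # allow changes that drop and recreate tables
```

Because forced can drop tables, it is worth confirming that the destination data can be re-synced before enabling it.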

PK Mode

Controls how primary keys are set on destination tables. Defaults to default.

  • default - uses the primary keys defined by the source integration.
  • cq-id-only - uses only the _cq_id column as the primary key. This can help when the source-defined primary keys cause conflicts or when you want to preserve all rows including duplicates.
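To show the three settings side by side, the fragment below spells out the defaults listed in this section explicitly. The destination name is illustrative and the other fields are elided; leaving all three keys out should behave the same way, since these are the stated defaults.

```yaml
kind: destination
spec:
  name: postgresql                      # illustrative destination name
  # ...
  write_mode: "overwrite-delete-stale"  # default: replace rows, then delete stale ones
  migrate_mode: "safe"                  # default: apply only backward-compatible changes
  pk_mode: "default"                    # default: use source-defined primary keys
                                        # alternative: "cq-id-only" keys on _cq_id alone
```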

Scaling Syncs

For large cloud estates, you can split syncs across multiple processes or machines using sharding. See Running Syncs in Parallel for details on how to distribute sync workloads.
