Syncs
Using CloudQuery Platform? See Platform Syncs for how the platform manages write modes, incremental state, and table views automatically.
When you run cloudquery sync <config>, the CloudQuery CLI fetches data from all the source integrations matched by the config and delivers it to the matched destination integrations. This might mean fetching data from AWS, GCP and Azure and delivering it to PostgreSQL, or fetching data from AWS, Cloudflare and GitLab and delivering it to BigQuery, S3 and MySQL. The exact combination depends on the configuration provided, and the number of possible combinations grows every time a new source or destination is created. (Configuration is described in the Configuration section.)
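For illustration, here is a minimal sketch of a single configuration file that pairs one source with one destination. It assumes the AWS source and the PostgreSQL destination. The file name, the chosen table, the placeholder version strings and the POSTGRESQL_CONNECTION_STRING environment variable are assumptions; replace them with your own values.

```yaml
# Hypothetical file: aws-to-postgresql.yml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "vX.Y.Z" # placeholder: use the current version listed on the CloudQuery Hub
  tables: ["aws_s3_buckets"]
  destinations: ["postgresql"] # must match the destination's name below
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: "vX.Y.Z" # placeholder
  spec:
    # Assumed environment variable holding the PostgreSQL connection string
    connection_string: "${POSTGRESQL_CONNECTION_STRING}"
```

Running cloudquery sync aws-to-postgresql.yml would then fetch the listed AWS table and deliver it to PostgreSQL. To send the same data to a second destination, add its name to the destinations list and include a matching destination block.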
Browse all available source and destination integrations on the CloudQuery Hub.
CloudQuery streams data to destinations as it arrives: as soon as a source integration returns data for a resource, that data is passed to the destination integration. Destination integrations may batch writes for performance, but in general data is written progressively as the sync runs rather than all at once at the end.
Table Sync Modes
Table syncs come in two flavors: full and incremental. A single cloudquery sync command invocation can combine both types, and which type is used for a particular table depends on the table definition.
Full Table Syncs
This is the normal mode of operation for most tables. For tables in this mode, the CLI fetches a snapshot of all data from the corresponding APIs on every sync. Depending on the destination write mode, the data is then appended (write_mode: append), overwritten while keeping stale rows from previous syncs (write_mode: overwrite) or overwritten and rows from previous syncs deleted at the end of the sync (write_mode: overwrite-delete-stale). Learn more about schema migrations and how CloudQuery handles schema changes.
Incremental Table Syncs
Some APIs lend themselves to being synced incrementally. Rather than fetching all past data on every sync, an incremental table fetches only data that has changed since the last sync. To do this, the source stores metadata, known as a cursor, in a state backend. The cursor marks where the last sync ended, so the next sync can resume from that point. Incremental syncs can be much more efficient than full syncs, especially for tables with large amounts of data, because only the data that has changed since the last sync needs to be retrieved. In many cases this is a small subset of the overall dataset.
Incremental tables are always clearly marked as “incremental” in integration table documentation, along with an indication of which columns are used for the value of the cursor. Because they use state, incremental tables require a state backend configured via backend_options in the source spec:
```yaml
kind: source
spec:
  # ...
  backend_options:
    table_name: cq_state_aws
    connection: "@@plugins.postgresql.connection"
```

For more details, see Managing Incremental Tables.
Destination Modes
Destinations have three configuration modes that affect how syncs write data. These are set in the destination spec, not per-sync.
Write Mode
Controls what happens to existing data when new data arrives. Defaults to overwrite-delete-stale.
- overwrite-delete-stale - new data replaces existing data, and rows from previous syncs that are no longer present in the source are deleted after the sync completes.
- overwrite - new data replaces existing data, but stale rows from previous syncs are kept.
- append - new rows are added alongside existing data from previous syncs. No data is deleted or replaced.
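For example, switching a destination away from the default is a one-line change in the destination spec. The sketch below assumes a PostgreSQL destination and elides the other fields with "# ...".

```yaml
kind: destination
spec:
  name: postgresql
  # ... path, registry, version and the integration-specific spec
  # Keep rows from every sync instead of replacing them.
  # Omitting write_mode falls back to overwrite-delete-stale.
  write_mode: "append"
```

With append, each sync adds a new copy of every row, so tables grow with each run.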
Migrate Mode
Controls how schema changes (new columns, type changes) are handled when a source integration updates its table definitions. Defaults to safe.
- safe - only backward-compatible schema changes (such as adding new columns) are applied. Changes that would require dropping and recreating a table are skipped to avoid data loss.
- forced - all schema changes are applied, including destructive ones such as dropping and recreating tables. This is useful when you use transformers that modify the schema and need the destination to match.
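The following sketch opts into forced migrations, for example when a transformer changes the schema. It assumes a PostgreSQL destination, and the remaining fields are elided.

```yaml
kind: destination
spec:
  name: postgresql
  # ... path, registry, version and the integration-specific spec
  # Apply all schema changes, including dropping and recreating tables.
  # Omitting migrate_mode falls back to safe.
  migrate_mode: "forced"
```

Because forced can drop and recreate tables, data already stored in affected tables may be lost. Enable it only when that is acceptable.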
PK Mode
Controls how primary keys are set on destination tables. Defaults to default.
- default - uses the primary keys defined by the source integration.
- cq-id-only - uses only the _cq_id column as the primary key. This can help when the source-defined primary keys cause conflicts, or when you want to preserve all rows, including duplicates.
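This sketch switches the primary key mode in the same way. It again assumes a PostgreSQL destination with the other fields elided.

```yaml
kind: destination
spec:
  name: postgresql
  # ... path, registry, version and the integration-specific spec
  # Use _cq_id as the only primary key instead of the source-defined keys.
  # Omitting pk_mode falls back to default.
  pk_mode: "cq-id-only"
```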
Scaling Syncs
For large cloud estates, you can split syncs across multiple processes or machines using sharding. See Running Syncs in Parallel for details on how to distribute sync workloads.
Next Steps
- AWS to PostgreSQL guide - follow a complete AWS to PostgreSQL sync from setup to querying data
- Configuration - set up source and destination configuration files for your first sync
- Managing Incremental Tables - configure state backends and cursors for incremental syncs
- Schema Migrations - understand how CloudQuery handles schema changes between syncs
- Running Syncs in Parallel - shard syncs across multiple processes for large cloud estates
- Performance Tuning - optimize sync speed and resource usage
- Monitoring with OpenTelemetry - track sync progress and performance
- Destination integration reference - configure write mode, migrate mode, and PK mode
- cloudquery sync CLI reference - full command-line options and flags
- Browse available integrations on the CloudQuery Hub