
Deduplication

In real-world scenarios it is hard to guarantee that site events are never duplicated. Duplicates can occur for various reasons:

  • connectivity issues, especially on mobile networks (a device's browser may repeat a network request when reception is poor)
  • connection errors while interacting with the destination
  • a client may want to reprocess some past events to fix data issues
  • and others

Duplicates in a data warehouse may cause various issues:

  • incorrect metrics calculation
  • incorrect attribution
  • incorrect user segmentation

That is why it is important to collect events in a way that prevents data duplication.
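
To make the metrics problem concrete, here is a minimal Python sketch of how a single retried request inflates a revenue total, and how collapsing events by a unique identifier restores it. The event shape and the `message_id` field name are assumptions made for illustration; this is not Jitsu's implementation.

```python
# Hypothetical events; "message_id" is an assumed unique event identifier.
events = [
    {"message_id": "a1", "user": "u1", "type": "purchase", "amount": 30.0},
    {"message_id": "a2", "user": "u2", "type": "purchase", "amount": 20.0},
    # Same event delivered twice, e.g. after a network retry.
    {"message_id": "a1", "user": "u1", "type": "purchase", "amount": 30.0},
]


def total_revenue(rows):
    """Sum the purchase amounts of all given rows."""
    return sum(row["amount"] for row in rows)


def dedupe_by_id(rows):
    """Keep the first row seen for each message_id, preserving order."""
    seen = {}
    for row in rows:
        seen.setdefault(row["message_id"], row)
    return list(seen.values())


raw_total = total_revenue(events)                  # counts the retry twice
clean_total = total_revenue(dedupe_by_id(events))  # counts each event once
print(raw_total, clean_total)
```

With the duplicate included, the total is overstated by the amount of the repeated purchase. The same distortion applies to attribution and segmentation queries that count rows.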

Jitsu provides a Deduplication feature that is enabled by default for all data warehouse connections.

How it works

For each destination, Jitsu uses a deduplication approach based on that destination's capabilities.

For details about a specific destination, refer to the corresponding destination documentation in the Destinations » Warehouses section.

For example, for ClickHouse, deduplication is built on top of the ReplacingMergeTree engine.
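
As a rough illustration of the idea behind ReplacingMergeTree, the Python sketch below simulates its collapse rule: rows that share the same sorting key are reduced to one row, keeping the highest version (or the last inserted row when no version column is given). The function name, the `message_id` key, and the `timestamp` version column are hypothetical and chosen for the example; the actual table layout Jitsu creates may differ.

```python
def replacing_merge(rows, key_fields, version_field=None):
    """Simulate a ReplacingMergeTree-style merge over in-memory rows.

    Rows with the same sorting key collapse to a single survivor.
    Without version_field, the last inserted row wins.
    With version_field, the highest version wins; ties go to the later row.
    """
    survivors = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        current = survivors.get(key)
        if (
            current is None
            or version_field is None
            or row[version_field] >= current[version_field]
        ):
            survivors[key] = row
    # Return rows ordered by sorting key, as a merged data part would be.
    return [survivors[key] for key in sorted(survivors)]


# Hypothetical rows: "a1" arrives twice, the second time reprocessed later.
rows = [
    {"message_id": "a1", "timestamp": 1, "page": "/pricing"},
    {"message_id": "a2", "timestamp": 2, "page": "/docs"},
    {"message_id": "a1", "timestamp": 3, "page": "/pricing"},
]

merged = replacing_merge(rows, ["message_id"], version_field="timestamp")
for row in merged:
    print(row)
```

In ClickHouse itself this collapse happens during background merges, so duplicates may remain visible for some time after insertion. Queries that need deduplicated results immediately can use the `FINAL` modifier.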

How to enable

You can find the Deduplication feature on the Connection editing page, in the Advanced section:

Screenshot

It is enabled by default for all data warehouse connections.