Skip to content

Fix non-deterministic feedback deduplication in ClickHouse#7077

Open
AntoineToussaint wants to merge 2 commits intomainfrom
fix/clickhouse-feedback-ordering
Open

Fix non-deterministic feedback deduplication in ClickHouse#7077
AntoineToussaint wants to merge 2 commits intomainfrom
fix/clickhouse-feedback-ordering

Conversation

@AntoineToussaint
Copy link
Copy Markdown
Member

Summary

  • ClickHouse ROW_NUMBER() window functions for feedback deduplication used ORDER BY timestamp DESC, but timestamp is second-precision (materialized from UUIDv7). When multiple feedback records for the same target land in the same second, the ordering is undefined — causing flaky test failures.
  • Added id DESC as a tiebreaker to all 4 query variants (episode/inference × cumulative/windowed). Since id is a UUIDv7 with sub-millisecond precision, this guarantees deterministic latest-wins semantics.

Test plan

  • test_get_variant_performances_distinct_on_semantics_clickhouse should now pass deterministically
  • Existing ClickHouse endpoint tests pass in CI

🤖 Generated with Claude Code

The ROW_NUMBER() window function used `ORDER BY timestamp DESC` to pick
the latest feedback per target_id, but `timestamp` is a DateTime
(second-precision) column materialized from UUIDv7. When multiple
feedback records share the same second, the ordering is undefined,
causing flaky test failures (e.g. getting value 1.0 instead of 5.0).

Add `id DESC` as a tiebreaker — since `id` is a UUIDv7 with
sub-millisecond precision, this guarantees deterministic latest-wins
semantics. All four query variants (episode/inference × cumulative/windowed)
are fixed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
target_id,
value,
ROW_NUMBER() OVER (PARTITION BY target_id ORDER BY timestamp DESC) as rn
ROW_NUMBER() OVER (PARTITION BY target_id ORDER BY timestamp DESC, id DESC) as rn
Copy link
Copy Markdown
Member

@GabrielBianconi GabrielBianconi Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't doubled checked here but FYI:

ClickHouse stores UUIDs in Big-endian order. This means that UUIDv7s are not stored chronologically. If you ever want to iterate in order you need to use toUInt128 as the pkey

https://www.notion.so/tensorzerodotcom/ClickHouse-1bc7520bbad380cc8ce6cc3aa02b63cd?source=copy_link

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — updated all 4 window functions to use toUInt128(id) DESC instead of raw id DESC, consistent with how the rest of this file already orders UUIDv7s.

…ions

ClickHouse stores UUIDs in big-endian byte order, which does not
preserve chronological ordering for UUIDv7. Use toUInt128(id) as the
tiebreaker in ROW_NUMBER() window functions, consistent with the rest
of the codebase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants