fix(api): exclude deleted score versions from scores count #13615
Open
Epochex wants to merge 2 commits into
Author
Updated in a32351f: aligned the inner score-version dedupe with the outer query by ordering on `timestamp` and `event_ts`, and added v1 coverage showing trace-only score counts exclude dataset-run scores just like the returned data.
Summary
Fixes #13559.
This updates the public scores list/count query path to compute both `data` and `meta.totalItems` from the same latest-version score view:
- dedupe score versions with `ORDER BY s.timestamp desc, s.event_ts desc LIMIT 1 BY s.id, s.project_id`
- apply `is_deleted = 0` only after that dedupe step, so ReplacingMergeTree tombstones do not allow older live versions to reappear
- … `meta.totalItems` exclude the deleted score

Review follow-up
- `event_ts` tiebreaker note in a32351f.
- `traces_only` dataset-run score count coverage in a32351f to document the count/data alignment.

Testing
- `pnpm.cmd exec prettier --check web/src/features/public-api/server/scores.ts web/src/__tests__/server/scores-api-v2.servertest.ts`
- `git diff --check`
- `pnpm.cmd --filter web typecheck`
- `pnpm.cmd --filter web lint`
- … `localhost:3000`, while the existing test helper hardcodes that port
- … `all-ci-passed`, `spelling`, `zizmor` are not reported yet)

Greptile Summary
This PR fixes a bug where soft-deleted scores (tombstoned via `is_deleted = 1`) could still appear in the public scores API when ClickHouse's background ReplacingMergeTree merge had not yet run. The fix wraps the score query in a subquery that first deduplicates rows with `ORDER BY timestamp desc, event_ts desc LIMIT 1 BY id, project_id`, then filters `is_deleted = 0` in the outer query, so tombstones can never resurrect older live versions.

- `buildPublicApiScoresVersionedQuery` is extracted to share the same deduplicated view between the data and count code paths, replacing two structurally divergent queries.
- The `traces_only` count path silently gains a `dataset_run_id IS NULL` guard (the old count query had only `session_id IS NULL`), bringing count semantics in line with the data query.
- … `data` and `meta.totalItems`, and verifies the remaining visible score is returned correctly.

Confidence Score: 4/5
The change is safe to merge; it correctly addresses the tombstone resurrection bug and unifies the data and count query paths without introducing regressions in the primary score retrieval logic.
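To make the tombstone-resurrection bug concrete, the TypeScript sketch below models an unmerged ReplacingMergeTree scores table in memory and compares the two filter orders. It is not code from the PR: `ScoreRow`, `latestVersions`, `visibleScores`, and `visibleScoresFilterFirst` are hypothetical names, and the sample rows assume, for illustration only, that a deletion is written as a new row with the same `id` and `timestamp` and a later `event_ts`.

```typescript
// Hypothetical in-memory stand-in for score rows that background merges
// have not yet collapsed (several versions per id may coexist).
interface ScoreRow {
  id: string;
  projectId: string;
  timestamp: number; // score timestamp (epoch ms)
  eventTs: number; // event time, used as a tiebreaker
  isDeleted: 0 | 1;
  value: number;
}

// Models: ORDER BY timestamp desc, event_ts desc LIMIT 1 BY id, project_id
function latestVersions(rows: ScoreRow[]): ScoreRow[] {
  const sorted = [...rows].sort(
    (a, b) => b.timestamp - a.timestamp || b.eventTs - a.eventTs,
  );
  const seen = new Set<string>();
  const latest: ScoreRow[] = [];
  for (const row of sorted) {
    const key = `${row.projectId}:${row.id}`;
    if (!seen.has(key)) {
      seen.add(key); // keep only the first (newest) row per (id, project_id)
      latest.push(row);
    }
  }
  return latest;
}

// Fixed order: dedupe first, then drop tombstones.
function visibleScores(rows: ScoreRow[]): ScoreRow[] {
  return latestVersions(rows).filter((r) => r.isDeleted === 0);
}

// Buggy order: dropping tombstones first lets an older live version win.
function visibleScoresFilterFirst(rows: ScoreRow[]): ScoreRow[] {
  return latestVersions(rows.filter((r) => r.isDeleted === 0));
}

const rows: ScoreRow[] = [
  { id: "s1", projectId: "p1", timestamp: 1000, eventTs: 1, isDeleted: 0, value: 0.4 },
  { id: "s1", projectId: "p1", timestamp: 1000, eventTs: 2, isDeleted: 1, value: 0.4 },
  { id: "s2", projectId: "p1", timestamp: 2000, eventTs: 1, isDeleted: 0, value: 0.9 },
];

console.log(visibleScores(rows).length); // 1 (only s2)
console.log(visibleScoresFilterFirst(rows).length); // 2 (s1 resurrected)
```

Counting the output of `visibleScores` gives the same number as the rows it returns, which is the `data`/`meta.totalItems` alignment the PR is after.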
The core deduplication approach (a subquery with `LIMIT 1 BY` followed by an outer `is_deleted = 0` filter) is correct for ClickHouse ReplacingMergeTree. Both observations, the inner `IN` subquery lacking `event_ts` in its `ORDER BY` and the silent behavioral change to the `traces_only` count, are non-blocking: the former is a pre-existing inconsistency that does not affect result correctness, and the latter is actually a bug fix that aligns count semantics with the data query. The regression test is logically sound, but the PR author notes it could not be verified locally against the running service because of a port conflict.
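The `event_ts` tiebreaker makes the dedupe deterministic: ClickHouse returns rows that tie on every `ORDER BY` expression in an arbitrary order, so `LIMIT 1 BY` could keep either of two versions that share a `timestamp`. The sketch below models that with JavaScript's stable `Array.prototype.sort`, where tied rows keep their input order and so stand in for "whatever order the rows were read in". The `Version` type, both comparators, and `pickLatest` are hypothetical illustrations, not code from the PR.

```typescript
// Hypothetical minimal shape of one stored score version.
interface Version {
  id: string;
  timestamp: number; // score timestamp (epoch ms)
  eventTs: number; // event time, used only to break ties
  isDeleted: 0 | 1;
}

type Comparator = (a: Version, b: Version) => number;

// Models ORDER BY timestamp desc (no tiebreaker).
const byTimestampOnly: Comparator = (a, b) => b.timestamp - a.timestamp;

// Models ORDER BY timestamp desc, event_ts desc.
const byTimestampThenEventTs: Comparator = (a, b) =>
  b.timestamp - a.timestamp || b.eventTs - a.eventTs;

// Returns the row LIMIT 1 BY id would keep for a single id.
// The sort is stable, so tied rows stay in input (read) order.
function pickLatest(versions: Version[], cmp: Comparator): Version {
  return [...versions].sort(cmp)[0];
}

const live: Version = { id: "s1", timestamp: 1000, eventTs: 1, isDeleted: 0 };
const tombstone: Version = { id: "s1", timestamp: 1000, eventTs: 2, isDeleted: 1 };

// Without the tiebreaker, the winner flips with read order.
console.log(pickLatest([live, tombstone], byTimestampOnly).isDeleted); // 0
console.log(pickLatest([tombstone, live], byTimestampOnly).isDeleted); // 1

// With the tiebreaker, the newer event always wins.
console.log(pickLatest([live, tombstone], byTimestampThenEventTs).isDeleted); // 1
console.log(pickLatest([tombstone, live], byTimestampThenEventTs).isDeleted); // 1
```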
`web/src/features/public-api/server/scores.ts` warrants a close read of the `traces_only` count behavior change: callers that depend on dataset-run scores being included in that count will now see a different `totalItems`.
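A minimal sketch of how one versioned subquery can feed both the data query and the count query, so that a filter such as the `traces_only` guard applies to both by construction. The helper names, the `ScoreSource` type, and the column list are hypothetical; the PR's actual `buildPublicApiScoresVersionedQuery` also handles the optional traces join and the API's other filters, which are omitted here. Parameters use ClickHouse's `{name: Type}` placeholder syntax.

```typescript
// Hypothetical source selector; only the traces_only case adds filters here.
type ScoreSource = "all" | "traces_only";

interface VersionedQueryOptions {
  source: ScoreSource;
}

// Inner query: latest version per score, tombstones NOT yet removed.
function buildVersionedScoresSubquery(opts: VersionedQueryOptions): string {
  const filters = ["s.project_id = {projectId: String}"];
  if (opts.source === "traces_only") {
    // Trace-level scores only: exclude session and dataset-run scores.
    filters.push("s.session_id IS NULL", "s.dataset_run_id IS NULL");
  }
  return [
    "SELECT s.*",
    "FROM scores s",
    `WHERE ${filters.join(" AND ")}`,
    "ORDER BY s.timestamp DESC, s.event_ts DESC",
    "LIMIT 1 BY s.id, s.project_id",
  ].join("\n");
}

// Data path: drop tombstones after dedupe, then paginate.
function buildScoresDataQuery(opts: VersionedQueryOptions): string {
  return [
    `SELECT * FROM (${buildVersionedScoresSubquery(opts)}) AS latest`,
    "WHERE latest.is_deleted = 0",
    "ORDER BY latest.timestamp DESC, latest.event_ts DESC",
    "LIMIT {limit: UInt32} OFFSET {offset: UInt32}",
  ].join("\n");
}

// Count path: same subquery, same tombstone filter, no pagination.
function buildScoresCountQuery(opts: VersionedQueryOptions): string {
  return [
    `SELECT count() AS count FROM (${buildVersionedScoresSubquery(opts)}) AS latest`,
    "WHERE latest.is_deleted = 0",
  ].join("\n");
}

console.log(buildScoresCountQuery({ source: "traces_only" }));
```

Because both builders embed the identical string from `buildVersionedScoresSubquery`, a filter added there cannot drift between `data` and `meta.totalItems`, which is the kind of divergence the old count query had.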
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["API Request\n/api/public/v2/scores"] --> B["buildPublicApiScoresVersionedQuery\n(inner subquery)"]
    B --> C["SELECT cols FROM scores s\nLEFT JOIN traces t (if needed)\nWHERE project_id AND filters"]
    C --> D["ORDER BY timestamp desc, event_ts desc\nLIMIT 1 BY id, project_id\n(deduplicate: picks latest version)"]
    D --> E["Outer query wrapper"]
    E --> F["WHERE is_deleted = 0\n(filter tombstones after dedup)"]
    F --> G{"query type?"}
    G -->|"data"| H["ORDER BY timestamp desc, event_ts desc\nLIMIT n OFFSET m\n(paginate)"]
    G -->|"count"| I["SELECT count()"]
    H --> J["convertClickhouseScoreToDomain\n+ convertScoreToPublicApi"]
    I --> K["meta.totalItems"]
    J --> L["response.data"]
    style D fill:#f9f,stroke:#333
    style F fill:#9f9,stroke:#333
```
Reviews (1): Last reviewed commit: "fix(api): exclude deleted score versions..."