Skip to content

fix(api): exclude deleted score versions from scores count#13615

Open
Epochex wants to merge 2 commits into
langfuse:mainfrom
Epochex:fix/public-scores-count-deleted
Open

fix(api): exclude deleted score versions from scores count#13615
Epochex wants to merge 2 commits into
langfuse:mainfrom
Epochex:fix/public-scores-count-deleted

Conversation

@Epochex
Copy link
Copy Markdown

@Epochex Epochex commented May 13, 2026

Summary

Fixes #13559.

This updates the public scores list/count query path to compute both data and meta.totalItems from the same latest-version score view:

  • dedupe score versions with ORDER BY s.timestamp desc, s.event_ts desc LIMIT 1 BY s.id, s.project_id
  • filter is_deleted = 0 only after that dedupe step, so ReplacingMergeTree tombstones do not allow older live versions to reappear
  • reuse the same versioned query shape for the data and count paths
  • add regression coverage for a live score version followed by a tombstone, plus one visible score, asserting both response data and meta.totalItems exclude the deleted score

Review follow-up

  • Addressed Greptile's event_ts tiebreaker note in a32351f.
  • Added v1 traces_only dataset-run score count coverage in a32351f to document the count/data alignment.

Testing

  • pnpm.cmd exec prettier --check web/src/features/public-api/server/scores.ts web/src/__tests__/server/scores-api-v2.servertest.ts
  • git diff --check
  • pnpm.cmd --filter web typecheck
  • pnpm.cmd --filter web lint
  • Targeted server test: blocked locally because another project is bound to localhost:3000, while the existing test helper hardcodes that port
  • Maintainer CI: waiting for protected fork workflow approval (all-ci-passed, spelling, zizmor are not reported yet)

Greptile Summary

This PR fixes a bug where soft-deleted scores (tombstoned via is_deleted = 1) could still appear in the public scores API when ClickHouse's background ReplacingMergeTree merge had not yet run. The fix wraps the score query in a subquery that first deduplicates rows with ORDER BY timestamp desc, event_ts desc LIMIT 1 BY id, project_id, then filters is_deleted = 0 in the outer query — ensuring tombstones can never resurrect older live versions.

  • buildPublicApiScoresVersionedQuery is extracted to share the same deduplicated view between the data and count code paths, replacing two structurally divergent queries.
  • The traces_only count path silently gains a dataset_run_id IS NULL guard (the old count query had only session_id IS NULL), bringing count semantics in line with the data query.
  • A regression test is added that inserts a live version followed by a tombstone for the same score ID, asserts the deleted score is absent from both data and meta.totalItems, and verifies the remaining visible score is returned correctly.

Confidence Score: 4/5

The change is safe to merge; it correctly addresses the tombstone resurrection bug and unifies the data and count query paths without introducing regressions in the primary score retrieval logic.

The core deduplication approach (subquery with LIMIT 1 BY followed by outer is_deleted = 0 filter) is correct for ClickHouse ReplacingMergeTree. The two observations — inner IN subquery lacking event_ts in its ORDER BY, and the silent behavioral change to traces_only count — are both non-blocking: the former is a pre-existing inconsistency that does not affect result correctness, and the latter is actually a bug fix that aligns count semantics with the data query. The regression test is logically sound but the PR author notes it could not be verified locally against the actual service due to a port conflict.

web/src/features/public-api/server/scores.ts warrants a close read on the traces_only count behavior change; callers that depend on dataset-run scores being included in that count will now see a different totalItems.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["API Request\n/api/public/v2/scores"] --> B["buildPublicApiScoresVersionedQuery\n(inner subquery)"]
    B --> C["SELECT cols FROM scores s\nLEFT JOIN traces t (if needed)\nWHERE project_id AND filters"]
    C --> D["ORDER BY timestamp desc, event_ts desc\nLIMIT 1 BY id, project_id\n(deduplicate — picks latest version)"]
    D --> E["Outer query wrapper"]
    E --> F["WHERE is_deleted = 0\n(filter tombstones after dedup)"]
    F --> G{"query type?"}
    G -->|"data"| H["ORDER BY timestamp desc, event_ts desc\nLIMIT n OFFSET m\n(paginate)"]
    G -->|"count"| I["SELECT count()"]
    H --> J["convertClickhouseScoreToDomain\n+ convertScoreToPublicApi"]
    I --> K["meta.totalItems"]
    J --> L["response.data"]

    style D fill:#f9f,stroke:#333
    style F fill:#9f9,stroke:#333
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
web/src/features/public-api/server/scores.ts:107-121
**Inner IN subquery deduplicates without `event_ts` tiebreaker**

The inner `SELECT … LIMIT 1 BY s.id, s.project_id` inside the `IN (…)` clause orders only by `s.timestamp desc`, omitting `s.event_ts desc`. The outer deduplication (line 127–129) uses both fields. When two versions of the same score share an identical `timestamp` but differ in `event_ts`, the inner and outer deduplications may pick different "latest" rows. Because the inner subquery only contributes `(trace_id, project_id)` to the `IN` set, and both versions share the same `trace_id`, correctness is preserved in all known cases — but the inconsistency is a latent gap if the semantics of the IN set are ever widened. This pattern was present before this PR.

### Issue 2 of 2
web/src/features/public-api/server/scores.ts:289-299
**`traces_only` count now also excludes `dataset_run_id IS NOT NULL` scores**

The old `_handleGetScoresCountForPublicApi` query applied only `AND s.session_id IS NULL` for the `traces_only` scope, while the data query applied `AND s.session_id IS NULL AND s.dataset_run_id IS NULL`. By sharing `buildPublicApiScoresVersionedQuery`, both paths now apply both conditions. This silently changes the count for callers using `scoreScope = "traces_only"` when dataset-run scores exist in the project. The change is more correct, but it is an undocumented behavioral delta that may shift `meta.totalItems` for existing integrations.

Reviews (1): Last reviewed commit: "fix(api): exclude deleted score versions..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. feat-scores labels May 13, 2026
Comment thread web/src/features/public-api/server/scores.ts
Comment thread web/src/features/public-api/server/scores.ts
@Epochex
Copy link
Copy Markdown
Author

Epochex commented May 13, 2026

Updated in a32351f: aligned the inner score-version dedupe with the outer query by ordering on timestamp and event_ts, and added v1 coverage showing trace-only score counts exclude dataset-run scores just like the returned data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat-scores size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: /api/public/v2/scoresmeta.totalItems still overcounts (deleted-row residue; follow-up to #6396)

1 participant