Add provider-proxy cache hit/miss stats to CI summaries#7057
Draft
AntoineToussaint wants to merge 11 commits into main
Previously, `count_with_cost` was computed at query time using `CASE WHEN total_cost IS NOT NULL THEN inference_count ELSE 0 END`, which operated at (model, provider, minute) bucket granularity. If any inference in a bucket had cost, all inferences in that bucket were counted.

This adds a dedicated `count_with_cost` column computed at insert time:
- ClickHouse: `countState(cost)` / `countMerge(count_with_cost)`
- Postgres: `COUNT(cost)::BIGINT`

Only inferences with non-null cost are now counted, giving per-inference accuracy.

Closes #6574

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
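To make the difference between the two counting strategies concrete, here is a minimal sketch in plain Python. It does not use the project's schema or queries; the `bucket` list and both function names are hypothetical stand-ins for one (model, provider, minute) bucket and for the old and new counting rules described above.

```python
from typing import Optional

# Hypothetical bucket: one entry per inference, holding its cost,
# or None when no cost was recorded for that inference.
bucket: list[Optional[float]] = [0.002, None, None]


def bucket_level_count(costs: list[Optional[float]]) -> int:
    """Old rule: if the bucket's total cost is non-null, count every inference."""
    known = [c for c in costs if c is not None]
    # SQL SUM ignores NULLs and is NULL only when every input is NULL.
    total_cost = sum(known) if known else None
    return len(costs) if total_cost is not None else 0


def per_inference_count(costs: list[Optional[float]]) -> int:
    """New rule: count only inferences whose cost is non-null, like COUNT(cost)."""
    return sum(1 for c in costs if c is not None)


print(bucket_level_count(bucket))   # 3: one costed inference makes all three count
print(per_inference_count(bucket))  # 1: only the costed inference counts
```

With one costed inference out of three, the bucket-level rule reports three inferences with cost while the per-inference rule reports one, which is the overcount this commit removes.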
Run a full refresh at the end of the migration so pre-existing rows get their count_with_cost populated instead of remaining NULL. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migration 0051 test inserts 3 ModelInference rows (input_tokens: 100+200+300, output_tokens: 50+100+150) which flow into CumulativeUsage via materialized views. The existing assertions didn't account for these. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… totals Capture a baseline before inserting a new row, then assert that input_tokens increased by exactly 123 and output_tokens stayed the same. This way new migrations that insert test data won't break these checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
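The baseline-then-delta pattern described in this commit can be sketched as follows. This is an illustration only: `FakeUsageStore` and `check_insert_delta` are hypothetical names standing in for the shared cumulative-usage table and the test body, and the starting totals are the sums from the migration 0051 test data mentioned earlier.

```python
class FakeUsageStore:
    """Hypothetical stand-in for a cumulative-usage table shared by many tests."""

    def __init__(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens = input_tokens
        self.output_tokens = output_tokens

    def insert(self, input_tokens: int = 0, output_tokens: int = 0) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def totals(self) -> tuple[int, int]:
        return self.input_tokens, self.output_tokens


def check_insert_delta(store: FakeUsageStore) -> tuple[int, int]:
    # Capture a baseline first, so rows inserted by other tests or
    # migrations do not affect the assertion.
    before_in, before_out = store.totals()
    store.insert(input_tokens=123)
    after_in, after_out = store.totals()
    # Assert on the change, not on absolute totals.
    assert after_in - before_in == 123
    assert after_out == before_out
    return after_in - before_in, after_out - before_out


# Pre-existing data (100+200+300 input, 50+100+150 output) does not matter.
print(check_insert_delta(FakeUsageStore(600, 300)))  # (123, 0)
```

Because the check compares totals before and after the insert, it passes regardless of how much data earlier migrations or tests have already written.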
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `flush_model_provider_statistics()` to `ModelInferenceQueries` trait:
- ClickHouse: sleeps 2s for the materialized view to process inserts
- Postgres: calls `refresh_model_provider_statistics_incremental(full_refresh => TRUE)`
- Default: no-op (for mocks and other backends)

Refactor the 4 backend-specific cost aggregation tests into 2 shared tests using `make_db_test!`, so both ClickHouse and Postgres run identical logic.

Also fixes a bug where the cross-minute Postgres test used `TimeWindow::Cumulative` instead of `TimeWindow::Minute`, making it test something different from the ClickHouse variant.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move stats flush logic from `ModelInferenceQueries` (production trait) into `TestDatabaseHelpers` (already test-only via cfg gate):
- ClickHouse: `flush_pending_writes()` instead of sleeping 2s
- Postgres: call `refresh_model_provider_statistics_incremental`

Tests then use `poll_for_result` to wait until the expected data is visible, rather than sleeping a fixed duration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
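The general idea of polling until data is visible, rather than sleeping a fixed time, can be sketched as below. The name `poll_for_result` is borrowed from the commit message, but this signature, the parameter names, and the defaults are assumptions for illustration and are not the repository's helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_for_result(
    fetch: Callable[[], T],
    is_ready: Callable[[T], bool],
    timeout_s: float = 10.0,    # assumed default, not taken from the repository
    interval_s: float = 0.1,    # assumed default, not taken from the repository
) -> T:
    """Call `fetch` repeatedly until `is_ready(result)` holds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("expected data did not become visible in time")
        time.sleep(interval_s)
```

Compared with a fixed sleep, this returns as soon as the data appears and fails with a clear error when it never does, so a test is neither slower than needed nor dependent on a guessed delay.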
list_evaluation_runs(1, 0) assumed the just-inserted run would always be the most recent globally, which fails under parallel test execution when another test inserts a run with a higher UUIDv7 at the same time. Fetch 100 runs and find the specific run by ID instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nference

# Conflicts:
# crates/tensorzero-core/src/db/clickhouse/migration_manager/migrations/migration_0051.rs
Adds a step to each workflow using the provider-proxy that counts cache hits and misses from the proxy logs and writes them to $GITHUB_STEP_SUMMARY. This gives visibility into cache effectiveness across all e2e test suites. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
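A step like the one described could be sketched as follows. The log wording is an assumption: this sketch supposes each request produces a line containing "cache hit" or "cache miss", which may not match the proxy's real output. The function names and the table layout are also hypothetical. `GITHUB_STEP_SUMMARY` is the environment variable GitHub Actions sets to a file path that accepts appended Markdown.

```python
import os
import re
from typing import Iterable

# ASSUMPTION: the proxy logs one line containing "cache hit" or "cache miss"
# per request. Adjust the patterns to the actual log format.
HIT = re.compile(r"cache hit", re.IGNORECASE)
MISS = re.compile(r"cache miss", re.IGNORECASE)


def count_cache_events(log_lines: Iterable[str]) -> tuple[int, int]:
    """Return (hits, misses) found in the given log lines."""
    hits = misses = 0
    for line in log_lines:
        if HIT.search(line):
            hits += 1
        elif MISS.search(line):
            misses += 1
    return hits, misses


def render_summary(hits: int, misses: int) -> str:
    """Render a small Markdown table for the workflow run page."""
    total = hits + misses
    rate = f"{100 * hits / total:.1f}%" if total else "n/a"
    return (
        "### Provider-proxy cache\n\n"
        "| Hits | Misses | Hit rate |\n"
        "| --- | --- | --- |\n"
        f"| {hits} | {misses} | {rate} |\n"
    )


def write_step_summary(markdown: str) -> None:
    """Append to $GITHUB_STEP_SUMMARY when set, otherwise print for local runs."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if path:
        with open(path, "a", encoding="utf-8") as f:
            f.write(markdown)
    else:
        print(markdown)
```

In a workflow, such a script would read the proxy's log file, pass its lines to `count_cache_events`, and hand the rendered table to `write_step_summary`.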
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Counts provider-proxy cache hits and misses from the proxy's `tracing::info!` logs and writes them to `$GITHUB_STEP_SUMMARY` so stats are visible on the workflow run page.

Test plan
🤖 Generated with Claude Code