Add provider-proxy cache hit/miss stats to CI summaries#7057

Draft
AntoineToussaint wants to merge 11 commits into main from provider-proxy-cache-stats

@AntoineToussaint

Summary

  • Adds a "Provider proxy cache stats" step to all CI workflows that use the provider-proxy
  • Counts cache hits and misses from the proxy's tracing::info! logs
  • Writes hit count, miss count, total, and hit rate to $GITHUB_STEP_SUMMARY so stats are visible on the workflow run page
  • Covers: live-tests, evaluation-tests, client-tests, inference-cache-tests, and ui-e2e-main
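The counting and summary steps described above can be sketched as follows. This is a minimal illustration, not the workflow step from this PR: the marker strings `cache hit` and `cache miss` are assumptions about what the proxy logs contain, and the function names are hypothetical. Only `GITHUB_STEP_SUMMARY` is taken from the description.

```python
import os

# Hypothetical markers: the real proxy log text may differ.
HIT_MARKER = "cache hit"
MISS_MARKER = "cache miss"


def count_cache_events(log_lines):
    """Return (hits, misses) by scanning each log line for a marker."""
    hits = sum(1 for line in log_lines if HIT_MARKER in line.lower())
    misses = sum(1 for line in log_lines if MISS_MARKER in line.lower())
    return hits, misses


def render_summary(hits, misses):
    """Build the Markdown table that would be written to the step summary."""
    total = hits + misses
    # Guard against division by zero when the proxy served no requests.
    rate = f"{100 * hits / total:.1f}%" if total else "n/a"
    return (
        "### Provider proxy cache stats\n\n"
        "| Hits | Misses | Total | Hit rate |\n"
        "| --- | --- | --- | --- |\n"
        f"| {hits} | {misses} | {total} | {rate} |\n"
    )


def append_to_step_summary(markdown, path=None):
    """Append to the file named by GITHUB_STEP_SUMMARY; return False if unset."""
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if path is None:
        return False
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown)
    return True
```

Reporting the hit rate as `n/a` when there are no requests keeps the step from failing on suites that never reach the proxy.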

Test plan

  • Check that the step summary renders correctly after a CI run
  • Verify counts match the proxy logs

🤖 Generated with Claude Code

AntoineToussaint and others added 11 commits March 20, 2026 15:18
Previously, `count_with_cost` was computed at query time using
`CASE WHEN total_cost IS NOT NULL THEN inference_count ELSE 0 END`,
which operated at (model, provider, minute) bucket granularity. If any
inference in a bucket had cost, all inferences were counted.

This adds a dedicated `count_with_cost` column computed at insert time:
- ClickHouse: `countState(cost)` / `countMerge(count_with_cost)`
- Postgres: `COUNT(cost)::BIGINT`

Only inferences with non-null cost are now counted, giving per-inference
accuracy.
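
To make the difference concrete, here is a small Python model of the two counting rules applied to one (model, provider, minute) bucket. It is an illustration only, not the SQL from this change; it assumes `total_cost` is a sum that ignores NULL costs, and the function names are hypothetical.

```python
def count_with_cost_bucketed(costs):
    """Old rule: if the bucket's total cost is non-null, count every inference."""
    known = [c for c in costs if c is not None]
    # Assumption: total_cost behaves like SQL SUM, which skips NULLs and
    # yields NULL only when every cost in the bucket is NULL.
    total_cost = sum(known) if known else None
    return len(costs) if total_cost is not None else 0


def count_with_cost_per_inference(costs):
    """New rule: like COUNT(cost), count only inferences with a non-null cost."""
    return sum(1 for c in costs if c is not None)


# One bucket with three inferences, only one of which reported a cost.
bucket = [0.002, None, None]
old_count = count_with_cost_bucketed(bucket)       # counts all three
new_count = count_with_cost_per_inference(bucket)  # counts only the one with cost
```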

Closes #6574

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run a full refresh at the end of the migration so pre-existing rows
get their count_with_cost populated instead of remaining NULL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migration 0051 test inserts 3 ModelInference rows (input_tokens:
100+200+300, output_tokens: 50+100+150) which flow into CumulativeUsage
via materialized views. The existing assertions didn't account for these.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… totals

Capture a baseline before inserting a new row, then assert that
input_tokens increased by exactly 123 and output_tokens stayed the same.
This way new migrations that insert test data won't break these checks.
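
The baseline-then-delta pattern can be sketched like this. `FakeUsageStore` is a hypothetical in-memory stand-in for the cumulative usage table, seeded with the totals of the three migration test rows mentioned earlier; it is not code from the repository.

```python
class FakeUsageStore:
    """Hypothetical stand-in for a table of cumulative token usage."""

    def __init__(self, input_tokens=0, output_tokens=0):
        self.input_tokens = input_tokens
        self.output_tokens = output_tokens

    def insert(self, input_tokens=0, output_tokens=0):
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def totals(self):
        return self.input_tokens, self.output_tokens


# Pre-existing data from other tests or migrations (100+200+300 and 50+100+150).
store = FakeUsageStore(input_tokens=600, output_tokens=300)

# Capture the baseline first, then assert on the change rather than the absolute total.
baseline_in, baseline_out = store.totals()
store.insert(input_tokens=123)
after_in, after_out = store.totals()

assert after_in - baseline_in == 123
assert after_out == baseline_out
```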

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `flush_model_provider_statistics()` to `ModelInferenceQueries` trait:
- ClickHouse: sleeps 2s for the materialized view to process inserts
- Postgres: calls `refresh_model_provider_statistics_incremental(full_refresh => TRUE)`
- Default: no-op (for mocks and other backends)

Refactor the 4 backend-specific cost aggregation tests into 2 shared tests
using `make_db_test!`, so both ClickHouse and Postgres run identical logic.
Also fixes a bug where the cross-minute Postgres test used `TimeWindow::Cumulative`
instead of `TimeWindow::Minute`, making it test something different from the
ClickHouse variant.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move stats flush logic from ModelInferenceQueries (production trait) into
TestDatabaseHelpers (already test-only via cfg gate):
- ClickHouse: flush_pending_writes() instead of sleeping 2s
- Postgres: call refresh_model_provider_statistics_incremental

Tests then use poll_for_result to wait until the expected data is visible,
rather than sleeping a fixed duration.
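
A generic version of the polling idea, written in Python for illustration. `poll_until` is a hypothetical helper and does not reflect the signature of `poll_for_result` in the repository; the timeout and interval defaults are arbitrary.

```python
import time


def poll_until(fetch, is_ready, timeout_s=5.0, interval_s=0.05):
    """Call fetch() repeatedly until is_ready(result) holds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("expected data did not become visible in time")
        time.sleep(interval_s)


# Usage: a fake query whose row count only reaches 3 on the third call.
calls = {"n": 0}


def fake_query():
    calls["n"] += 1
    return calls["n"]


rows = poll_until(fake_query, lambda n: n >= 3, timeout_s=1.0, interval_s=0.01)
```

Compared with a fixed sleep, the poll returns as soon as the data is visible and fails with a clear error when it never appears.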

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
list_evaluation_runs(1, 0) assumed the just-inserted run would always be
the most recent globally, which fails under parallel test execution when
another test inserts a run with a higher UUIDv7 at the same time.

Fetch 100 runs and find the specific run by ID instead.
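
In outline, the lookup changes from "take the first row" to "search the page for our ID". The sketch below uses plain dictionaries and a hypothetical `find_run_by_id` helper; the real test works against `list_evaluation_runs`.

```python
def find_run_by_id(runs, run_id):
    """Return the run with the given ID, or None if it is not in the page."""
    return next((run for run in runs if run["id"] == run_id), None)


# Newest first: a parallel test inserted "run-b" after our "run-a".
page = [{"id": "run-b"}, {"id": "run-a"}, {"id": "run-0"}]

most_recent = page[0]                  # what a limit of 1 would have returned
ours = find_run_by_id(page, "run-a")   # robust to concurrent inserts
```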

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nference

# Conflicts:
#	crates/tensorzero-core/src/db/clickhouse/migration_manager/migrations/migration_0051.rs
Adds a step to each workflow using the provider-proxy that counts cache
hits and misses from the proxy logs and writes them to $GITHUB_STEP_SUMMARY.
This gives visibility into cache effectiveness across all e2e test suites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AntoineToussaint marked this pull request as draft on March 25, 2026 15:07