fix: improve ClickHouse CI reliability and reduce flakiness #7092

Open
AntoineToussaint wants to merge 9 commits into main from fix/clickhouse-ci-reliability

Conversation

@AntoineToussaint
Member

Summary

  • Replace hardcoded sleep() calls with deterministic poll_for_result() in clickhouse.rs, inference_clickhouse.rs, and test_helpers.rs — tests now poll until data appears instead of hoping a fixed delay is enough
  • Increase default poll timeout from 5s to 30s — gives CI plenty of headroom under load without slowing passing tests (polling returns immediately on success)
  • Use unique ClickHouse databases per embedded gateway test: make_embedded_gateway_with_config_path() and friends now call create_unique_clickhouse_url() instead of sharing tensorzero_e2e_tests, which prevents tests from interfering with each other
  • sleep_for_writes_to_be_visible() now calls flush_pending_writes() instead of sleeping 500ms, benefiting 26+ test files
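
The polling approach in the first two bullets can be sketched as follows. This is a minimal, synchronous, std-only illustration, not the repository's `poll_for_result()`: the name `poll_until`, its signature, and the error type are all assumptions made for the example.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a polling helper (not the PR's actual API).
/// Calls `check` until it yields `Some(value)` or `timeout` elapses.
pub fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        // Return as soon as the data is visible, so a generous timeout
        // adds no time to a run where the condition is met quickly.
        if let Some(value) = check() {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(format!("condition not met within {timeout:?}"));
        }
        // Wait briefly before the next attempt instead of busy-looping.
        sleep(interval);
    }
}
```

Because the helper returns on the first successful check, raising the timeout from 5s to 30s only changes how long a genuinely failing test waits before it reports the failure.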

Test plan

  • ClickHouse CI (batch_writes: true and batch_writes: false) passes without flaky failures
  • Migration tests in clickhouse.rs pass (materialized view polling replaces sleeps)
  • test_clickhouse_bulk_insert passes (polls for 10k inferences instead of sleeping 10s)
  • test_dummy_only_replicated_clickhouse passes with unique DB isolation
  • test_materialized_views_have_snapshot_hash passes (polls InferenceById MV)
  • Postgres test variants are unaffected (no-ops for flush/sleep)

🤖 Generated with Claude Code

- Replace hardcoded sleep() calls with deterministic poll_for_result() in
  clickhouse.rs, inference_clickhouse.rs, and test_helpers.rs
- Increase default poll timeout from 5s to 30s to handle CI load
- Use unique ClickHouse databases per embedded gateway test for isolation
- sleep_for_writes_to_be_visible() now calls flush_pending_writes()
  instead of sleeping 500ms, benefiting 26+ test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
virajmehta previously approved these changes Mar 27, 2026
AntoineToussaint and others added 2 commits March 27, 2026 14:13
Tests like DICL optimizer and evaluation tests rely on fixture data
pre-loaded into the shared `tensorzero_e2e_tests` database. Using
unique databases per embedded gateway breaks these tests. The
`make_embedded_gateway_e2e_with_unique_db()` helper remains available
for tests that explicitly opt into isolation.
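
For tests that do opt into isolation, one way to derive a per-test database is sketched below. It is an illustration only: `unique_database_name` and `with_database` are hypothetical names, the URL layout (database as the path segment) is an assumption, and none of this is the implementation of `create_unique_clickhouse_url()`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Process-wide counter so two calls in the same nanosecond still differ.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: builds a database name that differs on every call
/// by combining the process id, a timestamp, and a sequence number.
pub fn unique_database_name(prefix: &str) -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    format!("{prefix}_{}_{nanos}_{seq}", std::process::id())
}

/// Hypothetical helper: appends the database as the URL path.
/// Assumes `base_url` carries no database, query string, or fragment.
pub fn with_database(base_url: &str, database: &str) -> String {
    format!("{}/{database}", base_url.trim_end_matches('/'))
}
```

A fresh database starts empty, which is why tests that depend on fixtures pre-loaded into `tensorzero_e2e_tests` cannot use it without loading that data themselves.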

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
virajmehta previously approved these changes Mar 27, 2026
…DICL tests

- Evaluation tests: retry feedback calls that validate evaluator_inference_id
  exists in ClickHouse, since the evaluator inferences may not be visible yet
  after flush.
- DICL workflow test: create write_haiku inferences before launching
  optimization, instead of relying on other tests having run first.
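
The retry described in the first item can be sketched as a bounded retry loop. This is a generic, std-only illustration under assumed names (`retry`, `max_attempts`, `delay`); the actual evaluation code, its async runtime, and its error types are not shown in this PR text.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: runs `op` up to `max_attempts` times, sleeping
/// `delay` between failures, and returns the last error if all attempts fail.
pub fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise wait and try again, e.g. while a referenced row
            // is not yet visible to the validating query.
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```

Bounding the number of attempts keeps a real validation failure from hanging the test while still tolerating a short visibility lag after a flush.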

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AntoineToussaint and others added 5 commits March 27, 2026 16:03
InputMessageContent::Text takes a Text struct, not a JSON value.
write_haiku expects Template input with topic argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The mock-optimization-tests don't have a provider-proxy for inference
calls, so creating inferences hits the real OpenAI API with a dummy key.
Switch to dataset-based data source (like run_workflow_test_case_with_dataset)
which creates datapoints directly without needing inference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The materialized view ModelProviderStatisticsView may not be processing
inserts at this point in the migration sequence. Poll the source table
(ModelInference) instead, since the migration backfill reads from there.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2 participants