Add function for asymptotic confidence sequences for bandit feedback by amishler · Pull Request #3998 · tensorzero/tensorzero

amishler · 2025-10-16T22:55:56Z

No description provided.


…3998)

* removed variant disabling from prepare_candidate_variants
* wip
* wip
* set up new variant config loading
* refactored initialization to set up samplers
* prod implementation seems correct, need to refactor tests too
* forgot a merge
* refactored tests into `experimentation`
* small fix to `prepare_candidate_variants`
* improved error handling for experimentation
* fixed tag version in experimentation
* refactored VariantSampler trait to be simpler
* fixed clippy
* cleanup
* fixed typing issues
* added test that samples from config
* config test should sample many times
* Add draft function for estimating optimal sampling probabilities
* Add function to check stopping condition
* Fix constraint bugs, set solver to non-verbose.
  Two constraint bugs:
  - For the simplex constraint, had 1s everywhere instead of just at indices corresponding to the weights
  - Signs were backwards in the SOCP constraints
* Add guard for edge case of equal means and variances
* Add unit tests
* Add TODOs to comments
* added a comment
* Add test cases with known solutions
* Add test cases to check_stopping
* added more informative comments
* added validation
* Change function signatures, refactor constraint construction.
  Change functions to accept vectors of FeedbackByVariants structs, rather than a struct of vectors. Construct the quadratic coefficient P matrix directly as a sparse matrix rather than by creating a dense matrix and then converting it to sparse format. Update tests to match new signatures.
* Removed unneeded argument struct
* refactored inference handler to pull infer_variant into a separate function
* refactored batch handler as well
* short circuited inference in the pinned / dynamic case
* fixed failing tests due to error handling improvement
* run merge queue tests
* removed merge queue tests
* wip
* set up config
* sketch is done
* added spawn
* wip
* added config loading logic
* Return probabilities in hashmap with variant names
* wip
* oops
* Slight change to avoid String cloning
* wip
* pulled sampling into helper function
* added postgres handling
* built bindings
* forgot a file
* Refactor function args into a struct, add tie handling for leader arm, add docstrings
* Refactor to use struct for function arg, add docstrings
* Rename ridge_variance to variance_floor for clarity
* Add TODO for choosing epsilon
* Log warning when arms are tied
* Rename arg and error structs with full function name
* Raise floor on pairwise info rate to avoid degeneracy, add tests to catch degeneracy
* Add comprehensive unit tests, log info and warnings about experiment state
* Add arguments validation for track and stop config
* Make sleep duration configurable
* Add integration test files
* Add helper functions and initial tests
* Clean up imports
* Add unit tests for convergence of estimated optimal probabilities.
  As sample means and variances converge, the estimated optimal probabilities should converge to the true optimum. This convergence may be nonmonotone in the convergence of the sample statistics due to the nonlinear optimization problem which is sensitive to the ordering of the sample means, so we average over multiple random runs to yield monotonicity with high probability.
* wip on integration tests
* Fix bug in sampling behavior in NurseryAndBandits state.
  sample_with_probabilities() in this state required a fresh `uniform_sample`, but this branch of the code was only being reached when `uniform_sample` was >= `nursery_probability`, leading to incorrect sampling.
* Filter out feedback variants so they don't enter bandit experiment.
  Also fix test that was failing due to previous bugfix.
* Create test helper to build embedded gateway with postgres and clean clickhouse database
* Add comprehensive integration tests
* made migrations automatic in test docker composes
* removed stray changes
* wip
* fixed issue with entrypoint
* fixed issue with file write
* fixed buildkite CI flow
* fixed chc test
* fixed database names
* fixed typing
* changed database name to tensorzero_e2e_tests in Python to match previous behavior
* Remove tests of test helper functions
* Fix incorrect merge conflict resolution
* Move static weights sampling function and accompanying tests
* Move import to top of file, remove unneeded line
* Add comments and docstrings
* Disable feedback validation to avoid test failures when feedback precedes inference logging
* Remove convergence tests, remove global locks, enable parallel test runs.
  Convergence tests require too much parallelism for clickhouse and postgres to handle, so we rely instead on unit tests in `estimate_optimal_probabilities.rs`. The global locks are no longer necessary due to a change elsewhere in the repo, so now tests can be run concurrently.
* Add support for optimize='min' in Track-and-Stop
* Remove test for optimize=min direction for now since that's not currently handled
* Change epsilon and delta test structure to speed them up
* Set global constant for sleep period when spawning a new client
* Revert changes in test_helpers
* Remove 'no_stopping' test: takes too long and this functionality is essentially already tested elsewhere
* Make VariantSampler setup argument generic for future use
* Change lifetime declaration for older version of clippy
* Add test for optimize=min direction
* Add unit tests for optimize=min direction
* wip: add sql query for variant feedback time series
* Change to supported clickhouse function name
* Change period_start to period_end, add comments for clarity
* Add tests for timeseries sql query
* Rebuild typescript bindings
* Initial implementation of asymptotic confidence sequences
* Fix CI clippy errors
* Fix field name: period_end -> period_start
* Make asymptotic cs computation correct and efficient
* Remove in-progress work that was accidentally included
* Wrap return type in Result, compute asympCS automatically when retrieving feedback time series
* Simplify sql query
* Fix handling of optional rho
* Added a join-free query to compute cumulative statistics (#4001)
* fixed performance issue in time series sql query
* fixed formatting issue
* fixed issue with sorting groupArray
* Change to parametric sql query
* Remove old commented out sql query
* Update tests to only expect data in periods where variants have new data
* Change time period to support aggregation by minute, hour, day, week, month
* Remove unnecessary to_string() calls
* Build new node bindings
* Build new TypeScript bindings
* Update old argument name
* Rename get_feedback_timeseries to get_cumulative_feedback_timeseries for clarity
* Rename FeedbackTimeSeriesPoint to CumulativeFeedbackTimeSeriesPoint for clarity
* Small tweak to avoid cloning

---------

Co-authored-by: Viraj Mehta <viraj@tensorzero.com>
Co-authored-by: Viraj Mehta <virajmehta@users.noreply.github.com>
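
The commit messages above mention an asymptotic confidence sequence computation with an optional `rho`, but this page does not show the formula. As a rough sketch only, the Rust function below computes the half-width of a two-sided asymptotic confidence sequence using the Gaussian-mixture boundary commonly given in the asymptotic confidence sequence literature. The function name, argument names, and the `Option` return type are assumptions made for illustration, not the code merged in this PR.

```rust
/// Half-width of a two-sided asymptotic confidence sequence after `n`
/// observations, using the Gaussian-mixture boundary
///
///   sqrt( 2 (n s^2 rho^2 + 1) / (n^2 rho^2) * ln( sqrt(n s^2 rho^2 + 1) / alpha ) )
///
/// where `s^2` is the running sample variance. All names here are
/// hypothetical; the PR's actual signature is not shown on this page.
fn asymp_cs_half_width(n: u64, variance: f64, alpha: f64, rho: f64) -> Option<f64> {
    // Reject inputs for which the boundary is undefined.
    if n == 0 {
        return None;
    }
    if !(alpha > 0.0 && alpha < 1.0) {
        return None;
    }
    if !rho.is_finite() || rho <= 0.0 {
        return None;
    }
    if !variance.is_finite() || variance < 0.0 {
        return None;
    }

    let t = n as f64;
    let rho_sq = rho * rho;
    // n * s^2 * rho^2 + 1, which is always >= 1, so the log below is positive.
    let scaled = t * variance * rho_sq + 1.0;
    let width_sq = (2.0 * scaled) / (t * t * rho_sq) * (scaled.sqrt() / alpha).ln();
    Some(width_sq.sqrt())
}

/// Interval (lower, upper) around a running mean, under the same assumptions.
fn asymp_cs_interval(
    mean: f64,
    n: u64,
    variance: f64,
    alpha: f64,
    rho: f64,
) -> Option<(f64, f64)> {
    let half_width = asymp_cs_half_width(n, variance, alpha, rho)?;
    Some((mean - half_width, mean + half_width))
}
```

Under these assumptions, a caller would pass the cumulative count, mean, and variance for a variant at each period and plot the resulting interval. The tuning parameter `rho` controls at which sample sizes the sequence is tightest, which may be why the commits treat it as optional.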

virajmehta and others added 30 commits September 26, 2025 11:23

removed variant disabling from prepare_candidate_variants

7834e29

wip

83fcd3c

wip

05208f7

set up new variant config loading

55dee80

implemented sampling for fallbacks

46f3624

refactored initialization to set up samplers

23e46cc

prod implementation seems correct, need to refactor tests too

d2c589f

Merge branch 'main' of https://github.com/tensorzero/tensorzero into …

5b0cdd4

…viraj/experimentation-config

forgot a merge

0a2f7db

refactored tests into experimentation

2390319

small fix to prepare_candidate_variants

3b30613

improved error handling for experimentation

7076011

fixed tag version in experimentation

b57877e

refactored VariantSampler trait to be simpler

39725b1

fixed clippy

1e8bce3

cleanup

891e89d

fixed typing issues

47bccd9

added test that samples from config

b9ffeb2

config test should sample many times

1cb43e9

Add draft function for estimating optimal sampling probabilities

62a28a6

Add function to check stopping condition

a701570

Fix constraint bugs, set solver to non-verbose.

9ddddac

Two constraint bugs: - For the simplex constraint, had 1s everywhere instead of just at indices corresponding to the weights - Signs were backwards in the SOCP constraints
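
To illustrate the first bug described above, here is a minimal sketch of building the coefficient row for a simplex constraint (the weights sum to 1) when the decision vector also holds non-weight auxiliary variables. The function name and the layout of the decision vector are hypothetical; the PR's actual constraint construction is not shown on this page.

```rust
/// Build the coefficient row `a` for the equality constraint `a . x = 1`,
/// where `x` has `num_vars` entries but only those at `weight_indices`
/// are sampling weights. Returns `None` if an index is out of range.
/// (Hypothetical helper for illustration.)
fn simplex_constraint_row(num_vars: usize, weight_indices: &[usize]) -> Option<Vec<f64>> {
    // Start from all zeros so auxiliary variables do not enter the sum.
    let mut row = vec![0.0; num_vars];
    for &i in weight_indices {
        // Place a 1 only at positions that correspond to weights.
        *row.get_mut(i)? = 1.0;
    }
    Some(row)
}
```

A row of all ones would instead constrain the sum of every variable, including auxiliary ones, which matches the bug the commit message describes.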

Add guard for edge case of equal means and variances

7e09212

Add unit tests

cb0649b

Add TODOs to comments

b9a72c6

added a comment

89397a1

Add test cases with known solutions

3cf0315

Add test cases to check_stopping

a9c6e43

added more informative comments

a700f31

Merge branch 'main' of github.com:tensorzero/tensorzero into alan/ban…

0f71952

…dits

amishler added 3 commits October 17, 2025 11:49

Merge branch 'alan/bandits-viz-clickhouse-query' into alan/bandits-co…

d14333a

…nfidence-sequences

Merge branch 'main' into alan/bandits-viz-clickhouse-query

4708e94

Merge branch 'main' into alan/bandits-confidence-sequences

0315fc1

amishler marked this pull request as ready for review October 17, 2025 16:10

amishler added 2 commits October 17, 2025 12:38

Build new TypeScript bindings

71f8f3a

Merge branch 'alan/bandits-confidence-sequences' of github.com:tensor…

5e484dc

…zero/tensorzero into alan/bandits-confidence-sequences

virajmehta reviewed Oct 17, 2025

internal/tensorzero-node/src/database.rs (outdated, resolved)

virajmehta reviewed Oct 17, 2025

tensorzero-core/src/db/clickhouse/select_queries.rs (outdated, resolved)

virajmehta reviewed Oct 17, 2025

tensorzero-core/src/db/mod.rs (outdated, resolved)

amishler added 10 commits October 17, 2025 15:37

Update old argument name

0497c23

Rename get_feedback_timeseries to get_cumulative_feedback_timeseries …

f2568ee

…for clarity

Rename FeedbackTimeSeriesPoint to CumulativeFeedbackTimeSeriesPoint f…

8e487b2

…or clarity

Merge branch 'main' into alan/bandits-viz-clickhouse-query

82c3258

Merge branch 'alan/bandits-viz-clickhouse-query' into alan/bandits-co…

f5fb4b3

…nfidence-sequences

Merge branch 'main' into alan/bandits-viz-clickhouse-query

0d52789

Merge branch 'alan/bandits-viz-clickhouse-query' into alan/bandits-co…

b0ec133

…nfidence-sequences

Merge branch 'main' into alan/bandits-confidence-sequences

308d025

Merge branch 'main' into alan/bandits-confidence-sequences

f4317d4

Small tweak to avoid cloning

3b8a089

virajmehta approved these changes Oct 20, 2025

virajmehta enabled auto-merge October 20, 2025 14:57

virajmehta added this pull request to the merge queue Oct 20, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 20, 2025

amishler added this pull request to the merge queue Oct 20, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 20, 2025

amishler added this pull request to the merge queue Oct 20, 2025

Merged via the queue into main with commit ffe0247 Oct 20, 2025
31 checks passed

amishler deleted the alan/bandits-confidence-sequences branch October 20, 2025 17:17

amishler merged 158 commits into main from alan/bandits-confidence-sequences


2 participants
