Add function for asymptotic confidence sequences for bandit feedback #3998

Merged
amishler merged 158 commits into main from alan/bandits-confidence-sequences on Oct 20, 2025

Conversation

@amishler
Member

No description provided.

virajmehta and others added 30 commits September 26, 2025 11:23
Two constraint bugs:
- For the simplex constraint, had 1s everywhere instead of just at indices corresponding to the weights
- Signs were backwards in the SOCP constraints
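The simplex bug is easiest to see in a small sketch. It assumes a hypothetical decision-vector layout: the arm weights come first, followed by auxiliary variables, such as an epigraph variable for the SOCP constraints. The PR's actual layout and solver API are not shown here. Only the weight coordinates should carry a coefficient of 1 in the constraint sum_i w_i = 1. A row of all 1s would silently pull the auxiliary variables into the sum.

```rust
/// Coefficient row for the equality constraint sum_i w_i = 1.
///
/// Hypothetical layout: the decision vector holds the `num_arms` sampling
/// weights first, followed by auxiliary variables (e.g. an epigraph variable).
fn simplex_constraint_row(num_arms: usize, num_vars: usize) -> Vec<f64> {
    assert!(num_arms <= num_vars, "weights must fit in the decision vector");
    let mut row = vec![0.0; num_vars];
    // Only weight coordinates get a 1; auxiliary variables keep a 0, so they
    // do not participate in the simplex sum.
    for coeff in row.iter_mut().take(num_arms) {
        *coeff = 1.0;
    }
    row
}

/// Residual a^T x - 1 of the simplex constraint at a candidate point `x`.
fn simplex_residual(row: &[f64], x: &[f64]) -> f64 {
    row.iter().zip(x).map(|(a, xi)| a * xi).sum::<f64>() - 1.0
}
```

With an all-ones row, the point `[0.2, 0.3, 0.5, 7.0]` would show a residual of 7.0 even though its weights form a valid distribution.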
@amishler amishler marked this pull request as ready for review October 17, 2025 16:10
@virajmehta virajmehta enabled auto-merge October 20, 2025 14:57
@virajmehta virajmehta added this pull request to the merge queue Oct 20, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 20, 2025
…3998)

* removed variant disabling from prepare_candidate_variants

* wip

* wip

* set up new variant config loading

* refactored initialization to set up samplers

* prod implementation seems correct, need to refactor tests too

* forgot a merge

* refactored tests into `experimentation`

* small fix to `prepare_candidate_variants`

* improved error handling for experimentation

* fixed tag version in experimentation

* refactored VariantSampler trait to be simpler

* fixed clippy

* cleanup

* fixed typing issues

* added test that samples from config

* config test should sample many times

* Add draft function for estimating optimal sampling probabilities

* Add function to check stopping condition

* Fix constraint bugs, set solver to non-verbose.

Two constraint bugs:
- For the simplex constraint, had 1s everywhere instead of just at indices corresponding to the weights
- Signs were backwards in the SOCP constraints

* Add guard for edge case of equal means and variances

* Add unit tests

* Add TODOs to comments

* added a comment

* Add test cases with known solutions

* Add test cases to check_stopping

* added more informative comments

* added validation

* Change function signatures, refactor constraint construction.

Change functions to accept vectors of FeedbackByVariants structs,
rather than a struct of vectors. Construct the quadratic coefficient
P matrix directly as a sparse matrix rather than by creating a dense
matrix and then converting it to sparse format. Update tests to match
new signatures.

* Removed unneeded argument struct

* refactored inference handler to pull infer_variant into a separate function

* refactored batch handler as well

* short circuited inference in the pinned / dynamic case

* fixed failing tests due to error handling improvement

* run merge queue tests

* removed merge queue tests

* wip

* set up config

* sketch is done

* added spawn

* wip

* added config loading logic

* Return probabilities in hashmap with variant names

* wip

* oops

* Slight change to avoid String cloning

* wip

* pulled sampling into helper function

* added postgres handling

* built bindings

* forgot a file

* Refactor function args into a struct, add tie handling for leader arm, add docstrings

* Refactor to use struct for function arg, add docstrings

* Rename ridge_variance to variance_floor for clarity

* Add TODO for choosing epsilon

* Log warning when arms are tied

* Rename arg and error structs with full function name

* Raise floor on pairwise info rate to avoid degeneracy, add tests to catch degeneracy

* Add comprehensive unit tests, log info and warnings about experiment state

* Add arguments validation for track and stop config

* Make sleep duration configurable

* Add integration test files

* Add helper functions and initial tests

* Clean up imports

* Add unit tests for convergence of estimated optimal probabilities.

As sample means and variances converge, the estimated optimal probabilities
should converge to the true optimum. This convergence may be nonmonotone in
the convergence of the sample statistics because the nonlinear optimization
problem is sensitive to the ordering of the sample means, so we average
over multiple random runs to yield monotonicity with high probability.

* wip on integration tests

* Fix bug in sampling behavior in NurseryAndBandits state.

sample_with_probabilities() in this state required a fresh `uniform_sample`,
but this branch of the code was only reached when the existing `uniform_sample`
was >= `nursery_probability`, so reusing that value biased which variants were sampled.

* Filter out feedback variants so they don't enter bandit experiment.

Also fix test that was failing due to previous bugfix.

* Create test helper to build embedded gateway with postgres and clean clickhouse database

* Add comprehensive integration tests

* made migrations automatic in test docker composes

* removed stray changes

* wip

* fixed issue with entrypoint

* fixed issue with file write

* fixed buildkite CI flow

* fixed chc test

* fixed database names

* fixed typing

* changed database name to tensorzero_e2e_tests in Python to match previous behavior

* Remove tests of test helper functions

* Fix incorrect merge conflict resolution

* Move static weights sampling function and accompanying tests

* Move import to top of file, remove unneeded line

* Add comments and docstrings

* Disable feedback validation to avoid test failures when feedback precedes inference logging

* Remove convergence tests, remove global locks, enable parallel test runs.

Convergence tests require too much parallelism for ClickHouse and
Postgres to handle, so we rely instead on unit tests in `estimate_optimal_probabilities.rs`.
The global locks are no longer necessary due to a change elsewhere in the repo, so
tests can now run concurrently.

* Add support for optimize='min' in Track-and-Stop

* Remove test for optimize=min direction for now since that's not currently handled

* Change epsilon and delta test structure to speed them up

* Set global constant for sleep period when spawning a new client

* Revert changes in test_helpers

* Remove 'no_stopping' test: takes too long and this functionality is essentially already tested elsewhere

* Make VariantSampler setup argument generic for future use

* Change lifetime declaration for older version of clippy

* Add test for optimize=min direction

* Add unit tests for optimize=min direction

* wip: add sql query for variant feedback time series

* Change to supported clickhouse function name

* Change period_start to period_end, add comments for clarity

* Add tests for timeseries sql query

* Rebuild typescript bindings

* Initial implementation of asymptotic confidence sequences

* Fix CI clippy errors

* Fix field name: period_end -> period_start

* Make asymptotic cs computation correct and efficient

* Remove in-progress work that was accidentally included

* Wrap return type in Result, compute asympCS automatically when retrieving feedback time series

* Simplify sql query

* Fix handling of optional rho

* Added a join-free query to compute cumulative statistics (#4001)

* fixed performance issue in time series sql query

* fixed formatting issue

* fixed issue with sorting groupArray

* Change to parametric sql query

* Remove old commented out sql query

* Update tests to only expect data in periods where variants have new data

* Change time period to support aggregation by minute, hour, day, week, month

* Remove unnecessary to_string() calls

* Build new node bindings

* Build new TypeScript bindings

* Update old argument name

* Rename get_feedback_timeseries to get_cumulative_feedback_timeseries for clarity

* Rename FeedbackTimeSeriesPoint to CumulativeFeedbackTimeSeriesPoint for clarity

* Small tweak to avoid cloning

---------

Co-authored-by: Viraj Mehta <viraj@tensorzero.com>
Co-authored-by: Viraj Mehta <virajmehta@users.noreply.github.com>
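The `Change function signatures, refactor constraint construction` commit above builds the quadratic-cost matrix P directly in sparse form. The sketch below shows that idea for a compressed-sparse-column (CSC) layout. It assumes, for illustration only, that P is diagonal; the commit does not say what P looks like. It also uses a hand-rolled `Csc` struct rather than any particular solver's matrix type.

```rust
/// Minimal compressed-sparse-column (CSC) matrix: column pointers, row
/// indices and nonzero values, the layout conic solvers typically consume.
#[derive(Debug, PartialEq)]
struct Csc {
    n: usize,
    colptr: Vec<usize>,
    rowval: Vec<usize>,
    nzval: Vec<f64>,
}

/// Build a diagonal P straight into CSC form, skipping zero entries,
/// instead of allocating a dense n x n matrix and converting it afterwards.
fn diagonal_csc(diag: &[f64]) -> Csc {
    let mut colptr = Vec::with_capacity(diag.len() + 1);
    let mut rowval = Vec::new();
    let mut nzval = Vec::new();
    colptr.push(0);
    for (j, &d) in diag.iter().enumerate() {
        if d != 0.0 {
            // A diagonal entry lives in row j of column j.
            rowval.push(j);
            nzval.push(d);
        }
        // Column j ends where the nonzeros collected so far end.
        colptr.push(rowval.len());
    }
    Csc { n: diag.len(), colptr, rowval, nzval }
}
```

Memory stays proportional to the number of nonzeros, not n², which is the point of skipping the dense intermediate.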
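Several commits above deal with ties for the leader arm ("add tie handling for leader arm", "Log warning when arms are tied"). A minimal sketch of leader selection with tie detection follows. The helper name, the absolute tolerance `tol`, and the lowest-index tie-break are assumptions, not the PR's rules.

```rust
/// Return the index of the arm with the highest sample mean, plus every arm
/// whose mean is within `tol` of it. Hypothetical helper: the tolerance and
/// the tie-break (lowest index wins) are illustrative choices.
fn leader_with_ties(means: &[f64], tol: f64) -> Option<(usize, Vec<usize>)> {
    // f64::max ignores NaN, so NaN means never become the leader.
    let best = means.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if !best.is_finite() {
        return None; // empty input or no finite mean
    }
    let tied: Vec<usize> = means
        .iter()
        .enumerate()
        .filter(|&(_, &m)| (best - m).abs() <= tol)
        .map(|(i, _)| i)
        .collect();
    // `best` is one of the means, so `tied` is never empty here.
    Some((tied[0], tied))
}
```

Returning the whole tied set, rather than only the winner, gives the caller what it needs to log a warning about the tie.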
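The `variance_floor` rename and the commit that raises a floor on the pairwise information rate suggest two clamps that keep the optimization away from degenerate inputs. The sketch below assumes the standard Gaussian (unequal-variance) pairwise rate between a leader and a challenger under sampling weights. The PR's exact objective is not shown, so treat the formula and the `ArmStats` struct as assumptions.

```rust
/// Per-arm summary statistics plus the arm's current sampling weight.
#[derive(Clone, Copy, Debug)]
struct ArmStats {
    mean: f64,
    variance: f64,
    weight: f64,
}

/// Assumed Gaussian pairwise information rate between leader and challenger:
///     (mu_l - mu_c)^2 / (2 * (var_l / w_l + var_c / w_c))
/// Variances are clamped at `variance_floor` and the result at `rate_floor`,
/// so equal means or vanishing variances cannot make the rate degenerate.
fn pairwise_info_rate(
    leader: ArmStats,
    challenger: ArmStats,
    variance_floor: f64,
    rate_floor: f64,
) -> f64 {
    let var_l = leader.variance.max(variance_floor);
    let var_c = challenger.variance.max(variance_floor);
    let gap = leader.mean - challenger.mean;
    // A zero weight makes its term infinite, so the raw rate drops to 0
    // and the floor takes over.
    let rate = gap * gap / (2.0 * (var_l / leader.weight + var_c / challenger.weight));
    rate.max(rate_floor)
}
```

Without the floor, two arms with equal means give a rate of exactly 0, which is the kind of degeneracy the tests in that commit are meant to catch.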
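The `Fix bug in sampling behavior in NurseryAndBandits state` commit above concerns a two-stage draw. One uniform decides between the nursery and the bandit, and the bandit arm needs an independent uniform. In the sketch below, uniforms are passed in explicitly so the logic is deterministic. `index_from_uniform` and `two_stage_draw` are hypothetical stand-ins, not the PR's `sample_with_probabilities()` signature.

```rust
/// Outcome of a two-stage draw: the nursery, or a bandit arm by index.
#[derive(Debug, PartialEq)]
enum Draw {
    Nursery,
    Bandit(usize),
}

/// Invert the CDF of `probs` (assumed non-empty and summing to 1) at `u` in [0, 1).
fn index_from_uniform(probs: &[f64], u: f64) -> usize {
    assert!(!probs.is_empty(), "need at least one arm");
    let mut cumulative = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return i;
        }
    }
    probs.len() - 1 // guard against round-off when the sum is slightly below 1
}

/// `u_stage` decides nursery vs. bandit; `u_arm` is a fresh, independent uniform.
fn two_stage_draw(nursery_probability: f64, probs: &[f64], u_stage: f64, u_arm: f64) -> Draw {
    if u_stage < nursery_probability {
        Draw::Nursery
    } else {
        // Reusing `u_stage` here would be biased: on this branch it always lies
        // in [nursery_probability, 1), so arms early in the CDF could be skipped.
        Draw::Bandit(index_from_uniform(probs, u_arm))
    }
}
```

For example, with `nursery_probability = 0.5` and two equally likely arms, reusing `u_stage` would never select arm 0 on the bandit branch.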
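For the asymptotic confidence sequences added above, here is a sketch of the two-sided Gaussian-mixture boundary from Waudby-Smith et al.'s asymptotic confidence sequences. This assumes the PR follows that construction. The PR's default for the optional `rho` is not shown, so `rho` is a required argument here. "Asymptotic" means the time-uniform coverage of 1 - alpha holds only approximately, as the sample size grows.

```rust
/// Half-width of a two-sided asymptotic confidence sequence at sample size `t`:
///     sigma_hat * sqrt( 2 (t rho^2 + 1) / (t^2 rho^2) * ln( sqrt(t rho^2 + 1) / alpha ) )
/// `rho > 0` tunes the sample size at which the sequence is tightest.
fn asymp_cs_half_width(t: u64, sigma_hat: f64, alpha: f64, rho: f64) -> f64 {
    assert!(t > 0, "need at least one observation");
    assert!(alpha > 0.0 && alpha < 1.0, "alpha must lie in (0, 1)");
    assert!(rho > 0.0, "rho must be positive");
    let t = t as f64;
    let a = t * rho * rho + 1.0;
    // a > 1 and alpha < 1, so the logarithm is positive.
    sigma_hat * (2.0 * a / (t * t * rho * rho) * (a.sqrt() / alpha).ln()).sqrt()
}

/// (lower, upper) bounds around a running mean with running std. dev. `sigma_hat`.
fn asymp_cs(mean: f64, t: u64, sigma_hat: f64, alpha: f64, rho: f64) -> (f64, f64) {
    let hw = asymp_cs_half_width(t, sigma_hat, alpha, rho);
    (mean - hw, mean + hw)
}
```

Because the half-width depends only on the running mean, running standard deviation, and count, it can be evaluated row by row on a cumulative feedback time series without revisiting raw observations.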
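The join-free query for cumulative statistics (#4001) is not reproduced here. The sketch below shows only the underlying arithmetic: running prefix sums over per-period rows sorted by period, with no self-join of the series against itself. The row shape (`count`, `sum`, `sum_sq` per period) is an assumption about what such a query would aggregate.

```rust
/// Per-period aggregates for one variant (hypothetical shape of a query row).
struct PeriodAgg {
    count: u64,
    sum: f64,
    sum_sq: f64,
}

/// Cumulative statistics up to and including a period.
#[derive(Debug)]
struct Cumulative {
    count: u64,
    mean: f64,
    variance: f64,
}

/// Prefix sums over periods already sorted by time.
fn cumulative_stats(periods: &[PeriodAgg]) -> Vec<Cumulative> {
    let (mut n, mut s, mut ss) = (0u64, 0.0f64, 0.0f64);
    periods
        .iter()
        .map(|p| {
            n += p.count;
            s += p.sum;
            ss += p.sum_sq;
            let nf = n as f64;
            let mean = if n > 0 { s / nf } else { 0.0 };
            // Sample variance from raw moments; clamp tiny negative round-off.
            let variance = if n > 1 {
                ((ss - nf * mean * mean) / (nf - 1.0)).max(0.0)
            } else {
                0.0
            };
            Cumulative { count: n, mean, variance }
        })
        .collect()
}
```

The raw-moment formula is cheap but can lose precision when the mean is large relative to the spread. Merging per-period (count, mean, M2) triples Welford-style is a more stable alternative.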
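The commit that adds aggregation by minute, hour, day, week, and month maps naturally onto ClickHouse's `toStartOf*` truncation functions. The sketch below shows one way to build the period-key expression. The enum, the helper, and the `timestamp` column name are hypothetical; the PR's actual mapping is not shown.

```rust
/// Aggregation granularity for the cumulative feedback time series.
#[derive(Clone, Copy, Debug)]
enum TimeWindow {
    Minute,
    Hour,
    Day,
    Week,
    Month,
}

impl TimeWindow {
    /// ClickHouse function that truncates a timestamp to the start of its period.
    fn truncation_fn(self) -> &'static str {
        match self {
            TimeWindow::Minute => "toStartOfMinute",
            TimeWindow::Hour => "toStartOfHour",
            TimeWindow::Day => "toStartOfDay",
            // With the default mode, weeks start on Sunday.
            TimeWindow::Week => "toStartOfWeek",
            TimeWindow::Month => "toStartOfMonth",
        }
    }
}

/// SQL expression for the period key. `column` must be a trusted identifier:
/// query parameters bind values, not identifiers or function names.
fn period_expr(window: TimeWindow, column: &str) -> String {
    format!("{}({})", window.truncation_fn(), column)
}
```

Keeping the granularity as a closed enum means only these five function names can ever be interpolated into the query, which fits alongside the parametric query for the remaining values.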
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 20, 2025
@amishler amishler added this pull request to the merge queue Oct 20, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 20, 2025
@amishler amishler added this pull request to the merge queue Oct 20, 2025
Merged via the queue into main with commit ffe0247 Oct 20, 2025
31 checks passed
@amishler amishler deleted the alan/bandits-confidence-sequences branch October 20, 2025 17:17