Tags: forkgitss/tensorzero-tensorzero

Add UI elements for bandits (tensorzero#4069)
* removed variant disabling from prepare_candidate_variants
* wip
* wip
* set up new variant config loading
* refactored initialization to set up samplers
* prod implementation seems correct, need to refactor tests too
* forgot a merge
* refactored tests into `experimentation`
* small fix to `prepare_candidate_variants`
* improved error handling for experimentation
* fixed tag version in experimentation
* refactored VariantSampler trait to be simpler
* fixed clippy
* cleanup
* fixed typing issues
* added test that samples from config
* config test should sample many times
* Add draft function for estimating optimal sampling probabilities
* Add function to check stopping condition
* Fix constraint bugs, set solver to non-verbose. Two constraint bugs:
  - For the simplex constraint, had 1s everywhere instead of just at indices corresponding to the weights
  - Signs were backwards in the SOCP constraints
* Add guard for edge case of equal means and variances
* Add unit tests
* Add TODOs to comments
* added a comment
* Add test cases with known solutions
* Add test cases to check_stopping
* added more informative comments
* added validation
* Change function signatures, refactor constraint construction.
  Change functions to accept vectors of FeedbackByVariants structs, rather than a struct of vectors. Construct the quadratic coefficient P matrix directly as a sparse matrix rather than by creating a dense matrix and then converting it to sparse format. Update tests to match new signatures.
* Removed unneeded argument struct
* refactored inference handler to pull infer_variant into a separate function
* refactored batch handler as well
* short circuited inference in the pinned / dynamic case
* fixed failing tests due to error handling improvement
* run merge queue tests
* removed merge queue tests
* wip
* set up config
* sketch is done
* added spawn
* wip
* added config loading logic
* Return probabilities in hashmap with variant names
* wip
* oops
* Slight change to avoid String cloning
* wip
* pulled sampling into helper function
* added postgres handling
* built bindings
* forgot a file
* Refactor function args into a struct, add tie handling for leader arm, add docstrings
* Refactor to use struct for function arg, add docstrings
* Rename ridge_variance to variance_floor for clarity
* Add TODO for choosing epsilon
* Log warning when arms are tied
* Rename arg and error structs with full function name
* Raise floor on pairwise info rate to avoid degeneracy, add tests to catch degeneracy
* Add comprehensive unit tests, log info and warnings about experiment state
* Add arguments validation for track and stop config
* Make sleep duration configurable
* Add integration test files
* Add helper functions and initial tests
* Clean up imports
* Add unit tests for convergence of estimated optimal probabilities.
  As sample means and variances converge, the estimated optimal probabilities should converge to the true optimum. This convergence may be nonmonotone in the convergence of the sample statistics due to the nonlinear optimization problem, which is sensitive to the ordering of the sample means, so we average over multiple random runs to yield monotonicity with high probability.
* wip on integration tests
* Fix bug in sampling behavior in NurseryAndBandits state.
  sample_with_probabilities() in this state required a fresh `uniform_sample`, but this branch of the code was only being reached when `uniform_sample` was >= `nursery_probability`, leading to incorrect sampling.
* Filter out feedback variants so they don't enter bandit experiment.
  Also fix test that was failing due to previous bugfix.
* Create test helper to build embedded gateway with postgres and clean clickhouse database
* Add comprehensive integration tests
* made migrations automatic in test docker composes
* removed stray changes
* wip
* fixed issue with entrypoint
* fixed issue with file write
* fixed buildkite CI flow
* fixed chc test
* fixed database names
* fixed typing
* changed database name to tensorzero_e2e_tests in Python to match previous behavior
* Remove tests of test helper functions
* Fix incorrect merge conflict resolution
* Move static weights sampling function and accompanying tests
* Move import to top of file, remove unneeded line
* Add comments and docstrings
* Disable feedback validation to avoid test failures when feedback precedes inference logging
* Remove convergence tests, remove global locks, enable parallel test runs.
  Convergence tests require too much parallelism for clickhouse and postgres to handle, so we rely instead on unit tests in `estimate_optimal_probabilities.rs`. The global locks are no longer necessary due to a change elsewhere in the repo, so now tests can be run concurrently.
* Add support for optimize='min' in Track-and-Stop
* Remove test for optimize=min direction for now since that's not currently handled
* Change epsilon and delta test structure to speed them up
* Set global constant for sleep period when spawning a new client
* Revert changes in test_helpers
* Remove 'no_stopping' test: takes too long and this functionality is essentially already tested elsewhere
* Make VariantSampler setup argument generic for future use
* Change lifetime declaration for older version of clippy
* Add test for optimize=min direction
* Add unit tests for optimize=min direction
* wip: add sql query for variant feedback time series
* Change to supported clickhouse function name
* Change period_start to period_end, add comments for clarity
* Add tests for timeseries sql query
* Rebuild typescript bindings
* Initial implementation of asymptotic confidence sequences
* Fix CI clippy errors
* Fix field name: period_end -> period_start
* Make asymptotic cs computation correct and efficient
* Remove in-progress work that was accidentally included
* Wrap return type in Result, compute asympCS automatically when retrieving feedback time series
* Simplify sql query
* Fix handling of optional rho
* Added a join-free query to compute cumulative statistics (tensorzero#4001)
* fixed performance issue in time series sql query
* fixed formatting issue
* fixed issue with sorting groupArray
* Change to parametric sql query
* Remove old commented out sql query
* Update tests to only expect data in periods where variants have new data
* Change time period to support aggregation by minute, hour, day, week, month
* Remove unnecessary to_string() calls
* Build new node bindings
* Build new TypeScript bindings
* Expose estimate_optimal_probabilities function to node
* Update old argument name
* Rename get_feedback_timeseries to get_cumulative_feedback_timeseries for clarity
* Rename FeedbackTimeSeriesPoint to CumulativeFeedbackTimeSeriesPoint for clarity
* Small tweak to avoid cloning
* Fix numeric types in integration tests
* Add node test for getFeedbackByVariant
* Remove some comments
* Move <div> outside <p> so server-rendered html doesn't get discarded
* Add pie chart with estimated optimal probabilities
* Add Postgres to docker, add track-and-stop to fixtures
* Change struct and function names to include TrackAndStop
* Update function name to include TrackAndStop
* Update track-and-stop pie chart to include probabilities for nursery variants
* wip - add feedback time series chart
* Replace TimeWindowUnit with TimeWindow
* wip - add feedback timeseries chart
* Change metric, add minutes and hours to dropdown
* Remove useState for URL parameters
  Read metric_name and time granularities directly from searchParams instead of maintaining local state, ensuring URL is single source of truth.
* Make variance field nullable to accommodate ClickHouse using sample variance
* Add new fixture with feedback data distributed in time
* Make pie chart fall back to uniform weights if estimated optimal sampling probabilities are undefined
* Use memo and probability sorting to prevent needless pie chart re-rendering
* Propagate cumulative values forward in time so feedback samples chart renders correctly
* Remove year from x-axis labels on Hourly chart, remove unnecessary string normalization
* Add chart title and dropdown menu width constraint
* Revert docker file
* Fix docker compose file (merged wrong version)
* Add postgres to tensorzero-node ui test config
* Pass postgres environment variable through
* Cast feedback count to number for type consistency
* Use pre-existing utility function to format chart ticks
* Factor date/time functionality out into new util functions
* Remove unneeded exports, add docstrings, clean up sorting behavior
* Remove default export and unnecessary error, clean up feedbackTimeSeriesPromise
* Rename variable to something more informative
* Factor out optimal probability computation into separate utility function
* Fix typo in docstring
---------
Co-authored-by: Viraj Mehta <viraj@tensorzero.com>
Co-authored-by: Viraj Mehta <virajmehta@users.noreply.github.com>
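The NurseryAndBandits bug described above (reusing a `uniform_sample` that had already been compared against `nursery_probability`) can be illustrated with a small Python sketch. Only the names `sample_with_probabilities`, `uniform_sample`, and `nursery_probability` come from the commit message; `sample_variant`, the sorted iteration order, and the assumption that the nursery branch picks uniformly among nursery variants are hypothetical, and the real implementation is in Rust and may differ.

```python
import random


def sample_with_probabilities(probs: dict[str, float], u: float) -> str:
    """Inverse-CDF sampling: return the first variant whose cumulative weight exceeds u."""
    names = sorted(probs)  # hypothetical fixed order so a given u maps to one variant
    cumulative = 0.0
    for name in names:
        cumulative += probs[name]
        if u < cumulative:
            return name
    return names[-1]  # guard against floating-point round-off in the final sum


def sample_variant(
    nursery: list[str],
    bandit_probs: dict[str, float],
    nursery_probability: float,
    rng: random.Random,
) -> str:
    """Hypothetical NurseryAndBandits sampler: nursery with prob. p, bandit otherwise."""
    uniform_sample = rng.random()
    if uniform_sample < nursery_probability:
        # Conditioned on this branch, uniform_sample / p is uniform on [0, 1).
        idx = int(uniform_sample / nursery_probability * len(nursery))
        return nursery[min(idx, len(nursery) - 1)]
    # Buggy version: sample_with_probabilities(bandit_probs, uniform_sample).
    # On this branch uniform_sample is always >= nursery_probability, so the low
    # end of the cumulative distribution is never reached. Draw a fresh sample.
    return sample_with_probabilities(bandit_probs, rng.random())
```

With `nursery_probability = 0.5` and two equally weighted bandit variants `a` and `b`, the fixed sampler returns each bandit variant about 25% of the time; the buggy version would only ever return `b`, the variant that comes last in cumulative order.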
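The bullet about propagating cumulative values forward pairs with the change that the time-series query only returns data for periods where a variant has new feedback. A minimal sketch of that carry-forward step, assuming hypothetical sparse rows of `(period, variant, cumulative_count)` and an already-sorted list of periods (the actual chart code is TypeScript and its data shapes may differ):

```python
from collections.abc import Iterable


def fill_forward(
    periods: list[str],
    rows: Iterable[tuple[str, str, int]],
) -> dict[str, list[int]]:
    """Turn sparse (period, variant, cumulative_count) rows into one dense series per variant.

    `periods` is assumed sorted. A variant with no new data in a period keeps its
    previous cumulative value, and is 0 before its first observation.
    """
    observed: dict[str, dict[str, int]] = {}
    for period, variant, count in rows:
        observed.setdefault(variant, {})[period] = count

    dense: dict[str, list[int]] = {}
    for variant, by_period in observed.items():
        last = 0  # no feedback yet, so the cumulative count is zero
        series = []
        for period in periods:
            last = by_period.get(period, last)
            series.append(last)
        dense[variant] = series
    return dense
```

Carrying values forward keeps each variant's cumulative count defined in every period, which is what a chart needs to draw a continuous line per variant.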
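For the asymptotic confidence sequences mentioned above, one standard construction is the two-sided normal-mixture boundary from Waudby-Smith et al.'s work on time-uniform central limit theory, with half-width σ̂_t · sqrt(2(tρ² + 1) / (t²ρ²) · log(sqrt(tρ² + 1) / α)). The commit messages don't say which construction or default ρ TensorZero uses (only that ρ is optional), so this sketch takes ρ as an explicit parameter with an arbitrary placeholder default and is illustrative only:

```python
import math


def asymptotic_cs(
    mean: float, sample_std: float, t: int, alpha: float = 0.05, rho: float = 1.0
) -> tuple[float, float]:
    """Two-sided asymptotic confidence sequence at sample size t (normal-mixture boundary).

    rho tunes the sample size at which the sequence is tightest; the default of
    1.0 is an arbitrary placeholder, not TensorZero's choice.
    """
    if t <= 0 or not (0 < alpha < 1) or rho <= 0:
        raise ValueError("require t > 0, 0 < alpha < 1, rho > 0")
    a = t * rho**2 + 1
    half_width = sample_std * math.sqrt(
        2 * a / (t**2 * rho**2) * math.log(math.sqrt(a) / alpha)
    )
    return mean - half_width, mean + half_width
```

Unlike a fixed-sample interval, this band is meant to hold (asymptotically) uniformly over all t, so it is wider than the usual ±1.96σ̂/√t at any single t: about 0.327 versus 0.196 at t = 100, σ̂ = 1, α = 0.05, ρ = 1.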

Potential fix for code scanning alert no. 2: Code injection (tensorzero#4011)
* Potential fix for code scanning alert no. 2: Code injection
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Update
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Gabriel Bianconi <GabrielBianconi@users.noreply.github.com>

Bump to 2025.10.5 (tensorzero#4039)
* Bump to 2025.10.5
* Bump to 2025.10.5

Bump SGLang on Modal to fix flaky e2e tests (tensorzero#4009)
* Bump SGLang on Modal to fix flaky e2e tests
  SGLang now emits many duplicate tool calls, instead of not emitting any tool calls at all. I've adjusted our tests to allow this for sglang. When running locally, the tests almost always pass on the first try (with the occasional retry), instead of the ~9 retries we were seeing on our daily cron job.
* Fix clippy
* Change warmup url
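The test change above (allowing duplicate tool calls for SGLang) could look roughly like the following sketch. The helper names, the `{"name", "arguments"}` tool-call shape, and the assumption that the duplicates are exact repeats are all hypothetical; the actual TensorZero e2e tests are not shown here.

```python
import json
from typing import Any


def dedupe_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop tool calls that repeat an earlier call's name and arguments, keeping order."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for call in tool_calls:
        # Serialize arguments canonically so dict key order doesn't matter.
        key = (call["name"], json.dumps(call["arguments"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique


def check_single_tool_call(
    provider: str, tool_calls: list[dict[str, Any]], expected_name: str
) -> None:
    """Expect exactly one tool call; tolerate exact duplicates only for sglang."""
    calls = dedupe_tool_calls(tool_calls) if provider == "sglang" else tool_calls
    if len(calls) != 1:
        raise AssertionError(f"expected 1 tool call from {provider}, got {len(calls)}")
    if calls[0]["name"] != expected_name:
        raise AssertionError(f"unexpected tool {calls[0]['name']!r}")
```

Scoping the relaxation to a single provider keeps the stricter check in place for every other provider.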

Fix Playground tool name bug (tensorzero#3881)
* Fix Playground tool name bug
* Fix Playground tool name bug
* Fix Playground tool name bug
* Fix Playground tool name bug
* Update ui/app/routes/api/tensorzero/inference.utils.tsx
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix Playground tool name bug
* fixed together API key in regen fixtures bot
* try and fix credentials for regen fixtures
* Regenerate ModelInferenceCache fixtures
---------
Co-authored-by: Gabriel Bianconi <GabrielBianconi@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Viraj Mehta <viraj@tensorzero.com>
Co-authored-by: TensorZero Bot <github-actions[bot]@users.noreply.github.com>

Implement Together SFT in UI (tensorzero#3847)
* Move `formatProvider()` to a separate file for reuse
* Expose gcp_vertex_gemini, add gemini-2.0-flash-lite-001
* Add tests to ensure configured providers are exposed
* Expose Together AI in UI, added "togethercomputer/llama-2-7b-chat"
* Expanded list of Together AI models
* Expose Together AI in UI, added "togethercomputer/llama-2-7b-chat"
* Expanded list of Together AI models
* Add TOGETHER_BASE_URL to GitHub Actions & dockerfiles
* Small cleanup
* Add Together test
* e2e test passes
* removed .only from test
* added env vars to ui e2e tests for together
* removed stray TODO
---------
Co-authored-by: Bret Hudson <bret@brethudson.com>

Bump to 2025.10.1 (tensorzero#3814)
Co-authored-by: Gabriel Bianconi <GabrielBianconi@users.noreply.github.com>

Bump the rust-dependencies group across 1 directory with 3 updates (tensorzero#3793)
Bumps the rust-dependencies group with 3 updates in the / directory: [google-cloud-auth](https://github.com/googleapis/google-cloud-rust), [tree-sitter-python](https://github.com/tree-sitter/tree-sitter-python) and [tree-sitter-md](https://github.com/tree-sitter-grammars/tree-sitter-markdown).
Updates `google-cloud-auth` from 0.23.0 to 1.0.0
- [Release notes](https://github.com/googleapis/google-cloud-rust/releases)
- [Commits](https://github.com/googleapis/google-cloud-rust/commits/v1.0.0)
Updates `tree-sitter-python` from 0.23.6 to 0.25.0
- [Release notes](https://github.com/tree-sitter/tree-sitter-python/releases)
- [Commits](tree-sitter/tree-sitter-python@v0.23.6...v0.25.0)
Updates `tree-sitter-md` from 0.3.2 to 0.5.1
- [Release notes](https://github.com/tree-sitter-grammars/tree-sitter-markdown/releases)
- [Commits](tree-sitter-grammars/tree-sitter-markdown@v0.3.2...v0.5.1)
---
updated-dependencies:
- dependency-name: google-cloud-auth
  dependency-version: 1.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: rust-dependencies
- dependency-name: tree-sitter-python
  dependency-version: 0.25.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: rust-dependencies
- dependency-name: tree-sitter-md
  dependency-version: 0.5.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: rust-dependencies
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Update chart font (tensorzero#3685)
Co-authored-by: Gabriel Bianconi <GabrielBianconi@users.noreply.github.com>

bumped version to 2025.9.5 (tensorzero#3679)
* bumped version to 2025.9.5
* fixed issue with openrouter
---------
Co-authored-by: Gabriel Bianconi <1275491+GabrielBianconi@users.noreply.github.com>