
Tags: sierra-research/tau2-bench


v0.2.0


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
feat: Add comprehensive changelog and automated release management system (#58)

* chore: prepare release v0.2.0

- Web-based leaderboard system now live at tau-bench.com
- Interactive submission management and model comparison
- Mobile-responsive design with trajectory visualization
- Enhanced deployment pipeline and asset management

* docs: update VERSIONING.md to reflect automated release setup

- Updated to match actual Release Please v4 workflow
- Clarified automated vs manual release processes
- Fixed release cadence based on project history
- Added current GitHub Actions configuration
- Updated development version to 0.2.1-dev after v0.2.0 release

* feat: add automated release workflow and comprehensive documentation

- Add Release Please v4 workflow for automated changelog generation
- Include release templates and checklists for manual releases
- Add comprehensive automation guide with conventional commit examples
- Setup complete release management infrastructure

* feat: highlight live leaderboard in README

- Add prominent leaderboard badge in header section
- Add 'What's New' section featuring v0.2.0 leaderboard launch
- Include direct links to tau-bench.com for easy access
- Highlight key features: interactive rankings, mobile support, trajectory analysis
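
The `chore:`, `docs:`, and `feat:` prefixes in the commit message above follow the conventional commit style that release automation reads to pick a version bump. The sketch below shows one simplified mapping from commit headers to a bump level. The function names `bump_for` and `highest_bump` are hypothetical, and the rules are an assumption based on the general convention (`feat` is minor, `fix` is patch, `!` or `BREAKING CHANGE:` is major), not a description of how Release Please is configured in this repository.

```python
import re

# Header shape assumed here: type(optional-scope)!: description
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: .+")

# Bump levels ordered from weakest to strongest.
ORDER = ["none", "patch", "minor", "major"]


def bump_for(message: str) -> str:
    """Return the bump level implied by a single commit message."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER.match(header)
    if match is None:
        return "none"  # not a conventional commit header
    if match.group("breaking") or "BREAKING CHANGE:" in message:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") == "fix":
        return "patch"
    return "none"  # chore, docs, and similar types do not bump on their own


def highest_bump(messages: list[str]) -> str:
    """Return the strongest bump level across several commit messages."""
    return max((bump_for(m) for m in messages), key=ORDER.index, default="none")


commits = [
    "chore: prepare release v0.2.0",
    "docs: update VERSIONING.md to reflect automated release setup",
    "feat: add automated release workflow and comprehensive documentation",
]
print(highest_bump(commits))
```

Under these assumed rules, a batch that contains at least one `feat:` commit and no breaking change resolves to a minor bump. Real tools may apply different rules before a 1.0 release.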

v0.1.3


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Fixes llm args + remove default NL assertions checks (#23)

* update README, update type in fig, add num tasks cli

* Made `pip install -e` the default. For non-editable installs, added an option to set TAU_DATA_DIR to point to the data, with a fallback to the local source if it is not set. Added a `tau2 check-data` CLI command for checking the data install.
Fixed the num-tasks flag. Fixed the display of task names in the CLI.

* Fix CLI parser for dict args

There is no way for `dict` to parse a CLI string into a dictionary, so `type=dict` is simply non-functional. This change fixes that by allowing users to pass JSON strings at the CLI to configure LLMs. 

I am using this to pass `api_key` and `api_base` for self-hosted LLMs on an OpenAI API-like endpoint.

* Fix brace escaping

* updated evaluator so that nl assertions are not run by default
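
The "Fix CLI parser for dict args" item above can be illustrated with a small `argparse` sketch. `dict("...")` raises on a plain string, so `type=dict` rejects every value, whereas a JSON-parsing callable yields a real dictionary. The flag name `--llm-args` and the helper `json_dict` are hypothetical names chosen for this example, not necessarily the ones used in the project.

```python
import argparse
import json


def json_dict(value: str) -> dict:
    """Parse a CLI string as a JSON object, rejecting any other JSON value."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"invalid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # --llm-args is a hypothetical flag name used only for this sketch.
    parser.add_argument("--llm-args", type=json_dict, default={})
    return parser


# Example values are placeholders for a self-hosted, OpenAI-like endpoint.
args = build_parser().parse_args(
    ["--llm-args", '{"api_key": "sk-test", "api_base": "http://localhost:8000/v1"}']
)
print(args.llm_args["api_base"])
```

Raising `argparse.ArgumentTypeError` lets `argparse` report malformed input as a normal usage error instead of a traceback.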

---------

Co-authored-by: Honghua Dong <dhh19951@gmail.com>
Co-authored-by: Alexander Conway <alex-dr@users.noreply.github.com>

v0.1.2

Made `pip install -e` the default. For non-editable installs, added an option to set TAU_DATA_DIR to point to the data, with a fallback to the local source if it is not set. Added a `tau2 check-data` CLI command for checking the data install.

Fixed the num-tasks flag. Fixed the display of task names in the CLI.
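
The TAU_DATA_DIR behavior described in this commit message (use the environment variable if set, otherwise fall back to the local source) could look roughly like the sketch below. The function names `resolve_data_dir` and `check_data`, and the `data` subdirectory used as the fallback, are assumptions made for illustration. The actual layout and implementation in the repository may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(source_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the data directory: TAU_DATA_DIR if set, else a local fallback."""
    env = os.environ if env is None else env
    override = env.get("TAU_DATA_DIR")
    if override:
        return Path(override).expanduser()
    # Assumed fallback location: a "data" directory under the source checkout.
    return source_root / "data"


def check_data(data_dir: Path) -> bool:
    """Report whether the data directory exists and is non-empty."""
    return data_dir.is_dir() and any(data_dir.iterdir())


data_dir = resolve_data_dir(Path.cwd())
print(f"data dir: {data_dir} (usable: {check_data(data_dir)})")
```

Passing the environment as a parameter keeps the lookup easy to test without modifying `os.environ`.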

v0.1.1

cli fixup

v0.1.0

initial release