Tags · sierra-research/tau2-bench

v0.2.0

feat: Add comprehensive changelog and automated release management system (#58)

* chore: prepare release v0.2.0

- Web-based leaderboard system now live at tau-bench.com
- Interactive submission management and model comparison
- Mobile-responsive design with trajectory visualization
- Enhanced deployment pipeline and asset management

* docs: update VERSIONING.md to reflect automated release setup

- Updated to match actual Release Please v4 workflow
- Clarified automated vs manual release processes
- Fixed release cadence based on project history
- Added current GitHub Actions configuration
- Updated development version to 0.2.1-dev after v0.2.0 release

* feat: add automated release workflow and comprehensive documentation

- Add Release Please v4 workflow for automated changelog generation
- Include release templates and checklists for manual releases
- Add comprehensive automation guide with conventional commit examples
- Setup complete release management infrastructure

* feat: highlight live leaderboard in README

- Add prominent leaderboard badge in header section
- Add 'What's New' section featuring v0.2.0 leaderboard launch
- Include direct links to tau-bench.com for easy access
- Highlight key features: interactive rankings, mobile support, trajectory analysis

Oct 6, 2025
f8de30c

v0.1.3

Fixes llm args + remove default NL assertions checks (#23)

* update README, update type in fig, add num tasks cli

* Made pip install -e the default. For non-editable installs, added an option to set TAU_DATA_DIR to point to the data, with a fallback to the local source if it is not set. Added a tau2 check-data CLI command so users can check their data install.
Fixed the num-tasks flag. Fixed display of task names in the CLI.

* Fix CLI parser for dict args

There is no way for `dict` to parse a CLI string into a dictionary, so `type=dict` is simply non-functional. This change fixes that by allowing users to pass JSON strings at the CLI to configure LLMs. 

I am using this to pass `api_key` and `api_base` for self-hosted LLMs on an OpenAI API-like endpoint.

* Fix brace escaping

* updated evaluator so that nl assertions are not run by default
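
The dict-argument fix described above can be illustrated with a minimal sketch. It assumes the CLI is built on `argparse`; the flag name `--llm-args` and the helper `parse_json_dict` are hypothetical names chosen for illustration, not necessarily what the repository uses. Passing `type=dict` fails because `dict("...")` cannot convert a string into a dictionary, whereas a small JSON-parsing converter can.

```python
import argparse
import json


def parse_json_dict(value: str) -> dict:
    """Convert a JSON object given on the command line into a dict."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        # ArgumentTypeError makes argparse print a clean usage error.
        raise argparse.ArgumentTypeError(f"invalid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # "--llm-args" is a hypothetical flag name used only for this sketch.
    parser.add_argument("--llm-args", type=parse_json_dict)
    return parser


# Example: configuring a self-hosted, OpenAI-compatible endpoint.
args = build_parser().parse_args(
    ["--llm-args", '{"api_key": "sk-test", "api_base": "http://localhost:8000/v1"}']
)
print(args.llm_args["api_base"])
```

On a real shell the JSON string usually needs single quotes around it so that the inner double quotes survive.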

---------

Co-authored-by: Honghua Dong <dhh19951@gmail.com>
Co-authored-by: Alexander Conway <alex-dr@users.noreply.github.com>

Aug 26, 2025
5ba9e3e

v0.1.2

Made pip install -e the default. For non editable install, added option to set a TAU_DATA_DIR to point to the data. Added a fall back to local source if this is not set. Added tau2 check-data cli for people to check data install

Fixed the num-tasks flag. Fixed display of task names in the CLI.
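
The TAU_DATA_DIR behaviour described above might look roughly like the following sketch. Only the environment variable name comes from the release note; the function `resolve_data_dir` and the `local_source_dir` parameter are hypothetical, and the project's actual lookup logic may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    local_source_dir: Path, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return the data directory: TAU_DATA_DIR if set, else the local source.

    `local_source_dir` stands in for wherever an editable install keeps its
    data; it is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    override = env.get("TAU_DATA_DIR")
    if override:
        # An explicit setting wins, which suits non-editable installs.
        return Path(override).expanduser()
    # Fall back to the data shipped alongside the local source checkout.
    return local_source_dir


print(resolve_data_dir(Path("data"), env={"TAU_DATA_DIR": "/opt/tau2/data"}))
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.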

Jul 17, 2025
40f46d3

v0.1.1

cli fixup

Jun 12, 2025
2692867

v0.1.0

initial release

Jun 12, 2025
37199f3
