Tags: sierra-research/tau2-bench
feat: Add comprehensive changelog and automated release management system (#58)
* chore: prepare release v0.2.0
  - Web-based leaderboard system now live at tau-bench.com
  - Interactive submission management and model comparison
  - Mobile-responsive design with trajectory visualization
  - Enhanced deployment pipeline and asset management
* docs: update VERSIONING.md to reflect automated release setup
  - Updated to match actual Release Please v4 workflow
  - Clarified automated vs manual release processes
  - Fixed release cadence based on project history
  - Added current GitHub Actions configuration
  - Updated development version to 0.2.1-dev after v0.2.0 release
* feat: add automated release workflow and comprehensive documentation
  - Add Release Please v4 workflow for automated changelog generation
  - Include release templates and checklists for manual releases
  - Add comprehensive automation guide with conventional commit examples
  - Setup complete release management infrastructure
* feat: highlight live leaderboard in README
  - Add prominent leaderboard badge in header section
  - Add 'What's New' section featuring v0.2.0 leaderboard launch
  - Include direct links to tau-bench.com for easy access
  - Highlight key features: interactive rankings, mobile support, trajectory analysis
Fixes llm args + remove default NL assertions checks (#23)
* update README, update type in fig, add num tasks cli
* Made pip install -e the default. For non-editable installs, added an option to set TAU_DATA_DIR to point to the data. Added a fallback to the local source if this is not set. Added a tau2 check-data CLI command for people to check the data install. Fixed the num-tasks flag. Fixed display of task names in the CLI.
* Fix CLI parser for dict args
  There is no way for `dict` to parse a CLI string into a dictionary, so `type=dict` is simply non-functional. This change fixes that by allowing users to pass JSON strings at the CLI to configure LLMs. I am using this to pass `api_key` and `api_base` for self-hosted LLMs on an OpenAI API-like endpoint.
* Fix brace escaping
* updated evaluator so that nl assertions are not run by default
---------
Co-authored-by: Honghua Dong <dhh19951@gmail.com>
Co-authored-by: Alexander Conway <alex-dr@users.noreply.github.com>
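The dict-argument fix described in #23 can be illustrated with a minimal sketch. It assumes the parser is built with Python's standard `argparse`, and the flag name `--llm-args` and helper `parse_json_dict` are hypothetical names chosen for illustration, not taken from the repository. The idea is to replace `type=dict` with a converter that reads the CLI string as a JSON object.

```python
import argparse
import json


def parse_json_dict(value: str) -> dict:
    """Convert a CLI string into a dict by parsing it as a JSON object."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        # ArgumentTypeError lets argparse print a clean usage error.
        raise argparse.ArgumentTypeError(f"invalid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the real CLI may use a different one.
    parser.add_argument("--llm-args", type=parse_json_dict, default={})
    return parser


# Example: pointing at a self-hosted, OpenAI-compatible endpoint.
args = build_parser().parse_args(
    ["--llm-args", '{"api_base": "http://localhost:8000/v1", "temperature": 0.0}']
)
print(args.llm_args["api_base"])
```

In a shell, the JSON value would normally be wrapped in single quotes so that the inner double quotes and braces reach the parser unchanged.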
Made pip install -e the default. For non-editable installs, added an option to set TAU_DATA_DIR to point to the data. Added a fallback to the local source if this is not set. Added a tau2 check-data CLI command for people to check the data install. Fixed the num-tasks flag. Fixed display of task names in the CLI.
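The data-directory lookup described above might look roughly like the following sketch. Only the `TAU_DATA_DIR` variable name comes from the commit message; the function `resolve_data_dir`, its `source_root` parameter, and the `data` subdirectory used for the fallback are assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_data_dir(source_root: Path) -> Path:
    """Return the data directory, preferring the TAU_DATA_DIR override."""
    env_value = os.environ.get("TAU_DATA_DIR")
    if env_value:
        # Explicit override, intended for non-editable installs.
        return Path(env_value).expanduser()
    # Fallback: assumed location of the data inside the local source tree.
    return source_root / "data"


# Example usage with a hypothetical checkout location.
print(resolve_data_dir(Path("/opt/tau2-bench")))
```

Treating an empty `TAU_DATA_DIR` the same as an unset one is a design choice of this sketch, so that an accidentally blank variable does not resolve to the current directory.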