This directory contains resources for evaluating and improving the example workflows using a TypeScript + Vitest framework.
- Systematic Testing: Ensure changes to prompts or configurations improve quality.
- Regression Testing: Catch degradations in performance.
- Benchmarking: Compare different models (e.g., `gemini-2.5-pro` vs `gemini-2.5-flash`).
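One simple way to compare two models is to score their outputs against the same gold labels. The sketch below uses exact-match accuracy. The `ScoredCase` shape, the function names, and the choice of metric are all assumptions for illustration; the actual evals may score results differently.

```typescript
// Hypothetical record: the label the gold dataset expects and the label a
// model actually produced for the same input.
interface ScoredCase {
  id: string;
  expected: string;
  actual: string;
}

// Exact-match accuracy: fraction of cases whose output equals the gold label.
// Returns 0 for an empty list rather than dividing by zero.
function accuracy(cases: ScoredCase[]): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => c.expected === c.actual).length;
  return hits / cases.length;
}

// Hypothetical comparison helper: scores a baseline model and a candidate
// model run on the same dataset. A positive `delta` means the candidate
// matched the gold labels more often.
function compareModels(
  baseline: ScoredCase[],
  candidate: ScoredCase[],
): { baseline: number; candidate: number; delta: number } {
  const b = accuracy(baseline);
  const c = accuracy(candidate);
  return { baseline: b, candidate: c, delta: c - b };
}
```

Exact match suits categorical outputs such as triage labels; free-form outputs like review comments usually need a looser scoring rule.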
Directory layout:
- `evals/`:
  - `test-rig.ts`: Utility to set up a temporary environment for the CLI.
  - `issue-triage.eval.ts`: Benchmark for the Issue Triage workflow.
  - `pr-review.eval.ts`: Benchmark for the PR Review workflow.
  - `issue-fixer.eval.ts`: Benchmark for the autonomous Issue Fixer.
  - `gemini-assistant.eval.ts`: Benchmark for the interactive Assistant.
  - `gemini-scheduled-triage.eval.ts`: Benchmark for batch triage.
  - `data/*.jsonl`: Gold-standard datasets for each workflow.
- `vitest.config.ts`: Configuration for the evaluation runner.
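The `data/*.jsonl` files use JSON Lines: one JSON record per line. A minimal loader might look like the following. The `GoldCase` field names are hypothetical, since the dataset schema is not described here.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical record shape; the real datasets may use different fields.
interface GoldCase {
  id: string;
  input: unknown;
  expected: unknown;
}

// Parses JSON Lines text: one JSON value per line, blank lines skipped.
// Errors include the 1-based line number so a broken dataset is easy to fix.
function parseJsonl<T>(text: string): T[] {
  const records: T[] = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === "") return;
    try {
      records.push(JSON.parse(line) as T);
    } catch (err) {
      throw new Error(`Invalid JSON on line ${i + 1}: ${(err as Error).message}`);
    }
  });
  return records;
}

// Hypothetical convenience wrapper that reads one dataset file from disk.
function loadGoldCases(path: string): GoldCase[] {
  return parseJsonl<GoldCase>(readFileSync(path, "utf8"));
}
```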
Prerequisites:
- Run `npm install` to install dependencies.
- `gemini-cli` installed and available in your `PATH`.
- The `GEMINI_API_KEY` environment variable set.
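A pre-flight check can fail fast with a clear message when a prerequisite is missing. This is a sketch under assumptions: the function names are hypothetical, the CLI's executable name is passed in rather than assumed, and the `PATH` lookup ignores Windows `PATHEXT` extensions and execute permissions.

```typescript
import { existsSync } from "node:fs";
import { delimiter, join } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical helper: true if a file named `name` exists in any PATH entry.
// Simplified: does not try PATHEXT extensions or check the execute bit.
function isOnPath(name: string, pathEnv: string): boolean {
  return pathEnv
    .split(delimiter)
    .filter((dir) => dir !== "")
    .some((dir) => existsSync(join(dir, name)));
}

// Hypothetical pre-flight check: collects every missing prerequisite so a
// single run reports all of them. `cliName` is the executable to look for.
function checkPrerequisites(cliName: string, env: Env = process.env): string[] {
  const problems: string[] = [];
  if (!env.GEMINI_API_KEY) problems.push("GEMINI_API_KEY is not set");
  if (!isOnPath(cliName, env.PATH ?? "")) {
    problems.push(`${cliName} was not found on PATH`);
  }
  return problems;
}
```

One option is to call this from a Vitest global setup file and throw if the returned list is non-empty, so missing prerequisites surface before any eval starts.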
Run all evals with `npm run test:evals`. To run against a specific model, set `GEMINI_MODEL`, for example `GEMINI_MODEL=gemini-2.5-flash npm run test:evals`.

To add a new eval:
- Create a new file in `evals/` ending in `.eval.ts`.
- Add corresponding test data in `evals/data/`.
- Use the `TestRig` to set up files and environment variables and to run the CLI.
- Assert the expected behavior (e.g., check the `GITHUB_ENV` output or tool calls captured in telemetry).
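The steps above can be sketched with a small sandbox class. `MiniRig` is a hypothetical stand-in, not the project's `TestRig`, whose API is not documented here. It creates a temporary directory, writes fixture files, runs a command with extra environment variables, and points `GITHUB_ENV` at a file inside the sandbox so the test can read back what the workflow exported. The parser handles the two formats GitHub Actions documents for `GITHUB_ENV`: `NAME=value` lines and the multiline `NAME<<DELIMITER` form.

```typescript
import { spawnSync } from "node:child_process";
import { mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical, minimal stand-in for TestRig; the real class may differ.
class MiniRig {
  readonly dir = mkdtempSync(join(tmpdir(), "eval-"));
  // File that the command under test appends exported variables to.
  readonly githubEnvPath = join(this.dir, "github_env");

  // Creates a fixture file (and its parent directories) inside the sandbox.
  writeFile(relPath: string, content: string): void {
    const full = join(this.dir, relPath);
    mkdirSync(dirname(full), { recursive: true });
    writeFileSync(full, content);
  }

  // Runs a command in the sandbox with extra environment variables.
  // GITHUB_ENV is reset to an empty file before each run.
  run(cmd: string, args: string[], env: Record<string, string> = {}) {
    writeFileSync(this.githubEnvPath, "");
    return spawnSync(cmd, args, {
      cwd: this.dir,
      encoding: "utf8",
      env: { ...process.env, ...env, GITHUB_ENV: this.githubEnvPath },
    });
  }

  // Parses `NAME=value` lines and `NAME<<DELIM` ... `DELIM` multiline blocks.
  readGithubEnv(): Record<string, string> {
    const lines = readFileSync(this.githubEnvPath, "utf8").split(/\r?\n/);
    const vars: Record<string, string> = {};
    for (let i = 0; i < lines.length; i++) {
      const line = lines[i];
      const heredoc = line.match(/^([^=<]+)<<(.+)$/);
      if (heredoc) {
        const [, name, delim] = heredoc;
        const body: string[] = [];
        // Collect lines until the closing delimiter.
        for (i++; i < lines.length && lines[i] !== delim; i++) body.push(lines[i]);
        vars[name] = body.join("\n");
      } else if (line.includes("=")) {
        const eq = line.indexOf("=");
        vars[line.slice(0, eq)] = line.slice(eq + 1);
      }
    }
    return vars;
  }

  // Removes the sandbox directory and everything in it.
  cleanup(): void {
    rmSync(this.dir, { recursive: true, force: true });
  }
}
```

In an actual `.eval.ts` file, the run and the checks on `readGithubEnv()` would sit inside a Vitest `test()` using `expect`, with `cleanup()` in an `afterEach` or `finally`.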