Gemini CLI Workflow Evaluations

This directory contains resources for evaluating and improving the example workflows using a TypeScript + Vitest framework.

Systematic Testing: Ensure changes to prompts or configurations improve quality.
Regression Testing: Catch degradations in performance.
Benchmarking: Compare different models (e.g., gemini-2.5-pro vs gemini-2.5-flash).

To run the evaluations:

npm run test:evals

To run against a specific model:

GEMINI_MODEL=gemini-2.5-flash npm run test:evals
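
As a rough illustration of how an eval could pick up the GEMINI_MODEL override, the sketch below resolves the model name from an environment map. The helper name `resolveModel` and the fallback value are assumptions made for this example; the actual evals may read the variable differently or use another default.

```typescript
// Hypothetical helper, not taken from the repository.
// ASSUMED_DEFAULT_MODEL is an assumption; the real default is not stated in this README.
const ASSUMED_DEFAULT_MODEL = "gemini-2.5-pro";

function resolveModel(env: Record<string, string | undefined>): string {
  // Treat an unset or whitespace-only GEMINI_MODEL as "not provided".
  const requested = env["GEMINI_MODEL"]?.trim();
  return requested ? requested : ASSUMED_DEFAULT_MODEL;
}
```

In a Node.js eval this would typically be called as `resolveModel(process.env)`, so that running with `GEMINI_MODEL=gemini-2.5-flash` selects that model for the whole run.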

To add a new evaluation:

1. Create a new file in evals/ whose name ends in .eval.ts.
2. Add the corresponding test data in evals/data/.
3. Use the TestRig to set up files and environment variables and to run the CLI.
4. Assert the expected behavior, for example by checking the GITHUB_ENV output or the tool calls captured in telemetry.
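
The last step, checking GITHUB_ENV output, can be sketched as a small parser for the file that GitHub Actions steps append to. This is only an illustration: `parseGithubEnv` is a hypothetical helper, the variable names in the sample are invented, and the TestRig may already expose this output in another form. It handles the two documented entry forms, `NAME=value` and the multi-line `NAME<<DELIMITER` block.

```typescript
// Hypothetical helper, not taken from the repository.
// Parses the text of a GITHUB_ENV file into a key/value map.
function parseGithubEnv(content: string): Record<string, string> {
  const result: Record<string, string> = {};
  const lines = content.split(/\r?\n/);
  let i = 0;
  while (i < lines.length) {
    const line = lines[i];
    i += 1;
    if (line.trim() === "") continue; // ignore blank lines

    // Multi-line form: NAME<<DELIMITER, value lines, then DELIMITER on its own line.
    const heredoc = line.match(/^([^=<]+)<<(.+)$/);
    if (heredoc) {
      const [, key, delimiter] = heredoc;
      const valueLines: string[] = [];
      while (i < lines.length && lines[i] !== delimiter) {
        valueLines.push(lines[i]);
        i += 1;
      }
      i += 1; // skip the closing delimiter
      result[key] = valueLines.join("\n");
      continue;
    }

    // Single-line form: NAME=value (the value may itself contain "=").
    const eq = line.indexOf("=");
    if (eq > 0) {
      result[line.slice(0, eq)] = line.slice(eq + 1);
    }
  }
  return result;
}

// Sample content with invented variable names, for illustration only.
const sampleGithubEnv = [
  "TRIAGE_LABEL=bug",
  "SUMMARY<<EOF",
  "first line",
  "second line",
  "EOF",
].join("\n");
const parsedSample = parseGithubEnv(sampleGithubEnv);
```

Inside a .eval.ts file, the parsed map could then be checked with Vitest's `expect`, for example `expect(parsedSample["TRIAGE_LABEL"]).toBe("bug")`.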
