# TensorZero Evaluations Guide

This directory contains the code for the TensorZero Evaluations Guide.
We provide a configuration file (`./config/tensorzero.toml`) that specifies:

- A `write_haiku` function that generates a haiku, with `gpt_4o` and `gpt_4o_mini` variants.
- Evaluators for the `write_haiku` function, including exact match and assorted LLM judges.
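As a rough sketch, a function with two variants might be laid out along these lines (the model names and field layout below are illustrative assumptions; the actual `./config/tensorzero.toml` is authoritative):

```toml
# Hypothetical sketch — see ./config/tensorzero.toml for the real configuration
[functions.write_haiku]
type = "chat"

[functions.write_haiku.variants.gpt_4o]
type = "chat_completion"
model = "openai::gpt-4o"

[functions.write_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
```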
## Getting Started

- Install Docker.
- Install Python 3.10+.
- Install the Python dependencies. We recommend using `uv`: `uv sync`
- Generate an API key for OpenAI (`OPENAI_API_KEY`).
- Create a `.env` file with the `OPENAI_API_KEY` environment variable (see `.env.example` for an example).
- Run `docker compose up` to launch the TensorZero Gateway, the TensorZero UI, and a development ClickHouse database.
- Run the `main.py` script to generate 100 haikus.
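A script like `main.py` generates haikus by calling the TensorZero Gateway's HTTP inference endpoint. The sketch below is a minimal illustration, not the actual script: the `topic` argument name and the exact input/response shapes are assumptions based on the gateway's chat inference API, so consult `main.py` for the real payload.

```python
import json
import urllib.request

# Default TensorZero Gateway inference endpoint (assumed port)
GATEWAY_URL = "http://localhost:3000/inference"


def build_request(topic: str) -> dict:
    # Build an inference request for the write_haiku function.
    # The "topic" argument is a hypothetical example; the real input
    # schema is defined by the function's templates in ./config.
    return {
        "function_name": "write_haiku",
        "input": {
            "messages": [
                {
                    "role": "user",
                    "content": [{"type": "text", "arguments": {"topic": topic}}],
                }
            ]
        },
    }


def generate_haiku(topic: str) -> str:
    # POST the payload to the gateway and pull the text out of the response.
    req = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(build_request(topic)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["content"][0]["text"]
```

Every call is recorded in ClickHouse by the gateway, which is what makes the dataset-building step below possible.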
## Build a Dataset

Let's generate a dataset composed of our 100 haikus.

- Open the UI, navigate to "Datasets", and select "Build Dataset" (`http://localhost:4000/datasets/builder`).
- Create a new dataset called `haiku_dataset`. Select your `write_haiku` function, "None" as the metric, and "Inference" as the dataset output.
## Run an Evaluation with the CLI

Let's evaluate our `gpt_4o` variant using the TensorZero Evaluations CLI tool.

- Launch an evaluation with the CLI:

```bash
docker compose run --rm evaluations \
    --function-name write_haiku \
    --evaluator-names valid_haiku,metaphor_count,exact_match,compare_haikus \
    --dataset-name haiku_dataset \
    --variant-name gpt_4o \
    --concurrency 5
```

## Run an Evaluation with the UI

Let's evaluate our `gpt_4o_mini` variant using the TensorZero Evaluations UI, and compare the results.
- Navigate to "Evaluations" (`http://localhost:4000/evaluations`) and select "New Run".
- Launch an evaluation with the `gpt_4o_mini` variant.
- Select the previous evaluation run in the dropdown to compare the results.