This directory contains the benchmark test suites and the tooling for publishing them to the Hugging Face Hub.
```
datasets/
├── suites/                 # Source of truth (human-editable JSON)
│   └── wp-core-v1/
│       ├── execution/      # Code generation tests (one file per category)
│       │   ├── hooks.json
│       │   ├── rest-api.json
│       │   └── ...
│       └── knowledge/      # Multiple choice / short answer tests
│           ├── hooks.json
│           ├── rest-api.json
│           └── ...
├── data/                   # Generated Parquet for HF (gitignored)
│   └── test.parquet
├── export_dataset.py       # Converts suites → Parquet
└── README.md
```
The harness loads tests directly from the `suites/` JSON files:

```yaml
# wp-bench.yaml
dataset:
  source: local
  name: wp-core-v1
```
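To illustrate how a loader can walk this layout, here is a minimal sketch. `load_suite` is a hypothetical helper, not the harness's actual API; it only assumes the directory structure shown above, where each category file holds a JSON array of tests.

```python
import json
from pathlib import Path


def load_suite(suite_dir):
    """Collect all tests from a suite's execution/ and knowledge/ category files."""
    tests = {}
    for section in ("execution", "knowledge"):
        tests[section] = []
        # Each category file (hooks.json, rest-api.json, ...) is a JSON array of tests.
        for path in sorted((Path(suite_dir) / section).glob("*.json")):
            tests[section].extend(json.loads(path.read_text()))
    return tests
```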
Export to Parquet:

```bash
python datasets/export_dataset.py
```
Upload to the HF Hub:

```bash
huggingface-cli upload WordPress/wp-bench-v1 datasets/data/
```
Users can then load the dataset:

```python
from datasets import load_dataset

ds = load_dataset("WordPress/wp-bench-v1", split="test")
```
To add a new suite:

- Create `suites/<suite-name>/execution/` and `knowledge/` directories
- Add category JSON files (e.g., `hooks.json`, `rest-api.json`) to each directory
- Follow the schema in the existing suites
- Run `python datasets/export_dataset.py` to include the suite in the Parquet export
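When following the schema, a quick consistency check can catch missing fields before export. The sketch below is a hypothetical helper (not part of `export_dataset.py`); the required-field sets are taken from the schema tables in this README.

```python
import json
from pathlib import Path

# Required fields per test type, per the schema tables below.
REQUIRED_FIELDS = {
    "execution": {"id", "prompt", "requirements", "static_checks",
                  "runtime_checks", "reference_solution"},
    "knowledge": {"id", "prompt", "choices", "correct_answer"},
}


def check_category_file(path, test_type):
    """Return (test_id, missing_fields) pairs for every incomplete test in one file."""
    problems = []
    for test in json.loads(Path(path).read_text()):
        missing = REQUIRED_FIELDS[test_type] - test.keys()
        if missing:
            problems.append((test.get("id"), sorted(missing)))
    return problems
```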
Execution tests use the following fields:

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique test ID |
| `prompt` | string | Task description for the model |
| `requirements` | array | List of requirements the solution must meet |
| `static_checks` | object | Regex patterns to check in generated code |
| `runtime_checks` | object | Assertions to run in a WordPress environment |
| `reference_solution` | string | Example correct solution |
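For illustration, a test record with these fields might look like the dictionary below. The IDs, prompt, and the inner shape of `static_checks` and `runtime_checks` are invented for this sketch; only the top-level field names come from the schema above.

```python
# Hypothetical execution test entry; values are illustrative, not from a real suite.
example_test = {
    "id": "exec-hooks-001",
    "prompt": "Write a plugin snippet that logs a message when a post is published.",
    "requirements": [
        "Hook into the publish_post action",
        "Use error_log() for output",
    ],
    # Assumed shape: a map of check names to regex patterns (see schema table).
    "static_checks": {
        "uses_add_action": "add_action\\(",
    },
    # Assumed shape: a map of check names to assertions run against WordPress.
    "runtime_checks": {
        "logs_on_publish": "publishing a post writes one log entry",
    },
    "reference_solution": "add_action('publish_post', function ($post_id) { error_log('Published ' . $post_id); });",
}

# Sanity checks against the documented field types.
assert isinstance(example_test["requirements"], list)
assert isinstance(example_test["static_checks"], dict)
```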
Knowledge tests use the following fields:

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique test ID |
| `prompt` | string | Question text |
| `choices` | array | Multiple choice options `[{key, text}]` |
| `correct_answer` | string | Correct choice key (e.g., `"B"`) |
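A knowledge test record following this schema might look like the sketch below; the ID, question, and choices are invented for illustration, while the field names and the `[{key, text}]` choice shape come from the table above.

```python
# Hypothetical knowledge test entry; values are illustrative, not from a real suite.
example_question = {
    "id": "know-hooks-001",
    "prompt": "Which function attaches a callback to a WordPress action hook?",
    "choices": [
        {"key": "A", "text": "add_filter()"},
        {"key": "B", "text": "add_action()"},
        {"key": "C", "text": "do_action()"},
        {"key": "D", "text": "apply_filters()"},
    ],
    "correct_answer": "B",
}

# The correct answer must be one of the choice keys.
assert example_question["correct_answer"] in {c["key"] for c in example_question["choices"]}
```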