Adds many more knowledge tests #20

Merged
JasonTheAdams merged 4 commits into trunk from add/more-knowledge-tests
Feb 24, 2026

@JasonTheAdams
Member

This greatly increases the size of the knowledge-based datasets. I had the AI go through each existing test category and scour my local copy of wordpress-develop (including Gutenberg) to add additional questions of varying difficulty.

I tested this against Sonnet 4.5 and Opus 4.6. Not surprisingly, their scores went down as I increased the number of questions. The final results are:

| Model | Knowledge Score |
| --- | --- |
| Sonnet 4.5 | 58.1% |
| Opus 4.6 | 63.7% |

@JasonTheAdams JasonTheAdams self-assigned this Feb 24, 2026
@github-actions

github-actions bot commented Feb 24, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: JasonTheAdams <jason_the_adams@git.wordpress.org>
Co-authored-by: Jameswlepage <isotropic@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Contributor

@Jameswlepage Jameswlepage left a comment

Runs well. I'm getting similar results, and from my audit of the set, it's accurate and challenging for the model.

@JasonTheAdams JasonTheAdams merged commit 92052c4 into trunk Feb 24, 2026
2 checks passed

2 participants