Edit check/Tone Check/Model evaluation

This page provides coordination details for the community evaluation of the model used for Tone Check (formerly known as Peacock Check).

Each evaluator will review a minimum of 30 diffs per language using Annotool. Each evaluator has access to 100 diffs per language, and evaluators are encouraged to review as many diffs as possible.

October 2025 evaluation

The second round of evaluations includes the languages listed below. This test will start on October 3 and last for one week. You will be asked to review the diffs before October 10, end of day.

If needed, one of the languages listed below may be moved to the next round of evaluation.

Please add your name to the list to be contacted for a test. We are looking for at least 5 users for each language; the more evaluators we have, the better.

Please do not add other languages.

Edits shown for evaluation may contain other types of issues (spam, bad formatting, bad linking...). If the issue identified is not a tone issue, the edit should be marked as "Leave as it is". Please focus only on the tone of the edit.

Arabic

Evaluation done

Czech

Evaluation done

German

Evaluation paused

Hebrew

Evaluation done

Indonesian

Evaluation done

Italian

Evaluation done

Epìdosis

Dutch

Evaluation paused

Effeietsanders (talk) 11:47, 2 October 2025 (UTC) (only when the dataset becomes more relevant - see talkpage.)

Polish

Evaluation paused

Russian

Evaluation done

Turkish

Evaluation done

Chinese

Evaluation done

Farsi

Evaluation done

Norwegian

Evaluation done

Romanian

Evaluation done

Strainu

Latvian

Evaluation paused

May 2025 evaluation

The first round of evaluations includes the following languages: English, Spanish, Japanese, Portuguese, and French. This test will start on May 23 and last for one week. You will be asked to review the diffs before May 30, end of day.

If needed, one of the languages listed above may be moved to the next round of evaluation.

Please add your name to the list to be contacted for a test. We are looking for at least 5 users for each language; the more evaluators we have, the better.

Please do not add other languages.

The results of the May 2025 evaluation have been published.

English

Evaluation done.

PMG (talk) 20:10, 15 May 2025 (UTC)
Chipmunkdavis (talk) 03:53, 16 May 2025 (UTC)
Sdkb ^talk 17:25, 16 May 2025 (UTC)
NightWolf1223 (talk) 23:27, 19 May 2025 (UTC)
Parksfan1955 (talk) 03:02, 20 May 2025 (UTC)
The Grid (talk) 03:47, 20 May 2025 (UTC)
~/Bunnypranav:<ping> 04:56, 20 May 2025 (UTC)
Sophisticatedevening🍷^(talk) 16:09, 20 May 2025 (UTC)
--Xandru4 (talk) 16:08, 22 May 2025 (UTC)
Meritkosy (talk) 05:24, 23 May 2025 (UTC)
Fuzheado (talk) 11:49, 23 May 2025 (UTC)
Dotruonggiahy12 (talk) 18:50, 26 May 2025 (UTC)

Spanish

Evaluation done.

Soylacarli (talk) 16:02, 19 May 2025 (UTC)
Elwinlhq (talk) 16:13, 19 May 2025 (UTC)
Omar sansi (talk) 05:30, 22 May 2025 (UTC)
Pintakuda (talk) 15:16, 23 May 2025 (UTC)
Hard --Hard (talk) 13:53, 22 May 2025 (UTC)
Felino Volador (talk) 15:28, 22 May 2025 (UTC)
DidiCoronel (talk) 17:40, 22 May 2025 (UTC)
Silva Selva (talk) 03:14, 23 May 2025 (UTC)

Japanese

Evaluation done.

Afaz (talk) 01:06, 20 May 2025 (UTC)
Hexirp (talk) 15:27, 21 May 2025 (UTC)
VZP10224 (talk) 06:33, 24 May 2025 (UTC)
さえぼー (talk) 06:40, 24 May 2025 (UTC)
Wadakuramon (talk) 07:05, 24 May 2025 (UTC)
...

Portuguese

Evaluate the model in Portuguese

Arthur Timm (talk) 07:51, 17 May 2025 (UTC)
Parzeus (talk) 16:05, 19 May 2025 (UTC)
Vazafirst (talk) 19:05, 22 May 2025 (UTC)
CorraleH (talk) 17:59, 23 June 2025 (UTC)
...

French

Contrary to what was announced on May 19, the French language model is ready for the first round.

Evaluation done.

Goombiis (talk) 10:05, 20 May 2025 (UTC)
--Pa2chant.bis (talk) 16:50, 20 May 2025 (UTC)
Frenouille (talk) 07:10, 21 May 2025 (UTC)
PCWanonyme (talk) 09:40, 21 May 2025 (UTC)
LD (talk) 13:07, 22 May 2025 (UTC)
--Omnilaika02 (talk) 13:58, 22 May 2025 (UTC)
…

Arabic

Contrary to what was announced on May 19, the Arabic language model is not yet ready for the first round. We are sorry about this last-minute change. Arabic is part of the October 2025 evaluation.

Volunteer for future evaluations

If you would like to participate in a future round of model evaluation for a language that hasn't been listed above, please add your username below along with the language(s) you can provide support for.