Edit check/Tone Check/Model evaluation

From mediawiki.org

This page provides coordination details for the community evaluation of the model used for Tone Check (formerly known as Peacock Check).

Each evaluator reviews at least 30 diffs per language using Annotool. Each evaluator has access to 100 diffs per language and is encouraged to review as many as possible.

October 2025 evaluation

The second round of evaluations includes the languages listed below. This test will start on October 3 and last for one week. You will be asked to review the diffs before the end of the day on October 10.

If needed, one of the languages listed below may be moved to the next round of evaluation.

Please add your name to the list to be contacted for a test. We are looking for at least 5 users for each language; the more evaluators we have, the better.

Please do not add other languages.

Edits shown for evaluation may have other types of issues (spam, bad formatting, bad linking...). If the issue identified is not a tone issue, the edit should be marked as "Leave as it is". Please focus only on the tone of the edit.

Arabic

Evaluation done

Czech

Evaluation done

German

Evaluation paused

Hebrew

Evaluation done

Indonesian

Evaluation done

Italian

Evaluation done

Dutch

Evaluation paused

  • Effeietsanders (talk) 11:47, 2 October 2025 (UTC) (only when the dataset becomes more relevant - see talkpage.)

Polish

Evaluation paused

Russian

Evaluation done

Turkish

Evaluation done

Chinese

Evaluation done

Farsi

Evaluation done

Norwegian

Evaluation done

Romanian

Evaluation done

Latvian

Evaluation paused

May 2025 evaluation

The first round of evaluations includes the following languages: English, Spanish, Japanese, Portuguese, and French. This test will start on May 23 and last for one week. You will be asked to review the diffs before the end of the day on May 30.

If needed, one of the languages listed above may be moved to the next round of evaluation.

Please add your name to the list to be contacted for a test. We are looking for at least 5 users for each language; the more evaluators we have, the better.

Please do not add other languages.

The results of the May 2025 evaluation have been published.

English

Evaluation done.

  1. PMG (talk) 20:10, 15 May 2025 (UTC)
  2. Chipmunkdavis (talk) 03:53, 16 May 2025 (UTC)
  3. Sdkbtalk 17:25, 16 May 2025 (UTC)
  4. NightWolf1223 (talk) 23:27, 19 May 2025 (UTC)
  5. Parksfan1955 (talk) 03:02, 20 May 2025 (UTC)
  6. The Grid (talk) 03:47, 20 May 2025 (UTC)
  7. ~/Bunnypranav:<ping> 04:56, 20 May 2025 (UTC)
  8. Sophisticatedevening🍷(talk) 16:09, 20 May 2025 (UTC)
  9. --Xandru4 (talk) 16:08, 22 May 2025 (UTC)
  10. Meritkosy (talk) 05:24, 23 May 2025 (UTC)
  11. Fuzheado (talk) 11:49, 23 May 2025 (UTC)
  12. Dotruonggiahy12 (talk) 18:50, 26 May 2025 (UTC)

Spanish

Evaluation done.

  1. Soylacarli (talk) 16:02, 19 May 2025 (UTC)
  2. Elwinlhq (talk) 16:13, 19 May 2025 (UTC)
  3. Omar sansi (talk) 05:30, 22 May 2025 (UTC)
  4. Pintakuda (talk) 15:16, 23 May 2025 (UTC)
  5. Hard --Hard (talk) 13:53, 22 May 2025 (UTC)
  6. Felino Volador (talk) 15:28, 22 May 2025 (UTC)
  7. DidiCoronel (talk) 17:40, 22 May 2025 (UTC)
  8. Silva Selva (talk) 03:14, 23 May 2025 (UTC)

Japanese

Evaluation done.

  1. Afaz (talk) 01:06, 20 May 2025 (UTC)
  2. Hexirp (talk) 15:27, 21 May 2025 (UTC)
  3. VZP10224 (talk) 06:33, 24 May 2025 (UTC)
  4. さえぼー (talk) 06:40, 24 May 2025 (UTC)
  5. Wadakuramon (talk) 07:05, 24 May 2025 (UTC)
  6. ...

Portuguese

Evaluate the model in Portuguese

  1. Arthur Timm (talk) 07:51, 17 May 2025 (UTC)
  2. Parzeus (talk) 16:05, 19 May 2025 (UTC)
  3. Vazafirst (talk) 19:05, 22 May 2025 (UTC)
  4. CorraleH (talk) 17:59, 23 June 2025 (UTC)
  5. ...

French

Contrary to what was announced on May 19, the French language model is ready for the first round.

Evaluation done.

  1. Goombiis (talk) 10:05, 20 May 2025 (UTC)
  2. --Pa2chant.bis (talk) 16:50, 20 May 2025 (UTC)
  3. Frenouille (talk) 07:10, 21 May 2025 (UTC)
  4. PCWanonyme (talk) 09:40, 21 May 2025 (UTC)
  5. LD (talk) 13:07, 22 May 2025 (UTC)
  6. --Omnilaika02 (talk) 13:58, 22 May 2025 (UTC)

Arabic

Contrary to what was announced on May 19, the Arabic language model is not yet ready for the first round. We are sorry about this last-minute change. Arabic is part of the October 2025 evaluation.

Volunteer for future evaluations

If you would like to participate in a future round of model evaluation for a language that hasn't been listed above, please add your username below along with the language(s) you can provide support for.