Planning work related to the Linked Open Data Network Program - see https://meta.wikimedia.org/wiki/LinkedOpenData
Archived per T404303
Done.
Yes, let's!
This looks to me like it could be archived. Do you agree, @Lydia_Pintscher?
Hey all -- I'm closing out this task. Based on the discussion above, this prototype is complete, but we appear to be stalled on follow-up work. I have quickly drafted a model card to capture where we're at in a slightly more accessible/discoverable manner: https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Wikidata_item_completeness.
Sounds good. Thanks, Miriam!
@Mayakp.wiki no action items for you right now. We are going to revise our timelines and priorities for this work in a few months' time.
@Miriam, can you please let us know if there is an action item for Movement-Insights?
In T321224#10072541, @Lydia_Pintscher wrote: In T321224#10054680, @XiaoXiao-WMF wrote: from product side:
- the product adoption part is not clear; @Lydia_Pintscher, please clarify
From my side this would still be very useful to have, especially for tracking how the content evolves over time at scale. It is however less important than the revert risk model for Wikidata.
In T321224#10054680, @XiaoXiao-WMF wrote: from product side:
- the product adoption part is not clear; @Lydia_Pintscher, please clarify
Summary from discussion with Isaac:
The points above are mostly infrastructure constraints.
@isarantopoulos fyi
From my side it'd still be great to move this forward and have a better Item Quality model in Liftwing. If there is anything I can do to help, please let me know.
Thanks, @Isaac! I will keep an eye on this task.
Please reach out to the Movement-Insights team if you need help or support with use cases, or to chime in on prioritization.
Just double-checking: what is the status of this? Should we close this / move it to the freezer? Any update we can add here?
@Miriam, thanks for checking - this seems to be a victim of my sabbatical last year. Summary of where we are at:
Will this have an impact, or help improve our existing way of measuring content gaps? If yes, then I would be happy to weigh in on some questions - like providing use cases associated with this model.
@Isaac just double-checking: what is the status of this? Should we close this / move it to the freezer? Any update we can add here?
Removing inactive task assignee. (Please do so as part of offboarding - thanks.)
I'm going to be out for the next several weeks, so FYI you likely won't hear updates on this until mid-September. Thanks for these additional details though!
Thanks for this!
So in general it is pretty important for Items to be classified and put into the right place in the larger ontology. These classification statements therefore do, imho, deserve some sort of special status, as they are generally more important than other statements.
Now there are several Properties that can represent such relations. The main ones we should probably focus on are instance of, subclass of, and part of, as explained on https://www.wikidata.org/wiki/Help:Basic_membership_properties.
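For illustration only, here is a minimal Python sketch of what checking an Item for these basic membership properties could look like. It uses the public Special:EntityData endpoint; the helper name is made up and this is not part of any existing model code.

```python
# Minimal sketch: count the basic membership statements (instance of,
# subclass of, part of) on a single Item via the public entity data endpoint.
import requests

MEMBERSHIP_PROPERTIES = {
    "P31": "instance of",
    "P279": "subclass of",
    "P361": "part of",
}

def membership_statements(qid: str) -> dict:
    """Return how many statements the Item has for each basic membership Property."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    entity = requests.get(url, timeout=30).json()["entities"][qid]
    claims = entity.get("claims", {})
    return {label: len(claims.get(pid, []))
            for pid, label in MEMBERSHIP_PROPERTIES.items()}

print(membership_statements("Q42"))
# e.g. {'instance of': 1, 'subclass of': 0, 'part of': 0}
```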
That's quite an interesting table! Would it be possible to get the actual Item IDs for the last two rows? It could be instructive to know which Items the model thinks are very incomplete but have excellent quality :)
@Michael, thanks for the questions! Some context: I think the completeness model is better suited for evaluating items (it's much more nuanced than the quality model, which largely just takes into consideration the number of statements an item has). This analysis will hopefully do two things: 1) help us find some places where the completeness model doesn't do great and where we could tweak it, and 2) build a sample of items to give to Wikidata experts to ensure that the completeness model is in fact capturing their expectations better than the quality model.
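As a toy illustration only (not the real model code; the feature names and numbers are made up), the difference described here and in the quoted comment below roughly amounts to:

```python
# Toy illustration, not the actual models: both look at how many of the
# "expected" claims/refs/labels an Item has, but the quality model also
# feeds in the raw number of statements as a feature.
def completeness_features(expected_present: int, expected_total: int) -> list[float]:
    # Share of expected claims/refs/labels that are actually present.
    return [expected_present / max(expected_total, 1)]

def quality_features(expected_present: int, expected_total: int,
                     total_claims: int) -> list[float]:
    # Same signal, plus the total statement count as an extra feature.
    return completeness_features(expected_present, expected_total) + [float(total_claims)]

# An Item with few statements overall but all of its expected ones present
# could plausibly get a high completeness label and a lower quality label.
print(completeness_features(5, 5), quality_features(5, 5, 5))
```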
In T321224#9035684, @Isaac wrote: Oooh and the job worked! High-level data on overlap between the two scores, where they are the same except completeness just takes into account how many of the expected claims/refs/labels are there, and quality adds the total number of claims to the features too:
+------------------+-------------+---------+
|completeness_label|quality_label|num_items|
+------------------+-------------+---------+
|D                 |D            |29955491 |
[..]
|E                 |A            |241      |
|A                 |E            |6        |
+------------------+-------------+---------+
Oooh and the job worked! High-level data on overlap between the two scores, where they are the same except completeness just takes into account how many of the expected claims/refs/labels are there, and quality adds the total number of claims to the features too:
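A minimal PySpark sketch of how an overlap count like the one above could be produced, assuming a DataFrame with one row per Item and both predicted labels already exists (the real job, input tables, and schema are not shown in this task):

```python
# Hypothetical sketch of the overlap count; the input data and names here
# are stand-ins, not the actual scored tables.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Stand-in for the real scored data: one row per Item with both letter grades.
scores = spark.createDataFrame(
    [("Q1", "D", "D"), ("Q2", "E", "A"), ("Q3", "A", "E")],
    ["item_id", "completeness_label", "quality_label"],
)

overlap = (
    scores
    .groupBy("completeness_label", "quality_label")
    .agg(F.count("*").alias("num_items"))
    .orderBy(F.desc("num_items"))
)
overlap.show()
```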
Updates:
Still no updates because of prep for wikiworkshop/hackathon, but after next week I'm hoping to get back to this!
From discussion with Lydia/Diego: