Wikidata’s content is growing, and our data is used in more and more high-profile places. This means the pressure around data quality is rising. We want to provide people with good data. One important piece in the data quality puzzle is being able to understand where we currently stand quality-wise and how that changes over time. We need to be able to do this at scale and in an automated and repeatable way, because none of us wants to do this by hand for 90 million Items.
That’s where ORES, the machine learning system, comes in. One of the things it can do is judge the quality of an Item. Or, to be more exact, it can judge some aspects of the quality of an Item. It puts each Item into a quality class between A (amazing) and E (ewwww, terrible). It’s been doing this for a while already, but the quality judgments it provided were not very good. There were two reasons for this: it took only a limited number of signals into account (such as the number of References on the Item or the number of Labels), and it was trained on rather old data. Wikidata’s data has changed a lot since then, so ORES could not tell what to do with new kinds of Items, such as astronomical objects, because it had never seen them before.
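To make the A to E classes a bit more concrete, here is a minimal Python sketch of how one might ask ORES for the quality class of specific Item revisions and read the answer. It assumes the ORES v3 scores endpoint for `wikidatawiki` and a model called `itemquality`, plus the response layout shown in `SAMPLE_RESPONSE`. The revision ID and probabilities in that sample are made up for illustration, and the helper names are hypothetical. The sketch only builds the URL and parses a response; it does not make a network request.

```python
from urllib.parse import urlencode

# Assumed base URL of the ORES v3 scores endpoint for Wikidata.
ORES_BASE = "https://ores.wikimedia.org/v3/scores/wikidatawiki/"


def build_itemquality_url(rev_ids):
    """Build a request URL asking for the item quality of some revisions."""
    query = urlencode({
        "models": "itemquality",                       # assumed model name
        "revids": "|".join(str(r) for r in rev_ids),   # pipe-separated IDs
    })
    return f"{ORES_BASE}?{query}"


def extract_predictions(response):
    """Map each revision ID to its predicted class (A to E).

    Revisions without a "score" entry (for example, errors) are skipped.
    """
    predictions = {}
    for rev_id, models in response["wikidatawiki"]["scores"].items():
        score = models["itemquality"].get("score")
        if score is not None:
            predictions[int(rev_id)] = score["prediction"]
    return predictions


# Illustrative response only: the revision ID and numbers are invented.
SAMPLE_RESPONSE = {
    "wikidatawiki": {
        "scores": {
            "123456": {
                "itemquality": {
                    "score": {
                        "prediction": "C",
                        "probability": {
                            "A": 0.02, "B": 0.18, "C": 0.55,
                            "D": 0.20, "E": 0.05,
                        },
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(build_itemquality_url([123456]))
    print(extract_predictions(SAMPLE_RESPONSE))
```

The prediction is a single class, while the per-class probabilities show how sure the model is about it, which is useful when an Item sits on the border between two classes.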
We wanted to improve that and make the quality judgments ORES provides better. We did this by:
While we were at it, we also wanted to better understand how data quality changes over time on Wikidata. Previously, we only looked at the global average quality score. But how do Items change over time? How many Items are being improved from D to C or even B class, for example? To better understand this, we started creating diagrams like this one. It shows the development from January 2019 to January 2020.
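The class-to-class movement described above can be counted with a few lines of Python. This is a minimal sketch and not the way the diagrams were actually produced. It assumes two snapshots that map Item IDs to quality classes; the Item IDs, classes and function names below are invented for illustration.

```python
from collections import Counter

CLASSES = "ABCDE"  # ordered from best (A) to worst (E)


def count_transitions(before, after):
    """Count (old_class, new_class) pairs for Items present in both snapshots."""
    transitions = Counter()
    for item_id, old_class in before.items():
        new_class = after.get(item_id)
        if new_class is not None:  # skip Items missing from the later snapshot
            transitions[(old_class, new_class)] += 1
    return transitions


def summarize(transitions):
    """Group transitions into improved, declined and unchanged Items."""
    summary = Counter()
    for (old_class, new_class), count in transitions.items():
        old_rank = CLASSES.index(old_class)
        new_rank = CLASSES.index(new_class)
        if new_rank < old_rank:      # moved towards A
            summary["improved"] += count
        elif new_rank > old_rank:    # moved towards E
            summary["declined"] += count
        else:
            summary["unchanged"] += count
    return summary


# Invented example snapshots, e.g. one per January.
snapshot_2019 = {"Q1": "D", "Q2": "D", "Q3": "C", "Q4": "E"}
snapshot_2020 = {"Q1": "C", "Q2": "D", "Q3": "B", "Q4": "E", "Q5": "E"}

if __name__ == "__main__":
    transitions = count_transitions(snapshot_2019, snapshot_2020)
    print(transitions[("D", "C")])  # Items that moved from D to C
    print(summarize(transitions))
```

Items that exist only in the later snapshot (like `Q5` here) are left out of the transition counts, since they have no earlier class to compare against.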
We’re happy to present these improvements for Wikidata’s birthday, and we hope they will help us get a better and more accurate view of the data quality on Wikidata.
If you want to see the quality score near the header on each Item you can include the following user script in your Common.js page:
What’s coming next on the same topic?
If you have any questions or feedback, or want to keep discussing Item quality, feel free to use this talk page. Cheers,