Better Item quality judgments from ORES - Wikidata

29 Oct 2020

Hello all,

Wikidata’s content is growing and our data is used in more and more
high-profile places. This means the pressure around data quality is rising.
We want to provide people with good data. One important piece in the data
quality puzzle is being able to *understand where we currently stand
quality-wise and how that changes over time*. We need to be able to do this
at scale and in an automated and repeatable way because none of us wants
to do this by hand for 90 million Items.

That’s where ORES <https://www.mediawiki.org/wiki/ORES>, the machine
learning system, comes in. One of the things it can do is judge the quality
of an Item or, to be more exact, some aspects of the quality of an Item. It
puts each Item into a quality class between A (amazing) and E (ewwww,
terrible). It’s been doing this for a while already, but the quality
judgments it provided were not very good, for two reasons. First, it took
only a limited number of signals into account (for example the number of
References on the Item or the number of Labels). Second, it was trained on
rather old data. Wikidata’s data has changed a lot since then, so ORES could
not tell what to do with new kinds of Items like astronomical objects
because it had never seen them before.

We wanted to improve that and *make the quality judgments ORES provides
better*. We did this by:

   - adding a number of new signals (e.g. does this Item have an image
   attached)
   - changing existing signals (e.g. missing references on external ID
   statements no longer penalize the Item as much)
   - retraining the model on more current data so it better understands
   scientific papers, astronomical objects, etc.

While we were at it, we also wanted to better understand how data quality
changes over time on Wikidata. Previously we only looked at the global
average quality score. But how do individual Items change over time? How
many Items are being improved from D to C or even B class, for example? To
better understand this, we started creating diagrams like this one
<https://commons.wikimedia.org/wiki/File:Wikidata_quality_diagram,_January_2019_to_January_2020.png>.
It shows the development from January 2019 to January 2020.
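
To illustrate the kind of counting behind such a diagram, here is a minimal
sketch. It is not the code used to produce the diagram. The function name
`countTransitions`, the snapshot format (an object mapping an Item ID to a
class letter) and the sample data are all assumptions made up for this
example.

```javascript
// Hypothetical helper: count how many Items moved from one quality class to
// another between two snapshots. Both arguments are assumed to map an Item
// ID to a class letter ("A" to "E").
function countTransitions(before, after) {
  const counts = {};
  for (const [itemId, oldClass] of Object.entries(before)) {
    const newClass = after[itemId];
    // Items that are missing from the later snapshot are skipped.
    if (newClass === undefined) continue;
    const key = `${oldClass}->${newClass}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Made-up sample data, for illustration only.
const earlierSnapshot = { itemA: "D", itemB: "D", itemC: "C", itemD: "E" };
const laterSnapshot = { itemA: "C", itemB: "D", itemC: "B", itemD: "E" };
console.log(countTransitions(earlierSnapshot, laterSnapshot));
```

Each key in the result names one kind of movement (for example "D->C"), so
Items that stayed in their class are counted alongside those that improved
or got worse.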

We’re happy to present these improvements for Wikidata’s birthday
<https://www.wikidata.org/wiki/Wikidata:Eighth_Birthday/Presents>, and we
hope this will help us get a better and more accurate view of the data
quality on Wikidata now.

If you want to see the quality score near the header on each Item you can
include the following user script in your Common.js
<https://www.wikidata.org/wiki/Special:MyPage/common.js> page:
importScript("User:EpochFail/ArticleQuality.js")

*What’s coming next on the same topic?*

   - ORES can’t judge all aspects of quality. For example, it cannot tell
   if a statement is generally considered true. We will look at ways of
   judging this aspect of quality as well, but it’s considerably harder. If
   you have ideas on how to go about it, let us know.
   - We will build a small tool that’ll make it possible for you to provide
   a list of Items and then get the quality of that subset of Wikidata as well
   as the lowest and highest quality Items. This will hopefully help
   WikiProjects and others get a good overview of their data.
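
As a rough idea of what summarizing a subset could look like, here is a
small sketch. It is not the planned tool. It assumes the quality classes
have already been obtained for each Item, and the function name
`summarizeSubset`, the input format and the sample data are hypothetical.

```javascript
// Classes ordered from lowest to highest quality.
const CLASS_ORDER = ["E", "D", "C", "B", "A"];

// Hypothetical helper: given an object mapping an Item ID to a class letter,
// return the number of Items per class plus one lowest and one highest
// quality Item (the first one encountered in case of ties).
function summarizeSubset(qualityByItem) {
  const counts = { A: 0, B: 0, C: 0, D: 0, E: 0 };
  let lowest = null;
  let highest = null;
  for (const [itemId, qualityClass] of Object.entries(qualityByItem)) {
    const rank = CLASS_ORDER.indexOf(qualityClass);
    if (rank === -1) continue; // skip values that are not a known class
    counts[qualityClass] += 1;
    if (lowest === null || rank < lowest.rank) lowest = { itemId, rank };
    if (highest === null || rank > highest.rank) highest = { itemId, rank };
  }
  return {
    counts,
    lowest: lowest && lowest.itemId,
    highest: highest && highest.itemId,
  };
}

// Made-up sample data, for illustration only.
console.log(summarizeSubset({ itemA: "C", itemB: "A", itemC: "E" }));
```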

If you have any questions or feedback, or want to keep discussing Item
quality, feel free to use this talk page
<https://www.wikidata.org/wiki/Wikidata_talk:Item_quality>.

Cheers,
-- 
Léa Lacroix
Community Engagement Coordinator

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
