Re: [Wiki-research-l] [Wikidata] The basis for Wikidata quality

22 Mar 2017


      Hoi,
You are conflating two things that are not related. ORES is really helpful
and there is plenty of room for it to function extremely well on Wikidata.
Yes, ORES will do good things for Wikidata, but it is separate from the
proposed item-quality grading.
When an item is created because an article exists on the Kyrgyz Wikipedia,
it may not even have a label. This happens a lot, and Amir, the developer
of ORES, has a bot that he regularly runs to add a label: the name of the
article. When an article is added, the same bot does the same work. We do
not have to rate such an item for a specific "level of quality" in order
to improve our quality; this bot runs automatically every week. When an
automated rating system sees this, it can apply the remedy.
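A minimal sketch of the label remedy described above, assuming items are plain dicts in the Wikibase JSON layout (`labels` keyed by language, `sitelinks` keyed by site id). This is not Amir's bot: the function `fill_missing_labels` and the small `SITE_LANG` table are hypothetical, and a real bot would also have to handle disambiguation suffixes in titles and the full list of sites.

```python
from typing import Any, Dict

# Hypothetical, deliberately small mapping from Wikipedia site ids to
# label language codes; a real bot would need the complete site list.
SITE_LANG: Dict[str, str] = {"kywiki": "ky", "enwiki": "en", "trwiki": "tr", "eswiki": "es"}


def fill_missing_labels(item: Dict[str, Any]) -> Dict[str, str]:
    """Copy each sitelink title into a label where the item has none.

    Assumes `item` follows the Wikibase JSON layout:
      labels:    {lang: {"language": lang, "value": text}}
      sitelinks: {site: {"site": site, "title": text}}
    Returns the labels that were added (language -> value).
    """
    labels = item.setdefault("labels", {})
    added: Dict[str, str] = {}
    for site, link in item.get("sitelinks", {}).items():
        lang = SITE_LANG.get(site)
        if lang is None or lang in labels:
            continue  # unknown site, or a label already exists: leave it alone
        title = link["title"]
        labels[lang] = {"language": lang, "value": title}
        added[lang] = title
    return added


# Hypothetical item created from a Kyrgyz Wikipedia article, with no label yet.
item = {
    "labels": {},
    "sitelinks": {"kywiki": {"site": "kywiki", "title": "Бишкек"}},
}
added = fill_missing_labels(item)  # added == {"ky": "Бишкек"}
```

Running it a second time adds nothing, which suits a weekly job: only items that still lack a label in a given language are touched.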
If, as I understand from Lydia, the ratings are done automagically, in an
ORES kind of way, it gains relevance. The point, though, is that the
quality of individual items on their own is still of hardly any
significance. It becomes relevant when the rating can accept the result of
a query and provide results on that basis. For instance: take all the
items with "catalog" "Black Lunch Table" and give me a rating for those
with no articles, or with no article in Spanish.
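A sketch of what such a query-driven rating could look like: it takes the items returned by a query (for instance, everything with the "Black Lunch Table" catalog) and splits them into items with no article at all and items with no Spanish article. The query itself is not shown; the item shape (Wikibase-style `sitelinks`), the function name `rate_query_result` and the placeholder ids are assumptions, and every sitelink is counted as a Wikipedia article for simplicity.

```python
from typing import Any, Dict, Iterable, List


def rate_query_result(
    items: Iterable[Dict[str, Any]], wiki: str = "eswiki"
) -> Dict[str, List[str]]:
    """Group the items of a query result by what is actionably missing.

    Assumption: every sitelink is treated as a Wikipedia article, and
    `wiki` is the site id to check ("eswiki" = Spanish Wikipedia).
    """
    report: Dict[str, List[str]] = {"no_articles": [], "no_article_in_wiki": []}
    for item in items:
        sitelinks = item.get("sitelinks", {})
        if not sitelinks:
            report["no_articles"].append(item["id"])
        elif wiki not in sitelinks:
            report["no_article_in_wiki"].append(item["id"])
    return report


# Hypothetical query result; the ids are placeholders, not real items.
result = [
    {"id": "item-1", "sitelinks": {}},
    {"id": "item-2", "sitelinks": {"enwiki": {"title": "A"}}},
    {"id": "item-3", "sitelinks": {"enwiki": {"title": "B"}, "eswiki": {"title": "B"}}},
]
report = rate_query_result(result)
# report == {"no_articles": ["item-1"], "no_article_in_wiki": ["item-2"]}
```

The output is a worklist rather than a grade per item, which is the kind of actionable result argued for here.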
The problem with rating "item quality" is that, on its own, it has no
application. I just finished adding award winners [1]. Based on the
quality of the English article, I added the award winners using "linked
items" and "petscan". I added a few items because there are no articles
for them. The Turkish article for the award has plenty of red links. It
would be a quality improvement if these red links were associated with
the items: an article writer would immediately find the English article
and the Wikidata statements. That is actionable quality, as it provides a
way to stabilise Wikidata, en.wp and tr.wp alike. For Turkish, the text
of the red links may be used as labels. When you consider quality, it is
like an onion. On the first level all the information is there; on the
second level we may be missing education (which may still be in
Freebase), an employer, or whatever makes sense in the context.
When there is not even a red link, only plain text (as in the English
article), a tool like ORES could recognise the label in the text and
accept it as a probable match.
If we are really interested in quality, we need to compare the content of
the many projects, including Wikidata, and check that it is in balance.
My key criticism of the current quality standard stands: this is a first
step, but there are serious doubts about the relevance of several aspects.
As a first iteration it will show what is good and where it needs
improvement. But without an interface to querying it is useless.
Thanks,
      GerardM
[1] https://tools.wmflabs.org/reasonator/?q=Q13582570
On 22 March 2017 at 15:33, Aaron Halfaker <aaron.halfaker@gmail.com> wrote:
Hey wiki-research-l folks,
Gerard didn't actually link you to the quality criteria he takes issue
with. See https://www.wikidata.org/wiki/Wikidata:Item_quality
I think Gerard's argument basically boils down to Wikidata != Wikipedia,
but it's unclear how that is relevant to the goal of measuring the quality
of items. This is something I've been talking to Lydia about for a long
time. Measuring quality this way has been great for the few wikis where
we have models deployed in ORES[1] (English, French, and Russian
Wikipedia), so we'd like to have the same for Wikidata. As Lydia said, we
can do all sorts of fascinating things with a model like this. Honestly,
I think the criteria are coming together quite nicely, and we're just
starting a pilot labeling campaign to work through a set of issues before
starting the primary labeling drive.

[1] https://ores.wikimedia.org

-Aaron
On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi,
What I have read is that individual items will be graded. That does not
help you determine which items are lacking in something. When you want to
determine whether something is lacking, you need a relational approach.
Take an award like this one [1]: it was added to make the award for a
person [2] more complete. No real importance is given to this award; a
few more people were added because they are part of a group that gets
more attention from me [3]. For yet another award [4], I added all the
people who received it because, in someone's expert opinion, they were
all notable (in the Wikipedia sense of the word). I added several of
these people to Wikidata. Arguably, the quality of the Wikidata item for
the award is great; it has no associated article in Wikipedia, but that
has nothing to do with the quality of the information it provides. It is
easy and obvious to recognise that quality issues arise one level deeper:
the information for several people is meagre at best. You cannot deny
their relevance, though; removing them destroys the quality of the award
item.
The point is that you can describe quality through relations; in the
proposed grading, there is nothing that is really actionable.
When you add links to the mix, these same links have no bearing on the
quality of the Wikidata item. Why would they? Links only become interesting
when you compare the statements in Wikidata with the links to other
articles in the same Wikipedia. That is not what this approach provides.
Really, how will grading items make a difference? How will it help us
understand that "items relating to railroads are lacking"? It does not.
If you want indicators of quality, here is one: an author (and its
subclasses) should have a VIAF identifier. An artist with objects in the
Getty Museum should have a ULAN number. The lack of such information is
actionable. The number of interwiki links is not, the number of
statements is not, and even references are not that convincing.
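A sketch of the kind of indicator described above: flag items that belong to a class for which an identifier is expected but lack it. It assumes Wikidata's property ids P214 (VIAF ID) and P245 (ULAN ID), and a simplified `Item` shape in which a set of "kinds" stands in for class membership ("author and its subclasses", "artist with objects in the Getty Museum") that a real check would derive from statements or a query. The `RULES` table, the function name and all values are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    """Simplified stand-in for a Wikidata item (hypothetical shape)."""
    qid: str
    kinds: Set[str]  # e.g. {"author"}; a real check would derive this from statements
    claims: Dict[str, List[str]] = field(default_factory=dict)  # property id -> values


# Hypothetical rule table: kind of item -> identifier property it should carry.
RULES: Dict[str, str] = {
    "author": "P214",           # VIAF ID
    "artist_in_getty": "P245",  # ULAN ID
}


def missing_identifiers(items: List[Item]) -> Dict[str, List[str]]:
    """Return, per property, the items that match a rule but lack that property."""
    todo: Dict[str, List[str]] = {prop: [] for prop in RULES.values()}
    for item in items:
        for kind, prop in RULES.items():
            if kind in item.kinds and not item.claims.get(prop):
                todo[prop].append(item.qid)
    return todo


# Placeholder items and identifier values, not real data.
items = [
    Item("item-1", {"author"}, {"P214": ["12345"]}),
    Item("item-2", {"author"}),
    Item("item-3", {"author", "artist_in_getty"}, {"P214": ["67890"]}),
]
todo = missing_identifiers(items)
# todo == {"P214": ["item-2"], "P245": ["item-3"]}
```

Each list in the result is directly actionable: it names the items to look up in VIAF or ULAN.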
Thanks,
      GerardM
[1] https://tools.wmflabs.org/reasonator/?&q=29000734
[2] https://tools.wmflabs.org/reasonator/?&q=7315382
[3] https://tools.wmflabs.org/reasonator/?&q=3308284
[4] https://tools.wmflabs.org/reasonator/?&q=28934266
On 22 March 2017 at 11:56, Lydia Pintscher <lydia.pintscher@wikimedia.de> wrote:
On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
In your reply I find little argument for why this approach is useful. I do
not find a result that is actionable. There is little point to this
approach, and it does not fit well with much of Wikidata practice.
Gerard, the outcome will be very actionable. We will have the
groundwork needed to identify individual items and sets of items that
need improvement. If, for example, it turns out that our items related
to railroads are particularly lacking, then that is something we can
concentrate on if we so choose. We can do editathons, data
partnerships, quality drives, and so on.
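A sketch of how per-item grades could be rolled up to reveal a weak area such as railroads. The numeric grade scale (higher is better), the topic labels, the threshold and the function `weakest_topics` are all assumptions for illustration; the Item_quality criteria themselves are not encoded here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def weakest_topics(graded: List[Tuple[str, str, float]], threshold: float) -> Dict[str, float]:
    """Average per-item grades by topic and return topics below `threshold`.

    `graded` holds (item id, topic, grade) tuples; the grade scale is an
    assumption. Topics are returned weakest first.
    """
    by_topic: Dict[str, List[float]] = defaultdict(list)
    for _item, topic, grade in graded:
        by_topic[topic].append(grade)
    averages = {topic: mean(grades) for topic, grades in by_topic.items()}
    ranked = sorted(averages.items(), key=lambda kv: kv[1])
    return {topic: avg for topic, avg in ranked if avg < threshold}


# Placeholder grades, not real ratings.
graded = [
    ("item-1", "railroads", 1.0),
    ("item-2", "railroads", 2.0),
    ("item-3", "painters", 4.0),
    ("item-4", "painters", 3.0),
]
print(weakest_topics(graded, threshold=3.0))  # {'railroads': 1.5}
```

Averaging is only one possible roll-up; a share of items below some grade would serve the same purpose of pointing an editathon or data partnership at a topic.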
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognised as a charitable
organisation by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207.

Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
