Re: [Wiki-research-l] The basis for Wikidata quality

22 Mar 2017

Hi GerardM

I don’t know if I am one of the researchers you mention in your email, but I have indeed
carried out research on Wikidata quality and am still doing so.
I asked the community to help me gather different points of view on what data quality
means on Wikidata in an RfC a couple of months ago.

If you wonder what happened to that and whether I published anything using that material
without sharing any results with the community, well, the answer is no.
What we collected is still there and still valuable, waiting to be properly analysed once
we have enough time to dedicate to it (which will happen later this year).

As for your questions, I agree with you that looking at Wikidata quality with the eyes of
(only one) Wikipedia may not be helpful in understanding its quality and appreciating its
peculiarities. So far I have tried to rely on more general quality dimensions and
metrics, and on Linked Data-related ones.
This does not mean that the quality criteria previously developed for Wikipedia are not
important. They take into account the collaborative dimension of the project and are
definitely helpful to assess Wikidata as a community product.

A short note in favour of my fellow researcher: an evaluation of Wikidata by item has been
made already by identifying showcase items. Regardless of whether we think that evaluating
quality by item is the most correct approach, I think it is definitely useful to show how
good single units of information (Items) are. It is just ‘a’ measure for quality, not
‘the’ measure for quality. As with all research, it may either prove itself to be an
incredibly valuable contribution to Wikidata (I believe it will) or to be useless after
all. Whatever the outcome, students/researchers working on Wikidata help raise and focus
on issues that are important for the community and for the project itself, imho of
course.

Thanks,
Alessandro

–––
Alessandro Piscopo
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
email: A.Piscopo@soton.ac.uk

On 22 Mar 2017, at 06:27, Gerard Meijssen
<gerard.meijssen@gmail.com> wrote:

Hoi,
A student is going to start some work on Wikidata quality based on a model
of quality that is imho seriously suspect. It is item based and assumes
that the more interwiki links, statements and references an item has,
the higher its quality will be.

I did protest against this approach and I did call into question that this
work will help us achieve better quality at Wikidata. I did indicate what
we should do to approach quality at Wikidata and I was indignantly told
that research shows that I am wrong.

The research is about Wikipedia not Wikidata and the paper quoted does not
mention Wikidata at all. As far as I am concerned we have been quite happy
to only see English Wikipedia based research and consequently I doubt there
is Wikimedia based research that is truly applicable.

At a previous time a student started work on a quality project for
Wikidata; comparisons were to be made with external sources so that we
could deduce quality. The student finished his or her research, I assume
wrote a paper and left us with no working functionality. It is left at
that. So the model where a student can do vital work for Wikidata is also
very much in doubt.

I wrote in an e-mail to user:Epochfail:

Hoi,
You refer to a publication, the basis for quality and it is NOT about
Wikidata but about Wikipedia. What is discussed is quality for Wikidata
where other assumptions are needed. My point about data is that its quality
is in the connections that are made.

To some extent Wikidata reflects Wikipedia but not one Wikipedia, all
Wikipedias. In addition there is a large and growing set of data with no
links to Wikipedia or any of the other Wikimedia projects.

When you consider the current dataset, there are hardly any relevant
sources. They do exist by inference - items based on Wikipedia are likely
to have a source - items on an award are documented on the official website
for the award - etc.

Quality is therefore in statements being the same on items that are
identified as such.

When you consider Wikidata, it often has more items relating to a
university or an award than a Wikipedia does and often it does not link to
items representing articles in a specific Wikipedia. When you consider this
alone you have an actionable difference of at least 2%.

Surely there is plenty of scope for looking at Wikidata in its own context
and NOT quoting studies that have nothing to do with Wikidata.
Thanks,
    GerardM

My question to both researchers and Wikidata people is: Why would this
Wikipedia model for quality apply to Wikidata? What research is it based
on, is that research applicable and to what extent? Will the alternative
approach to quality for WIKIDATA not provide us with more and better
quality that will also be of relevance to Wikipedia quality?
Thanks,
      GerardM
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
