New subject: [Wiki-research-l] The basis for Wikidata quality

22 Mar 2017

Hoi,
When you consider the "collaroborative dimension", it is utterly different
for Wikidata. An example: I just added a few statements to Dorothy Tarrant
[1].For several of those statements I added hundreds of similar statements
on other items. In order to add the award I had to add the award first. I
updated the date of birth and death. When you consider the statements on
most items, they are typically done by bot or by processes like I use. This
"collaborative dimension" is relevant to Wikipedia, not to Wikidata.

You state that previously developed criteria for Wikipedia are important.
Ok, how?

My problem with this approach is that it establishes Wikipedia think that
is not appropriate for Wikidata. You are imho correct where you say that
links are of more relevance. They are because they allow to compare links
between Wikipedia articles and links between Wikidata items. This is
actionable information because the two should largely be the same. I have
described [2] how Wikidata can help in improving the quality of any
Wikipedia and in the process improve its own quality. This can be done by
associating wiki links and red links with Wikidata items. Tools can be of
service in pointing out probable issues.

In your reply I find little argument why this approach is useful. I do not
find a result that is actionable. There is little point to this approach
and it does not fit with well with much of the Wikidata practice.
Thanks,
      GerardM

[1] https://tools.wmflabs.org/reasonator/?q=Q18783615
[2]
http://ultimategerardm.blogspot.nl/2016/01/wikipedia-lowest-hanging-fruit-f…

On 22 March 2017 at 09:45, Piscopo A. &lt;A.Piscopo(a)soton.ac.uk&gt; wrote:

...
  Hi GerardM

 I don’t know if I am one of the researchers you mention in your email, but
 I have indeed carried out research around Wikidata quality and still am.
 I asked the community to help me gather different point of views over what
 data quality means on Wikidata in a RfC a couple of months ago.

 If you wonder what happened to that and whether I published anything using
 that material without sharing any results with the community, well, the
 answer is no.
 What we collected is still there and still valuable, waiting to be
 properly analysed once we have enough time to dedicate more to it (which
 will happen later on this year).

 As for your questions, I agree with you that looking at Wikidata quality
 with the eyes of (only one) Wikipedia may not be helpful in understanding
 its quality and appreciate its peculiarities. I have tried until now to
 rely more on more general quality dimensions and metrics, and on Linked
 Data-related ones.
 This does not mean that the quality criteria previously developed for
 Wikipedia are not important. They take into account the collaborative
 dimension of the project and are definitely helpful to assess Wikidata as a
 community product.

 A short note in favour of my fellow researcher: an evaluation of Wikidata
 by item has been made already by identifying showcase items. Regardless of
 whether we think that evaluating quality by item is the most correct
 approach, I think it is definitely useful to show how good single units of
 information (Items) are. It is just ‘a’ measure for quality, not ‘the’
 measure for quality. As with all research, it may either prove itself to be
 an incredibly valuable contribution to Wikidata (I believe it will) or to
 be useless after all. Whatever the outcome, students/researchers working on
 Wikidata help raise and focus on issues that are important for the
 community and for the project itself, imho of course.

 Thanks,
 Alessandro

 –––
 Alessandro Piscopo
 Web and Internet Science Group
 School of Electronics and Computer Science
 University of Southampton
 email: A.Piscopo@soton.ac.uk<mailto:A.Piscopo@soton.ac.uk>

 On 22 Mar 2017, at 06:27, Gerard Meijssen &lt;gerard.meijssen(a)gmail.com&lt;
 mailto:gerard.meijssen@gmail.com>> wrote:

 Hoi,
 A student is going to start some work on Wikidata quality based on a model
 of quality that is imho seriously suspect. It is item based and assumes
 that the more interwiki links there are, the more statements there are and
 the more references there are, the item will be of a higher quality.

 I did protest against this approach and I did call into question that this
 work will help us achieve better quality at Wikidata. I did indicate what
 we should do to approach quality at Wikidata and I was indignantly told
 that research shows that I am wrong.

 The research is about Wikipedia not Wikidata and the paper quoted does not
 mention Wikidata at all. As far as I am concerned we have been quite happy
 to only see English Wikipedia based research and consequently I doubt there
 is Wikimedia based research that is truly applicable.

 At a previous time a student started work on a quality project for
 Wikidata; comparisons were to be made with external sources so that we
 could deduce quality. The student finished his or her research, I assume
 wrote a paper and left us with no working functionality. It is left at
 that. So the model were a student can do vital work for Wikidata is also
 very much in doubt.

 I wrote in an e-mail to user:Epochfail:

 Hoi,
 You refer to a publication, the basis for quality and it is NOT about
 Wikidata but about Wikipedia. What is discussed is quality for Wikidata
 where other assumptions are needed. My point to data is that its quality is
 in the connections that are made.

 To some extend Wikidata reflects Wikipedia but not one Wikipedia, all
 Wikipedias. In addition there is a large and growing set of data with no
 links to Wikipedia or any of the other Wikimedia projects.

 When you consider the current dataset, there are hardly any relevant
 sources. They do exist by inference - items based on Wikipedia are likely
 to have a source - items on an award are documented on the official website
 for the award - etc.

 Quality is therefore in statements being the same on items that are
 identified as such.

 When you consider Wikidata, it often has more items relating to a
 university, an award than a Wikipedia does and often it does not link to
 items representing articles in a specific Wikipedia. When you consider this
 alone you have actionable difference of at least 2%.

 Sure enough plenty of scope of looking at Wikidata in its own context and
 NOT quoting studies that have nothing to do with Wikidata.
 Thanks,
     GerardM

 My question to both researchers and Wikidata people is: Why would this
 Wikipedia model for quality apply to Wikidata? What research is it based
 on, is that research applicable and to what extend? Will the alternative
 approach to quality for WIKIDATA not provide us with more and better
 quality that will also be of relevance to Wikipedia quality?
 Thanks,
       GerardM
 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org<mailto:Wiki-
 research-l(a)lists.wikimedia.org&gt;
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality