Hi Uwe,
I would agree with Gerard's concern about resources. Actually,
embedding it within Wikidata - stored on the item with a property and
queryable by SPARQL - implies that we handle it as a statement. So
each edit that materially changed the quality score would prompt
another edit to update the scoring. Presumably not all edits would
change anything (eg label changes wouldn't be relevant) but even if
only 10% made a material difference, that's basically 10% more edits,
10% more contribution to query service updates, etc. And that's quite
a substantial chunk of resources for a "nice to have" feature!
So... maybe this suggests a different approach.
You could set up a separate Wikibase installation (or any other kind
of linked-data store) to hold the quality ratings, and make that
accessible through a federated SPARQL search. The WDQS is capable of
handling federated searches reasonably efficiently (see eg
https://w.wiki/7at), so you could allow people to do a search using
both sets of data ("find me all ABCs on Wikidata, and only return
those with a value of X > 0.5 on Wikidata-Scores").
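As a sketch, a federated query of that shape might look like the following (the scores endpoint URL and the qualityScore property are hypothetical, and wdt:P31 wd:Q5 just stands in for "all ABCs"):

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX ex:  <http://example.org/prop/>

# Find items on Wikidata, then keep only those whose quality score in
# the external store exceeds 0.5. The endpoint and ex:qualityScore
# property are hypothetical placeholders.
SELECT ?item ?score WHERE {
  ?item wdt:P31 wd:Q5 .                        # stand-in for "all ABCs"
  SERVICE <https://wikidata-scores.example.org/sparql> {
    ?item ex:qualityScore ?score .
    FILTER(?score > 0.5)
  }
}
```

The SERVICE clause is standard SPARQL 1.1 federation, which WDQS supports, so the scores store only needs to expose an ordinary SPARQL endpoint.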
Andrew.
On Tue, 27 Aug 2019 at 20:50, Uwe Jung <jung.uwe(a)gmail.com> wrote:
Hello,
Many thanks for the answers to my post of 24 August.
I think that all four opinions contain important things to consider.
@David Abián
I have read the article and agree that, in the end, users decide which data is good for
them.
@GerardM
It is true that in any implementation of the idea, the computing load must be taken
into account right from the start.
Please note that I have not given up on the idea yet. With regard to the acceptance of
Wikidata, I consider a quality indicator of some kind to be absolutely necessary. There
will be a lot of ordinary users who would like to see something like this.
At the same time I completely agree with David: (almost) every chosen indicator is subject
to a certain arbitrariness in its selection. There won't be one easy-to-understand
super-indicator.
So, let's approach things from the other side. Instead of a global indicator, a
separate indicator should be developed for each quality dimension to be considered. With
some dimensions this should be relatively easy. For others it could take years until we
have agreed on an algorithm for their calculation.
Furthermore, the indicators should not be discrete values but a continuum. No
traffic-light statements (i.e. good, medium, bad) should be made. Rather, when
displaying the indicators, the value could be related to the values of all other objects
(e.g. the value x for the current data object in relation to the overall average for all
objects for this indicator). The advantage here is that the overall average can increase
over time, meaning that the relative position of an individual object's value can also
decrease over time.
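That "relative to the overall average" idea could even be expressed directly in SPARQL with a subquery (a sketch; the ex:completenessScore property and the ex: namespace are hypothetical):

```sparql
PREFIX ex: <http://example.org/prop/>

# Sketch: divide each item's raw indicator value by the current
# average across all items, so displayed scores shift as the overall
# corpus improves. ex:completenessScore is a hypothetical property.
SELECT ?item (?score / ?avg AS ?relativeScore) WHERE {
  ?item ex:completenessScore ?score .
  # Subquery computes the overall average once.
  { SELECT (AVG(?s) AS ?avg) WHERE { ?any ex:completenessScore ?s } }
}
```

A relativeScore above 1.0 would mean "better than the current average", and the same raw value would rank lower as the corpus improves.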
Another advantage: users can define the required quality level themselves. If, for
example, you have high demands on accuracy but few on the completeness of the
statements, you can filter accordingly.
However, it remains important that these indicators (i.e. the evaluation of the
individual item) are stored together with the item and can be queried together with
the data using SPARQL.
Greetings
Uwe Jung
--
- Andrew Gray
andrew(a)generalist.org.uk