Hi Uwe,
I would agree with Gerard's concern about resources. Actually embedding it within Wikidata - stored on the item with a property and queryable by SPARQL - implies that we handle it as a statement. So each edit that materially changed the quality score would prompt another edit to update the scoring. Presumably not all edits would change anything (eg label changes wouldn't be relevant), but even if only 10% made a material difference, that's basically 10% more edits, 10% more contribution to query service updates, etc. And that's quite a substantial chunk of resources for a "nice to have" feature!
So... maybe this suggests a different approach.
You could set up a separate Wikibase installation (or any other kind of linked-data store) to store the quality ratings, and make that accessible through a federated SPARQL search. The WDQS is capable of handling federated searches reasonably efficiently (see eg https://w.wiki/7at), so you could let people query both sets of data at once ("find me all ABCs on Wikidata, and only return those with a value of X > 0.5 on Wikidata-Scores").
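A federated query of that shape might look something like this. This is only a sketch: the endpoint URL (scores.example.org) and the score: property are hypothetical placeholders for whatever the separate score store would actually expose; wd:/wdt: are the standard Wikidata prefixes.

```sparql
PREFIX wd:    <http://www.wikidata.org/entity/>
PREFIX wdt:   <http://www.wikidata.org/prop/direct/>
PREFIX score: <https://scores.example.org/prop/>   # hypothetical score store

# Find items of some class on Wikidata, then keep only those whose
# quality score on the external store exceeds a user-chosen threshold.
SELECT ?item ?score WHERE {
  ?item wdt:P31 wd:Q5 .                  # example class: instance of human
  SERVICE <https://scores.example.org/sparql> {
    ?item score:quality ?score .
    FILTER(?score > 0.5)
  }
}
```

Run against the WDQS, the SERVICE block is evaluated on the external endpoint, so the score data never has to live in Wikidata itself.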
Andrew.
On Tue, 27 Aug 2019 at 20:50, Uwe Jung jung.uwe@gmail.com wrote:
Hello,
Many thanks for the answers to my post from 24 August. I think that all four opinions contain important things to consider.
@David Abián I have read the article and agree that in the end the users decide which data is good for them or not.
@GerardM It is true that in a possible implementation of the idea, the aspect of computing load must be taken into account right from the beginning.
Please note that I have not given up on the idea yet. With regard to the acceptance of Wikidata, I consider a quality indicator of some kind to be absolutely necessary. There will be a lot of ordinary users who would like to see something like this.
At the same time I completely agree with David: (almost) every chosen indicator is subject to a certain arbitrariness in the selection. There won't be one easy-to-understand super-indicator. So, let's approach things from the other side. Instead of a global indicator, a separate indicator should be developed for each quality dimension to be considered. For some dimensions this should be relatively easy; for others it could take years until we have agreed on an algorithm for their calculation.
Furthermore, the indicators should not represent discrete values but a continuum of values. No traffic-light statements (i.e. good, medium, bad) should be made. Rather, when displaying the indicators, the value could be related to the values of all other objects (e.g. the value x for the current data object in relation to the overall average for all objects for this indicator). The advantage here is that the overall average can increase over time, meaning that the relative position of an individual object can also decrease over time even if its own value is unchanged.
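The relative-to-average idea could itself be expressed in a single query. Again a sketch only: the score: prefix and completeness property are hypothetical stand-ins for a per-dimension indicator.

```sparql
PREFIX score: <https://scores.example.org/prop/>   # hypothetical score store

# Relate each item's completeness score to the current average across
# all scored items; the subquery computes the average first.
SELECT ?item ?value ?relative WHERE {
  { SELECT (AVG(?s) AS ?avg) WHERE { ?x score:completeness ?s . } }
  ?item score:completeness ?value .
  BIND(?value / ?avg AS ?relative)   # > 1 means above the current average
}
```

Because the average is recomputed at query time, an item's relative position automatically drifts downward as the corpus improves, with no extra edits needed.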
Another advantage: users can define the required quality level themselves. If, for example, you have high demands on accuracy but few demands on the completeness of the statements, you can filter accordingly.
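Combining per-dimension thresholds would then just be a matter of adding filters. The property names below (accuracy, completeness, under a hypothetical score: prefix) are illustrative assumptions:

```sparql
PREFIX score: <https://scores.example.org/prop/>   # hypothetical score store

# A user who demands high accuracy but tolerates low completeness
# sets the two thresholds independently.
SELECT ?item WHERE {
  ?item score:accuracy     ?acc ;
        score:completeness ?comp .
  FILTER(?acc > 0.9 && ?comp > 0.2)
}
```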
However, it remains important that these indicators (i.e. the evaluation of the individual item) are stored together with the item and can be queried together with the data using SPARQL.
Greetings
Uwe Jung
--
Andrew Gray
andrew@generalist.org.uk