Hi Uwe,
I would agree with Gerard's concern about resources. Actually embedding it within Wikidata - stored on the item with a property and queryable by SPARQL - implies that we handle it as a statement. So each edit that materially changed the quality score would prompt another edit to update the scoring. Presumably not all edits would change anything (eg label changes wouldn't be relevant), but even if only 10% made a material difference, that's basically 10% more edits, 10% more contribution to query service updates, etc. And that's quite a substantial chunk of resources for a "nice to have" feature!
So... maybe this suggests a different approach.
You could set up a separate Wikibase installation (or any other kind of linked-data store) to store the quality ratings, and make that accessible through a federated SPARQL search. The WDQS is capable of handling federated searches reasonably efficiently (see eg https://w.wiki/7at), so you could let people query both sets of data at once ("find me all ABCs on Wikidata, and only return those with a value of X > 0.5 on Wikidata-Scores").
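A federated query of that shape might look something like this. This is only a sketch: the endpoint URL (scores.example.org) and the score: property are hypothetical placeholders for whatever the separate score store would actually expose; wd:/wdt: are the standard Wikidata prefixes.

```sparql
PREFIX wd:    <http://www.wikidata.org/entity/>
PREFIX wdt:   <http://www.wikidata.org/prop/direct/>
PREFIX score: <https://scores.example.org/prop/>   # hypothetical score store

# Find items of some class on Wikidata, then keep only those whose
# quality score on the external store exceeds a user-chosen threshold.
SELECT ?item ?score WHERE {
  ?item wdt:P31 wd:Q5 .                  # example class: instance of human
  SERVICE <https://scores.example.org/sparql> {
    ?item score:quality ?score .
    FILTER(?score > 0.5)
  }
}
```

Run against the WDQS, the SERVICE block is evaluated on the external endpoint, so the score data never has to live in Wikidata itself.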
Andrew.
On Tue, 27 Aug 2019 at 20:50, Uwe Jung jung.uwe@gmail.com wrote:
Hello,
Many thanks for the answers to my post from 24 August. I think that all four opinions contain important things to consider.
@David Abián I have read the article and agree that in the end the users decide which data is good for them or not.
@GerardM It is true that in a possible implementation of the idea, the aspect of computing load must be taken into account right from the beginning.
Please note that I have not given up on the idea yet. With regard to the acceptance of Wikidata, I consider a quality indicator of some kind to be absolutely necessary. There will be a lot of ordinary users who would like to see something like this.
At the same time I completely agree with David: (almost) every chosen indicator is subject to a certain arbitrariness in the selection. There won't be one easy-to-understand super-indicator. So, let's approach things from the other side. Instead of a global indicator, a separate indicator should be developed for each quality dimension to be considered. For some dimensions this should be relatively easy; for others it could take years until we have agreed on an algorithm for their calculation.
Furthermore, the indicators should not represent discrete values but a continuum of values. No traffic-light statements (i.e. good, medium, bad) should be made. Rather, when displaying the indicators, the value could be related to the values of all other objects (e.g. the value x for the current data object in relation to the overall average for all objects for this indicator). The advantage here is that the overall average can increase over time, meaning that the relative position of an individual object can also decrease over time even if its own value is unchanged.
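The relative-to-average idea could itself be expressed in a single query. Again a sketch only: the score: prefix and completeness property are hypothetical stand-ins for a per-dimension indicator.

```sparql
PREFIX score: <https://scores.example.org/prop/>   # hypothetical score store

# Relate each item's completeness score to the current average across
# all scored items; the subquery computes the average first.
SELECT ?item ?value ?relative WHERE {
  { SELECT (AVG(?s) AS ?avg) WHERE { ?x score:completeness ?s . } }
  ?item score:completeness ?value .
  BIND(?value / ?avg AS ?relative)   # > 1 means above the current average
}
```

Because the average is recomputed at query time, an item's relative position automatically drifts downward as the corpus improves, with no extra edits needed.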
Another advantage: users can define the required quality level themselves. If, for example, you have high demands on accuracy but few demands on the completeness of the statements, you can filter accordingly.
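Combining per-dimension thresholds would then just be a matter of adding filters. The property names below (accuracy, completeness, under a hypothetical score: prefix) are illustrative assumptions:

```sparql
PREFIX score: <https://scores.example.org/prop/>   # hypothetical score store

# A user who demands high accuracy but tolerates low completeness
# sets the two thresholds independently.
SELECT ?item WHERE {
  ?item score:accuracy     ?acc ;
        score:completeness ?comp .
  FILTER(?acc > 0.9 && ?comp > 0.2)
}
```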
However, it remains important that these indicators (i.e. the evaluation of the individual item) are stored together with the item and can be queried together with the data using SPARQL.
Greetings
Uwe Jung
--
Andrew Gray
andrew@generalist.org.uk