Hello,
many thanks for the replies to my post of 24 August.
I think that all four opinions contain important things to consider.
@David Abián
I have read the article and agree that in the end the users decide which
data is good for them or not.
@GerardM
It is true that in a possible implementation of the idea, the aspect of
computing load must be taken into account right from the beginning.
Please note that I have not given up on the idea yet. With regard to the
acceptance of Wikidata, I consider a quality indicator of some kind to be
absolutely necessary. There will be a lot of ordinary users who would like
to see something like this.
At the same time I completely agree with David: (almost) every chosen
indicator is subject to a certain arbitrariness in the selection. There
won't be one easy-to-understand super-indicator.
So, let's approach things from the other side. Instead of a global
indicator, a separate indicator should be developed for each quality
dimension to be considered. With some dimensions this should be relatively
easy. For others it could take years until we have agreed on an algorithm
for their calculation.
Furthermore, the indicators should not represent discrete values but a
continuum of values. No traffic light statements (i.e. good, medium, bad)
should be made. Rather, when displaying the indicators, the value could be
related to the values of all other objects (e.g. the value x for the
current data object in relation to the overall average for all objects for
this indicator). The advantage here is that the total average can increase
over time, meaning that the position of the value for an individual object
can also decrease over time.
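A minimal sketch in Python of what "relative to the overall average" could
mean. The item IDs, the score values and the function name are made up for
illustration, and dividing by the mean is only one possible choice.

```python
from statistics import mean


def relative_to_average(scores, item_id):
    """Return the item's score divided by the mean score of all items."""
    overall = mean(scores.values())
    if overall == 0:
        raise ValueError("overall average is zero; ratio is undefined")
    return scores[item_id] / overall


# Hypothetical scores for one indicator (higher means better).
scores = {"Q1": 0.8, "Q2": 0.4, "Q3": 0.6}
before = relative_to_average(scores, "Q1")

# The other items improve while Q1 stays the same,
# so the relative position of Q1 drops.
scores.update({"Q2": 0.8, "Q3": 0.8})
after = relative_to_average(scores, "Q1")
```

Here `after` is smaller than `before` although the score of Q1 itself has
not changed.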
Another advantage: users can define the required quality level themselves.
If, for example, you have high demands on accuracy but low demands on the
completeness of the statements, you can set your thresholds accordingly.
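As a rough sketch in Python, such user-defined requirements could be a set
of minimum values per dimension. The dimension names, scores and thresholds
below are assumptions for illustration only.

```python
def meets_requirements(item_scores, minimums):
    """True if every dimension in `minimums` reaches its minimum value."""
    return all(
        item_scores.get(dimension, 0.0) >= minimum
        for dimension, minimum in minimums.items()
    )


# Hypothetical per-dimension scores for two items.
items = {
    "Q1": {"accuracy": 0.9, "completeness": 0.3},
    "Q2": {"accuracy": 0.5, "completeness": 0.9},
}

# High demands on accuracy, low demands on completeness.
minimums = {"accuracy": 0.8, "completeness": 0.2}

selected = [qid for qid, s in items.items() if meets_requirements(s, minimums)]
```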
However, it remains important that these indicators (i.e. the evaluation of
the individual item) are stored together with the item and can be
queried together with the data using SPARQL.
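To illustrate, a small Python helper that builds such a SPARQL query. No
score property exists yet, so `P_ACCURACY` is a pure placeholder, and the
helper name and the threshold are assumptions too.

```python
def build_quality_query(score_property, min_score, limit=10):
    """Build a SPARQL query for items whose score is at least `min_score`.

    `score_property` stands for a score property that does not exist yet.
    """
    return (
        "SELECT ?item ?score WHERE {\n"
        f"  ?item wdt:{score_property} ?score .\n"
        f"  FILTER(?score >= {min_score})\n"
        "}\n"
        f"LIMIT {limit}"
    )


# P_ACCURACY is a placeholder, not a real Wikidata property ID.
query = build_quality_query("P_ACCURACY", 0.8)
```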
Greetings
Uwe Jung
On Sat, 24 Aug 2019 at 13:54, Uwe Jung <jung.uwe(a)gmail.com> wrote:
Hello,
As the importance of Wikidata increases, so do the demands on the quality
of the data. I would like to put the following proposal up for discussion.
Two basic ideas:
1. Each Wikidata page (item) is scored after each edit. This score
should express different dimensions of data quality in a quickly graspable
way.
2. A property is created via which the item refers to the score value.
Certain qualifiers can be used for a more detailed description (e.g. time
of calculation, algorithm used to calculate the score value, etc.).
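A sketch in Python of the information such a statement and its qualifiers
might carry. The class name, field names and example values are assumptions
for illustration, not an existing Wikibase structure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScoreStatement:
    """Hypothetical score statement with its two qualifiers."""

    item_id: str    # the item the score refers to
    score: float    # the score value itself
    algorithm: str  # qualifier: algorithm used to calculate the score
    calculated_at: datetime = field(  # qualifier: time of calculation
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Made-up example values.
stmt = ScoreStatement(item_id="Q42", score=0.75, algorithm="example-v1")
```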
The score value can be calculated either within Wikibase after each data
change or "externally" by a bot. The calculation could draw on, among
other things: the number of constraints, completeness of references, degree
of completeness in relation to the underlying ontology, etc. There are already
some interesting discussions on the question of data quality which can be
used here (see
https://www.wikidata.org/wiki/Wikidata:Item_quality;
https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc.).
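As one possible example, "completeness of references" could be computed as
the share of statements that have at least one reference. The sketch below
is in Python and assumes a simplified statement format, not the real
Wikibase JSON. The reference names are made up.

```python
def reference_completeness(statements):
    """Share of statements with at least one reference (0.0 to 1.0)."""
    if not statements:
        return 0.0
    referenced = sum(1 for s in statements if s.get("references"))
    return referenced / len(statements)


# Simplified, hypothetical representation of the statements of one item.
statements = [
    {"property": "P31", "references": ["ref-a"]},
    {"property": "P569", "references": []},
    {"property": "P19", "references": ["ref-b", "ref-c"]},
    {"property": "P106"},  # no references recorded at all
]

score = reference_completeness(statements)
```

Two of the four statements have a reference, so the score is 0.5. An item
without any statements gets 0.0 here, which is itself a design choice that
would need discussion.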
Advantages:
- Users get a quick overview of the quality of a page (item).
- SPARQL can be used to query only those items that meet a certain
quality level.
- The idea would probably be relatively easy to implement.
Disadvantages:
- In a way, the data model is abused by generating statements that no
longer describe the item itself, but make statements about the
representation of this item in Wikidata.
- Additional computing power must be provided for the regular
calculation of all changed items.
- The score only reports the quality of a page. If it is insufficient, the
improvements still have to be made manually.
I would now be interested in the following:
1. Is this idea suitable to effectively help solve existing quality
problems?
2. Which quality dimensions should the score value represent?
3. Which quality dimensions can be calculated with reasonable effort?
4. How should they be calculated and represented?
5. Which is the most suitable way to further discuss and implement
this idea?
Many thanks in advance.
Uwe Jung (UJung <https://www.wikidata.org/wiki/User:UJung>)
www.archivfuehrer-kolonialzeit.de/thesaurus