Hello,
thank you very much for your contributions and comments. I would endorse most of your remarks without hesitation.
But I would like to clarify a few points:
- The importance of Wikidata grows with its acceptance by a "non-specialist" audience. This audience also includes many of the people who decide about project funds or donations. As a rule, they have little time to inform themselves thoroughly about the problems of measuring data quality. In these hectic times it is, unfortunately, common for such an audience to demand indicators that are as simple and quick to interpret as possible. (I will leave that last sentence here as a hypothesis.) I think it is important to try to meet these expectations.

- Recoin is well known. And yes, it addresses only the dimension of *relative* completeness. At present, however, it is aimed primarily at people who enter data manually, so it remains invisible or unusable for many others. To stay with the idea: would it not be possible to derive a one- or multi-dimensional value from the Recoin information and store it as a literal on the item via a property such as "relative completeness"? The advantage would be that this value could then be queried via SPARQL together with the item (a sketch of such a query follows after this list). Decision-makers from the field of "jam science", for example, could thus get an overview of how complete the data from this field are in Wikidata and for which data-completion projects funds may still have to be provided. As described in my last article, a single property "relative completeness" is of course not sufficient to describe data quality.

- I am sorry if I expressed myself in a misleading way. I use this mailing list to get feedback on an idea. It may be "my" idea (or not), but it is far from being "my" project. However, should the idea ever be realized by anyone in any way, I would be glad to make my own modest contribution.

- It is true that the number of current Wikidata items is hard to imagine. If a single instance needed just one minute per item to calculate the various quality scores, it would take well over a hundred years to get through all of them (see the rough calculation after this list). The fact that many items are modified over and over again and would therefore have to be recalculated is not even taken into account here. Any implementation would therefore need strategies that make the first results visible with less effort. One possibility is to concentrate initially on the part of the data that is actually being used. Here we are back at the question of dynamic quality.

- People need support so that they can use the data and find and fix its flaws. In the foreseeable future there will not be enough volunteers to check all 60 million items for errors manually. This is another reason why information about the quality of the data should be queryable together with the data itself.
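To make the second point a little more concrete, here is a minimal sketch, purely under the assumption that such a property existed. P99999 ("relative completeness") is invented for illustration and does not exist on Wikidata, and Q5 (human) merely stands in for whatever class a decision-maker's subject area maps to.

# Sketch only: P99999 is a placeholder for the proposed "relative completeness"
# property; it does not exist on Wikidata today.
import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?itemLabel ?completeness WHERE {
  ?item wdt:P31 wd:Q5 .             # items of the class of interest (Q5 only as a stand-in)
  ?item wdt:P99999 ?completeness .  # hypothetical property holding the computed score as a literal
  FILTER(?completeness < 0.5)       # e.g. surface the least complete items first
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

def fetch_low_completeness_items():
    """Return items of the chosen class whose stored completeness score is low."""
    response = requests.get(
        WDQS_ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "completeness-sketch/0.1 (example only)"},
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]

if __name__ == "__main__":
    for row in fetch_low_completeness_items():
        print(row["itemLabel"]["value"], row["completeness"]["value"])

Because the score would travel with the item, one query is enough to get an overview of a whole field, instead of opening Recoin on every single item.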
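And a quick back-of-the-envelope check for the fourth point; both the item count and the per-item cost are of course only assumptions:

# Rough estimate only; the two numbers below are assumptions, not measurements.
ITEMS = 60_000_000       # approximate number of Wikidata items mentioned above
SECONDS_PER_ITEM = 60    # assume one minute to compute all quality scores for one item

total_seconds = ITEMS * SECONDS_PER_ITEM
years = total_seconds / (60 * 60 * 24 * 365)
print(f"about {years:.0f} years")  # roughly 114 years, ignoring re-edits that force recalculation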
Thanks
Uwe Jung