Hoi, A student is going to start some work on Wikidata quality based on a model of quality that is imho seriously suspect. It is item based and assumes that the more interwiki links, statements, and references an item has, the higher its quality.
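For concreteness, the item-count model being criticized here can be sketched as a simple weighted count. This is only an illustration, assuming the Wikidata entity JSON layout (`sitelinks`, `claims`, `references`); the weights are entirely invented:

```python
def naive_quality_score(item: dict) -> float:
    """Score an item dict shaped like Wikidata's entity JSON.

    A deliberately naive model: more sitelinks, statements, and
    referenced statements means a higher score. Weights are arbitrary,
    purely for illustration of the approach under discussion.
    """
    n_sitelinks = len(item.get("sitelinks", {}))
    statements = [s for claims in item.get("claims", {}).values() for s in claims]
    n_statements = len(statements)
    n_referenced = sum(1 for s in statements if s.get("references"))
    return 1.0 * n_sitelinks + 0.5 * n_statements + 2.0 * n_referenced

# Invented toy item: two sitelinks, two statements, one of them referenced.
item = {
    "sitelinks": {"enwiki": {}, "dewiki": {}},
    "claims": {
        "P31": [{"references": [{"snaks": {}}]}],
        "P569": [{}],
    },
}
print(naive_quality_score(item))  # 2*1.0 + 2*0.5 + 1*2.0 = 5.0
```

The objection in this thread is precisely that such a score measures volume, not correctness.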
I protested against this approach and questioned whether this work will help us achieve better quality at Wikidata. I indicated what we should do to approach quality at Wikidata and was indignantly told that research shows I am wrong.
The research is about Wikipedia, not Wikidata, and the paper quoted does not mention Wikidata at all. As far as I am concerned, we have been quite happy to see only English Wikipedia-based research, and consequently I doubt there is Wikimedia-based research that is truly applicable.
Previously, a student started work on a quality project for Wikidata; comparisons were to be made with external sources so that we could deduce quality. The student finished his or her research, presumably wrote a paper, and left us with no working functionality. It was left at that. So the model where a student can do vital work for Wikidata is also very much in doubt.
I wrote in an e-mail to user:Epochfail:
Hoi, You refer to a publication as the basis for quality, and it is NOT about Wikidata but about Wikipedia. What is discussed is quality for Wikidata, where other assumptions are needed. My point about data is that its quality is in the connections that are made.
To some extent Wikidata reflects Wikipedia: not one Wikipedia, but all Wikipedias. In addition, there is a large and growing set of data with no links to Wikipedia or any of the other Wikimedia projects.
When you consider the current dataset, there are hardly any relevant sources. They do exist by inference: items based on Wikipedia are likely to have a source, items on an award are documented on the official website for the award, and so on.
Quality is therefore in statements being the same on items that are identified as such.
When you consider Wikidata, it often has more items relating to a university or an award than a Wikipedia does, and often it does not link to items representing articles in a specific Wikipedia. From this alone you have an actionable difference of at least 2%.
Sure enough, there is plenty of scope for looking at Wikidata in its own context and NOT quoting studies that have nothing to do with Wikidata. Thanks, GerardM
My question to both researchers and Wikidata people is: why would this Wikipedia model for quality apply to Wikidata? What research is it based on, is that research applicable, and to what extent? Will the alternative approach to quality for Wikidata not provide us with more and better quality that will also be of relevance to Wikipedia quality? Thanks, GerardM
I am not a fan of quality scales for Wikipedia -- and what Gerard describes doesn't sound generally impactful or helpful as it relates to Wikidata.
As someone more firmly in the Wikipedia camp, I think that my perception of a well-done Wikipedia page -- which would mean beautifully curated citations and metadata as complete as possible (infoboxes, works & publications, further reading, authority control, etc.) -- is very different from a perception of a well-done Wikidata entry. They seem to be entirely separate beasts.
I also think it is probable that, due to the youth of the Wikidata project, research about Wikidata may still need to find its space and focus. Conflating Wikipedia methodology and evaluation systems with Wikidata is, in my opinion, not going to work because they are so different. And when it comes to notability especially, Wikidata's constraints are very different from Wikipedia's. The two projects obviously work in service to and with each other, but it seems clear that different evaluation systems are called for.
I'm a nascent Wikidatan but am learning and using Wikidata more and more, so I probably don't understand the concerns here fully. But I do understand Wikipedia -- and how it can relate to and interface with Wikidata. And notability and quality of information are a constant concern. So....
- Erika
On Mar 22, 2017, at 2:27 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Sitelinks to an item approximate the number of views of the data on that item, and as such approximate the likelihood of detecting an error. Few views imply a longer time span before an error is detected. It is really about estimating quality as a function of the age of the item and its number of page views, approximated through sitelinks.
The problem is that the number of sitelinks is not a good approximation. Yes, it is a simple approximation, but it is still pretty bad.
References are another way to verify the data, but that is not a valid argument against measuring the age of the data.
I've been toying for some time with an idea that uses statistical inference to try to identify questionable facts, but it will probably not be done - it is way too much work to do in spare time.
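One way such statistical inference could look, as a sketch only: flag values of a shared numeric property that deviate strongly from comparable items, using the robust median-absolute-deviation (MAD) score. The property, values, and Q-ids below are invented:

```python
from statistics import median

def flag_questionable(values: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Flag item IDs whose value for a shared numeric property deviates
    strongly from the group, using the modified z-score based on the
    median absolute deviation (robust to the outliers we want to find)."""
    vals = list(values.values())
    med = median(vals)
    mad = median(abs(v - med) for v in vals)
    if mad == 0:
        return []
    # 0.6745 scales MAD to be comparable to a standard deviation.
    return [qid for qid, v in values.items()
            if 0.6745 * abs(v - med) / mad > threshold]

# Invented example: claimed heights (cm) across items of one class,
# where one value looks like a data-entry error (mm instead of cm).
heights = {"Q1": 172, "Q2": 168, "Q3": 180, "Q4": 175, "Q5": 1750}
print(flag_questionable(heights))  # ['Q5']
```

A plain mean/standard-deviation z-score would be masked by the outlier itself here, which is why the median-based variant is the usual choice for this kind of screening.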
Forgot to mention: this is not really about _quality_ - Gerard says "model of quality" - it is about _trust_ and _reputation_. Something can have low quality and high trust (cheap cell phones, for example), and the reputation might not reflect the actual quality.
You (usually) measure reputation and calculate trust, but I have seen it the other way around. The end result is the same anyhow.
On Wed, Mar 22, 2017 at 3:31 PM, John Erling Blad jeblad@gmail.com wrote: