Hoi,
Thank you for another approach. When Wikidata imports data from Wikipedia, it essentially stands on the shoulders of giants. Yes, there are sources in Wikipedia, and that does not prevent occasional issues. Yes, we import a lot of data from Wikipedia, and this makes life at Wikidata easy and makes what we do obvious. It all started with improving quality by making interwiki links manageable at Wikidata, and we are still often involved in fixing interwiki links because the assumptions used to link some articles are "funny".
When you look at Wikipedia, a lot of its fixtures are essentially about data. A category or a list can be replicated in many ways by querying Wikidata. The inverse is that Wikidata can be populated from Wikipedia. Consequently, when we say that we know about the men and women in so many Wikipedias, it is because of this import that we can and do. When Wikipedia is correct, so is Wikidata. When Wikipedias do not agree, you will find that disagreement expressed in Wikidata.
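To make the first point concrete, here is a minimal sketch in Python, assuming the public Wikidata Query Service SPARQL endpoint at query.wikidata.org, with Q146 ("house cat") chosen purely as an illustrative class:

    # Sketch: approximate a Wikipedia category by querying Wikidata.
    # Assumes the public Wikidata Query Service SPARQL endpoint.
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    QUERY = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .    # instance of: house cat
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10
    """

    response = requests.get(ENDPOINT, params={"query": QUERY, "format": "json"})
    for row in response.json()["results"]["bindings"]:
        print(row["item"]["value"], row["itemLabel"]["value"])

The same query, rerun against a later state of the data, yields the updated "category" for free; that is the sense in which categories and lists are really data.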
When people build tools and bots, and they have done so for a long time, it is based EXACTLY on the assumption that Wikipedia is essentially correct; this is why the quality and quantity of Wikidata are already this good. When you want to consider Wikidata and its complexity, it is important to look at the statistics. The statistics by Magnus are the most relevant because they help explain many of the issues of Wikidata.
One important point: no single Wikipedia can claim Wikidata, as it is a composite. Wikipedia policies do not apply. When people insist that all the data in Wikidata has to be 100% correct, forget it. Wikipedia is not 100% correct either, and that is what we build upon. It has never been that way, and it will not be achievable any time soon.
What we can do is build upon existing qualities, compare and curate. It is, for instance, fairly easy to improve Wikipedia based upon information that is already there but shown to be problematic. It is easy when we collaborate, as it will improve the quality of what we offer. One problem is that we are SO bad at collaboration. Wikipedians work on one article at a time; when I work on awards, there are easily 60 people involved, and I trust Wikipedia to be right. The kinds of issues I encounter I blog about regularly. I am not involved in single items unless they are of relevance to me, like Bassel, the only Wikipedian sentenced to death. So I added new items for the red links in the award he received, and I asked Magnus to help me with a list for that award. I added the website I used as a source on the award, and that is as far as I go.
When you want to talk about the issues, what is it that you want to achieve? So far there has been little interest in Wikidata. When you want to learn about issues, research the issues. Find methods to calculate the error rate; find methods to compare Wikidata with the Wikipedias and with other sources in a meaningful way; one such comparison is sketched below. But do approach it like Magnus does: his contributions help us make a positive difference. When you find numbers for now that you cannot replicate with the next dump, and the next, they are essentially without much value, because they do not enable us to improve on what we have. They do not help us engage our minds to make a difference. I regularly ask Amir to run a bot based on the statistics produced by Magnus; we are not yet at the stage where such tasks are automated...
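As an example of the kind of repeatable check I mean, a minimal sketch in Python, assuming the public wbgetclaims API on www.wikidata.org; the external value is a hypothetical stand-in for whatever source you compare against. Because the check is a script, it can be rerun against every new dump or the live site, which is exactly what makes its numbers worth something:

    # Sketch: compare one Wikidata statement (date of birth, P569)
    # against a value taken from another source. Rerunnable, so the
    # resulting error counts can be replicated dump after dump.
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def date_of_birth(qid):
        r = requests.get(API, params={
            "action": "wbgetclaims", "entity": qid,
            "property": "P569", "format": "json",
        })
        claims = r.json().get("claims", {}).get("P569", [])
        if not claims:
            return None
        return claims[0]["mainsnak"]["datavalue"]["value"]["time"]

    # Hypothetical value from an external database, e.g. a library catalogue.
    external = "+1879-03-14T00:00:00Z"
    wikidata = date_of_birth("Q937")  # Q937: Albert Einstein
    print("match" if wikidata == external else "mismatch", wikidata, external)

Run over thousands of items, the share of mismatches becomes an error rate you can actually track over time.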
Andrea, Wikidata is a wiki. It is young, and it has already proven itself in several applications. What can be done with it improves as our data improves. We lack data on many subjects because Wikipedia is lacking there too. How will we approach, for instance, the fact that we have fewer than 1,000 Syrians, and that one of them is an emperor of the Roman Empire and another is Bassel?
Let us be bold and allow us to be a wiki. Let us work towards the quality that is possible to achieve, and do not burden us with the assumptions of some Wikipedias. When you are serious, get involved.
Thanks,
GerardM
On 13 December 2015 at 19:10, Andrea Zanni zanni.andrea84@gmail.com wrote:
I really feel we are drowning in a glass of water. The issue of "data quality" or "reliability" that Andreas raises is well known: what I don't understand is whether its "scale" is much bigger on Wikidata than on Wikipedia, and whether this different scale makes it much more important. Maybe the scale of the issue is worth discussing, rather than the issue itself? Does the fact that Wikidata is centralised make its statements different from statements on Wikipedia? I don't know, but to me this is a more neutral and interesting question.
I often say that the Wikimedia world made quality a "Heisenbergian" feature: you always have to check whether it's there. The point is: it's always been like this. We always had to check for quality, even when we used Britannica or authority controls or whatever "reliable" sources we wanted. Wikipedia, and now Wikidata, is made for everyone to contribute to; it is open, and honest in being open, vulnerable, prone to errors. But we are transparent, we say that in advance, and we can check any statement down to the smallest detail. Of course it's difficult, but we can do it. Wikidata, as Lydia said, can actually hold conflicting statements in every item: we "just" have to put them there, as we did on Wikipedia.
If Google uses our data and they are wrong, that's bad for them. If they correct the errors and do not give us the corrections, that's bad for us and not ethical of them. The point is: there is no license (as far as I know) that can force them to contribute back to Wikidata. That is, IMHO, the problem with "over-the-top" actors: they can harness collective intelligence and "not give back." Even with CC-BY-SA, they could store (as they are probably already doing) all the data in their knowledge vault, which is kept secret, as it is an incredible asset for them.
I'd be happy to insert a new "forced transparency" clause into CC-BY-SA or CC0, but it's not there.
So, just as we are working with GLAMs to get reliable sources and content into Wikipedia, we are also working with them for good statements and data. Putting good data into Wikidata makes it better, and I don't understand what the problem is here (I do understand, again, the issue of importing too much data while still having a small community). For example: if we are importing different reliable databases, and the institutions behind them find it useful and helpful to have an aggregator of identifiers and authority controls, what is the issue? There is value in aggregating data, because you can spot errors and inconsistencies; a small illustration follows. It's not easy, of course, to find a good workflow, but, again, that is *another* problem.
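A minimal sketch in Python of what I mean by spotting inconsistencies through aggregation; the item-to-identifier pairs are hypothetical, and P214 (VIAF) stands in for any authority-control identifier:

    # Sketch: aggregate an authority-control identifier (e.g. VIAF, P214)
    # across items; two items sharing one identifier signal either a
    # needed merge or an error in one of the source databases.
    from collections import defaultdict

    viaf = {
        "Q1001": "113230702",  # hypothetical item -> VIAF id pairs
        "Q1002": "113230702",  # duplicate: flags a possible problem
        "Q1003": "59263727",
    }

    items_by_id = defaultdict(list)
    for item, identifier in viaf.items():
        items_by_id[identifier].append(item)

    for identifier, items in items_by_id.items():
        if len(items) > 1:
            print("possible conflict for VIAF", identifier, "->", items)

No single source database can run this check on itself against the others; the aggregator can, and that is where the value of aggregation lies.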
So, in conclusion: I find many issues in Wikidata, but not with the mission/vision, just with the complexity of the project, the size of the dataset, and the size of the community.
Can we talk about those?
Aubrey
On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe jayen466@gmail.com wrote:
On Sun, Dec 13, 2015 at 5:32 PM, geni geniice@gmail.com wrote:
On 13 December 2015 at 15:57, Andreas Kolbe jayen466@gmail.com wrote:
Jane,
The issue is that you can't cite one Wikipedia article as a source in another.
However, you can within the same article, per [[WP:LEAD]].
Well, of course, if there are reliable sources cited in the body of the article that back up the statements made in the lead. You still need to cite a reliable source though; that's Wikipedia 101.