Hoi,
Thank you for another approach. When Wikidata imports data from Wikipedia, it essentially stands on the shoulders of giants. Yes, there are sources in Wikipedia, and that does not prevent occasional issues. Yes, we import a lot of data from Wikipedia, and this makes life at Wikidata easy and makes what we do obvious. It all started with improving quality by making interwiki links manageable at Wikidata, and we are still often involved in fixing interwiki links because the assumptions used to link some articles are "funny".
When you look at Wikipedia, a lot of its fixtures are essentially about data. A category or a list can be replicated in many ways by querying Wikidata. The inverse is that Wikidata can be populated from Wikipedia. Consequently, when we say that we know about the men and women in so many Wikipedias, it is because of this import that we can and do. When Wikipedia is correct, so is Wikidata. When Wikipedias do not agree, you will find that disagreement expressed in Wikidata.
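To make the first point concrete, here is a minimal sketch in Python, assuming the public Wikidata Query Service SPARQL endpoint at query.wikidata.org, with Q146 ("house cat") chosen purely as an illustrative class:

    # Sketch: approximate a Wikipedia category by querying Wikidata.
    # Assumes the public Wikidata Query Service SPARQL endpoint.
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    QUERY = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .    # instance of: house cat
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10
    """

    response = requests.get(ENDPOINT, params={"query": QUERY, "format": "json"})
    for row in response.json()["results"]["bindings"]:
        print(row["item"]["value"], row["itemLabel"]["value"])

The same query, rerun against a later state of the data, yields the updated "category" for free; that is the sense in which categories and lists are really data.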
When people build tools and bots, and they have done so for a long time, it is based EXACTLY on the assumption that Wikipedia is essentially correct; this is why the quality and quantity of Wikidata are already this good. When you want to consider Wikidata and its complexity, it is important to look at the statistics. The statistics by Magnus are the most relevant because they help explain many of the issues of Wikidata.
One important point: no single Wikipedia can claim Wikidata, as it is a composite. Wikipedia policies do not apply. When people insist that all the data in Wikidata has to be 100% correct, forget it. Wikipedia is not 100% correct either, and that is what we build upon. It has never been that way, and it will not be achievable any time soon.
What we can do is build upon existing qualities, compare and curate. It is, for instance, fairly easy to improve Wikipedia based upon information that is already there but shown to be problematic. It is easy when we collaborate, as it will improve the quality of what we offer. One problem is that we are SO bad at collaboration. Wikipedians work on one article at a time; when I work on awards, there are easily 60 people involved, and I trust Wikipedia to be right. The kinds of issues I encounter I blog about regularly. I am not involved in single items unless they are of relevance to me, like Bassel, the only Wikipedian sentenced to death. So I added new items for the red links in the award he received, and I asked Magnus to help me with a list for that award. I added the website I used as a source on the award, and that is as far as I go.
When you want to talk about the issues, what is it that you want to achieve? So far there has been little interest in Wikidata. When you want to learn about issues, research the issues. Find methods to calculate the error rate; find methods to compare Wikidata with the Wikipedias and with other sources in a meaningful way; one such comparison is sketched below. But do approach it like Magnus does: his contributions help us make a positive difference. When you find numbers for now that you cannot replicate with the next dump, and the next, they are essentially without much value, because they do not enable us to improve on what we have. They do not help us engage our minds to make a difference. I regularly ask Amir to run a bot based on the statistics produced by Magnus; we are not yet at the stage where such tasks are automated...
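As an example of the kind of repeatable check I mean, a minimal sketch in Python, assuming the public wbgetclaims API on www.wikidata.org; the external value is a hypothetical stand-in for whatever source you compare against. Because the check is a script, it can be rerun against every new dump or the live site, which is exactly what makes its numbers worth something:

    # Sketch: compare one Wikidata statement (date of birth, P569)
    # against a value taken from another source. Rerunnable, so the
    # resulting error counts can be replicated dump after dump.
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def date_of_birth(qid):
        r = requests.get(API, params={
            "action": "wbgetclaims", "entity": qid,
            "property": "P569", "format": "json",
        })
        claims = r.json().get("claims", {}).get("P569", [])
        if not claims:
            return None
        return claims[0]["mainsnak"]["datavalue"]["value"]["time"]

    # Hypothetical value from an external database, e.g. a library catalogue.
    external = "+1879-03-14T00:00:00Z"
    wikidata = date_of_birth("Q937")  # Q937: Albert Einstein
    print("match" if wikidata == external else "mismatch", wikidata, external)

Run over thousands of items, the share of mismatches becomes an error rate you can actually track over time.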
Andrea, Wikidata is a wiki. It is young, and it has already proven itself in several applications. What can be done with it improves as our data improves. We lack data on many subjects because Wikipedia is lacking there too. How will we approach, for instance, the fact that we have fewer than 1,000 Syrians, and that one of them is an emperor of the Roman Empire and another is Bassel?
Let us be bold and allow us to be a wiki. Let us work towards the quality that is possible to achieve, and do not burden us with the assumptions of some Wikipedias. When you are serious, get involved.
Thanks,
GerardM
On 13 December 2015 at 19:10, Andrea Zanni zanni.andrea84@gmail.com wrote:
I really feel we are drowning in a glass of water. The issue of "data quality" or "reliability" that Andreas raises is well known: what I don't understand is whether its "scale" is much bigger on Wikidata than on Wikipedia, and whether this different scale makes it much more important. Maybe the scale of the issue is worth discussing, rather than the issue itself? Does the fact that Wikidata is centralised make its statements different from statements on Wikipedia? I don't know, but to me this is a more neutral and interesting question.
I often say that the Wikimedia world made quality a "Heisenbergian" feature: you always have to check whether it's there. The point is: it's always been like this. We always had to check for quality, even when we used Britannica or authority controls or whatever "reliable" sources we wanted. Wikipedia, and now Wikidata, is made for everyone to contribute to; it is open, and honest in being open, vulnerable, prone to errors. But we are transparent, we say that in advance, and we can check any statement down to the smallest detail. Of course it's difficult, but we can do it. Wikidata, as Lydia said, can actually hold conflicting statements in every item: we "just" have to put them there, as we did on Wikipedia.
If Google uses our data and they are wrong, that's bad for them. If they correct the errors and do not give us the corrections, that's bad for us and not ethical of them. The point is: there is no license (as far as I know) that can force them to contribute back to Wikidata. That is, IMHO, the problem with "over-the-top" actors: they can harness collective intelligence and "not give back." Even with CC-BY-SA, they could store (as they are probably already doing) all the data in their knowledge vault, which is kept secret, as it is an incredible asset for them.
I'd be happy to insert a new "forced transparency" clause into CC-BY-SA or CC0, but it's not there.
So, just as we are working with GLAMs to get reliable sources and content into Wikipedia, we are also working with them for good statements and data. Putting good data into Wikidata makes it better, and I don't understand what the problem is here (I do understand, again, the issue of importing too much data while still having a small community). For example: if we are importing different reliable databases, and the institutions behind them find it useful and helpful to have an aggregator of identifiers and authority controls, what is the issue? There is value in aggregating data, because you can spot errors and inconsistencies; a small illustration follows. It's not easy, of course, to find a good workflow, but, again, that is *another* problem.
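A minimal sketch in Python of what I mean by spotting inconsistencies through aggregation; the item-to-identifier pairs are hypothetical, and P214 (VIAF) stands in for any authority-control identifier:

    # Sketch: aggregate an authority-control identifier (e.g. VIAF, P214)
    # across items; two items sharing one identifier signal either a
    # needed merge or an error in one of the source databases.
    from collections import defaultdict

    viaf = {
        "Q1001": "113230702",  # hypothetical item -> VIAF id pairs
        "Q1002": "113230702",  # duplicate: flags a possible problem
        "Q1003": "59263727",
    }

    items_by_id = defaultdict(list)
    for item, identifier in viaf.items():
        items_by_id[identifier].append(item)

    for identifier, items in items_by_id.items():
        if len(items) > 1:
            print("possible conflict for VIAF", identifier, "->", items)

No single source database can run this check on itself against the others; the aggregator can, and that is where the value of aggregation lies.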
So, in conclusion: I find many issues in Wikidata, but not with the mission/vision, just with the complexity of the project, the size of the dataset, and the size of the community.
Can we talk about those?
Aubrey
On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe jayen466@gmail.com wrote:
On Sun, Dec 13, 2015 at 5:32 PM, geni geniice@gmail.com wrote:
On 13 December 2015 at 15:57, Andreas Kolbe jayen466@gmail.com wrote:
Jane,
The issue is that you can't cite one Wikipedia article as a source in another.
However, you can within the same article, per [[WP:LEAD]].
Well, of course, if there are reliable sources cited in the body of the article that back up the statements made in the lead. You still need to cite a reliable source though; that's Wikipedia 101.