On Sun, Dec 13, 2015 at 6:10 PM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:
I really feel we are drowning in a glass of water.
Regarding the issue of "data quality" or "reliability" that Andreas raises,
what I don't understand is whether the "scale" of it is much bigger on Wikidata
and whether this different scale makes it much more important. The scale of the
issue is maybe something worth discussing, and not the issue itself? Is the
fact that Wikidata is centralised different from statements on Wikipedia? I
don't know, but to me this is a more neutral and interesting question.
Wikidata's (envisaged) centralised nature certainly makes a difference,
because the promise was that it would inform the Wikipedias.
Wikipedia started out with people just writing from their personal
knowledge. The early articles had no footnotes. Then after a while people
noticed problems like cranks filling pages with their abstruse theories
(hence the ban on original research), people adding material from their
blogs, etc. Over the course of a decade, Wikipedia developed the idea and
the culture that you have to cite a professionally published source for
everything you add to Wikipedia.
Wikidata is in its early stages. In a way it really is like Wikipedia in
2003. New content welcome! No references required!
But at the same time, Wikidata is supposed to inform the Wikipedias, as a
central data repository. This creates a mismatch between Wikidata's "early
days -- anything goes, let's just get content in, we'll sort it out later"
attitude and the relatively mature Wikipedias where editors insist on
sources for any new content added.
This out-of-synch-ness is a real problem if you want Wikipedias to actually
use Wikidata content. Wikipedians will not accept content generation models
that take Wikipedia back to its bad old days where you could write anything
you liked without a source to back it up.
Wikipedia is of course still a long way away from citing such sources for
all its content. There are vast amounts of legacy material left over from
the early days. But in the pages that are being created now (like
developing news stories, an area where the quality of Wikipedia's coverage
is often praised), pages that see a lot of traffic, pages that are
controversial, etc., it is well established that you have to cite sources
for any new assertions.
Unsourced content is unceremoniously deleted.
If Wikipedia's reputation for reliability has improved since 2003, that
change in culture from the early days is the reason.
The Age for example published an article the other day that is probably one
of the most celebratory articles ever written about Wikipedia. If you're
a Wikipedian, you'll probably enjoy reading it.
Among the aspects that the author, Elizabeth Farrelly, said she liked most
about Wikipedia was "its ruthless commitment to the printed, demonstrable
source." She ended the article as follows:
But most interesting to me is the ban on primary research. The demand that
every input be traced to a published and authoritative source doesn't make
it true, necessarily, but does enable genuine crowd-sourcing of
scholarship. This is a revelation, and a revolution.
So yes, Wikipedia is flawed. Above all, it needs more female input. But the
obvious response, for you-and-me users who encounter something stupid or
biased or just plain wrong, is to hop in there and fix it. I'll see you
there, yes? Oh, and honey? Cite away!
Abandoning the principles that have elicited such praise -- traceability to
published sources, verifiable citations -- is not something Wikipedians
will entertain. To them, it would be a step back. If Wikidata wants to be
an input to Wikipedia, it will have to bear that in mind.
I often say that the Wikimedia world made quality a
feature: you always have to check if it's there.
The point is: it's always been like this.
We always had to check for quality, even when we used Britannica or
authority controls or whatever "reliable" sources we wanted. Wikipedia, and
now Wikidata, is made for everyone to contribute, it's open and honest in
being open, vulnerable, prone to errors. But we are transparent, we say
that in advance, we can claim any statement to the smallest detail. Of
course it's difficult, but we can do it. Wikidata, as Lydia said, can
actually have conflicting statements in every item: we "just" have to put
them there, as we did on Wikipedia.
If Google uses our data and they are wrong, that's bad for them. If they
correct the errors and do not give us the corrections, that's bad for us
and unethical of them. The point is: there is no license (as far as I
know) that can force them to contribute to Wikidata. That is, IMHO, the
problem with "over-the-top" actors: they can harness collective intelligence
and "not give back." Even with CC-BY-SA, they could store (as they are
probably already doing) all the data in their knowledge vault, which is
secret as it is an incredible asset for them.
I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA
or CC0, but it's not there.
So, as we are working via GLAMs with Wikipedia for getting reliable
sources and content, we are working with them also for good statements and
data. Putting good data in Wikidata makes it better, and I don't understand
what the problem is here (I understand, again, the issue of putting too
much data and still having a small community).
For example: if we are importing different reliable databases, and the
institutions behind them find it useful and helpful to have an aggregator
of identifiers and authority controls, what is the issue? There is value in
aggregating data, because you can spot errors and inconsistencies. It's not
easy, of course, to find a good workflow, but, again, that is *another*
issue.
So, in conclusion: I find many issues in Wikidata, but not on the
mission/vision, just in the complexity of the project, the size of the
dataset, the size of the community.
Can we talk about those?
On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:
On Sun, Dec 13, 2015 at 5:32 PM, geni
> On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com>
The issue is that you can't cite one Wikipedia article as a source in
another.
However you can within the same article per [[WP:LEAD]].
Well, of course, if there are reliable sources cited in the body of the
article that back up the statements made in the lead. You still need to
cite a reliable source though; that's Wikipedia 101.
Wikimedia-l mailing list, guidelines at: