On Wed, Apr 11, 2012 at 02:08, Dario Taraborelli
<dtaraborelli(a)wikimedia.org> wrote:
> I very much look forward to a reply by the Wikidata
> team and hope the Atlantic will host it.
File it under "been there, done that". Denny from Wikidata has written
a verbose reply right under the article. Just look in the
comments section.
Mathias
Mark,
thank you for your well-thought-out criticism. When we first thought
of adding structured data to Wikipedia, we were indeed thinking of
giving every language edition its own data space. This way the Arabic
and the Hebrew Wikipedia communities would not interfere with each
other, nor would the Estonian and the Russian communities interfere
with each other. Actually, they wouldn't even interact with each
other. They could happily build their niches and promote their own
points of view of the world, and then they would come together in the
English Wikipedia, where they would be forced either to abstain from
the conversation or to find a common ground and compromise. This would
not necessarily translate back into the language editions - they could
remain in their carefully crafted filter bubbles. Readers unable or
unwilling to read an article in other languages, and not even aware of
the controversies around it, would return from Wikipedia with the
satisfying feeling that they learned something about the world, and
would shake their heads about the ignorant inhabitants of the
neighbouring country who believe some obvious misconception about the
issue.
We still opted for having one common data space for all language
editions. Does this mean we expect the whole world to agree on one
common set of true facts, saved and redistributed in Wikidata, the
perfect form of Wikiality, and everything else will be considered
falsehood and lies? Not in the least.
First, Wikidata will not be about The Truth. I expect the Wikidata
community to follow the spirit of the Wikipedia community, and require
citations and references for the data. We do not expect the editors to
agree on the population of Israel, but we do expect them to agree on
what specific sources claim about the population of Israel. They will
be able to gather several sources with their sometimes contradictory
data. So we might have the population according to the Israeli
statistics office, according to the Egyptian statistics office,
according to the CIA World Factbook, and according to even more
sources. Instead of hiding these differences in their respective
language editions, we can have one space to gather them all and
display them side by side, making the disagreement explicit and
visible.
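As a rough illustration only (this is not the actual Wikidata data
model; the class and field names are made up for this sketch and the
numbers are placeholders, not real population figures), keeping
several sourced values side by side could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One sourced statement: 'source says prop of item is value'."""
    item: str
    prop: str
    value: int
    source: str


def claims_side_by_side(claims, item, prop):
    """Return every (source, value) pair for item/prop without picking a winner."""
    return [(c.source, c.value) for c in claims
            if c.item == item and c.prop == prop]


# Placeholder values, chosen only to show that sources may disagree.
claims = [
    Claim("Israel", "population", 1000, "Israeli statistics office"),
    Claim("Israel", "population", 1100, "Egyptian statistics office"),
    Claim("Israel", "population", 1050, "CIA World Factbook"),
]

for source, value in claims_side_by_side(claims, "Israel", "population"):
    print(f"{source}: {value}")
```

The point of the sketch is that no value is overwritten: the
disagreement itself is part of the stored data.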
Second, Wikidata will not force anything into the Wikipedias. For
every step of the different possible ways the data can flow from
Wikidata to the Wikipedias, there will be ways to opt out for every
language edition. The language editions can choose to give preference
to certain sources. The language editions can opt out of using Wikidata
for a specific value, and replace it with a locally agreed fact. The
language editions can even ignore Wikidata entirely and just continue
as they have for the last decade. Wikidata is an offer, and not a mandate.
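To make the three opt-out levels above concrete, here is a minimal
sketch. It is an assumption-laden illustration, not how Wikidata or
MediaWiki implement this; EditionSettings, resolve, and all field
names are hypothetical, and the values are placeholders:

```python
from dataclasses import dataclass, field


@dataclass
class EditionSettings:
    """Hypothetical per-language-edition choices."""
    uses_wikidata: bool = True                            # False: ignore Wikidata entirely
    preferred_sources: list = field(default_factory=list)  # sources in order of preference
    local_overrides: dict = field(default_factory=dict)    # key -> locally agreed value


def resolve(settings, key, sourced_values):
    """Return the value an edition would display for key, or None.

    sourced_values maps source name -> value for this key.
    """
    # A locally agreed fact replaces Wikidata for this specific value.
    if key in settings.local_overrides:
        return settings.local_overrides[key]
    # The edition has opted out of Wikidata altogether.
    if not settings.uses_wikidata:
        return None
    # Otherwise honour the edition's preferred sources, in order.
    for source in settings.preferred_sources:
        if source in sourced_values:
            return sourced_values[source]
    # Arbitrary fallback for this sketch: first available value, if any.
    return next(iter(sourced_values.values()), None)


values = {"source A": 1000, "source B": 1100}  # placeholder data
key = ("Israel", "population")

prefers_b = EditionSettings(preferred_sources=["source B"])
local_fact = EditionSettings(local_overrides={key: 1234})
opted_out = EditionSettings(uses_wikidata=False)

print(resolve(prefers_b, key, values))
print(resolve(local_fact, key, values))
print(resolve(opted_out, key, values))
```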
Third, Wikidata will have a different coverage than Wikipedia. A lot
of issues that you mentioned are far too nuanced to be expressed in
Wikidata. Let us take the example of the Bronze Soldier of Tallinn
that you mentioned: whereas a text featuring an interpretation of the
symbolism of the statue can lead to controversy and discussion, which
data points about it would? The material? The height? The date
of erection? Its current geolocation? None of these statements are
disputed, and they could be used in the Estonian, Russian, and English
versions alike. What about your second example, the population of
Israel? Does it include Gaza or not? Well, this kind of information
can be made explicit in Wikidata. Our knowledge model will enable the
editors to state "The population of Israel in 2012, excluding Gaza,
was X, according to the following sources". I think that once you
consider the limits of what can be stated in Wikidata, and the
importance I expect to be given to properly referencing the sources,
the number of expected controversies will be much smaller than many
expect now.
Fourth, you rightly point out that the Wikipedias today are mostly
written by a specific contributor demographic. This is true, but it
glosses over the fact that it used to be even more specific. With the
growth of Wikipedia the contributor demographics have expanded and
diversified - not yet as much as one might hope, but it is getting
better. One of the points you raised was that Wikipedia does not have
many contributors in Africa. We actually hope that Wikidata will improve
this situation: since all languages will work on the same data space,
contributions from Africa and from Europe will live side by side, and
the motivation for contributing to a common space that everyone will
benefit from - and not just the much smaller language community one
belongs to - might increase the number of contributions coming from
regions underrepresented today (compare this to the situation in
countries like Uzbekistan, where a language like Russian draws a lot
of the attention and possible contributions to the bigger and more
successful Wikipedia language edition).
Fifth, in your criticism you imply that languages are
good and valid borders for keeping knowledge diversity alive. If this
were true, how come English-language articles, where communities
otherwise separated by language come together, are often of higher
quality and reflect a richer diversity than the
individual language articles? My own experiences are rooted in the
Croatian, Serbian, Bosnian, etc. Wikipedias, all language editions of
their own. The richness of diversity that the English Wikipedia
articles show on topics of the Yugoslav wars is not matched by any of
the native language editions.
What is particularly interesting about your criticism is that Wikidata
was developed with support from the EU research project RENDER, whose
main concern is knowledge diversity. We had discussions
about some of our research results in the past, especially the
Wikipedia map, not so dissimilar to some of your own results. In RENDER
we developed the requirements for a data model that is centred on the
ideas of being a possibly inconsistent, secondary data source, not
being about The Truth.
Whereas I understand your concern from an abstract view on the issue,
I challenge you to point to the actual articles that you fear will get
poorer in their diversity once Wikidata is operational. You cite
your own and your colleagues' research on this issue, so I assume your
concerns are based on real use cases.
I am sorry for this long answer, but since I think your concerns
would be very valid if Wikidata were done in a more naive way, and
since I understand that many people will think that Wikidata is being
developed in such a naive way, I took the liberty to expand more on
our current thinking of how Wikidata could work, and some of the
design decisions in building Wikidata.
Thank you for this opportunity!
Denny Vrandecic, project director Wikidata