On Mon, Nov 23, 2015 at 1:15 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:
Moreover, I was somewhat surprised to learn the other day that, apparently,
over 80 percent of Wikidata statements are either unreferenced or only
referenced to a Wikipedia:
That seems like a recipe for disaster, given that Wikidata feeds the Google
Knowledge Graph and Bing Satori to some extent.
Here are my thoughts:
1) No, it's not a recipe for disaster. :-) I expand below.
2) People sit at different points on the spectrum when it comes to the
issues around Wikidata references. What almost all of them have in common
is that they know references are very valuable for Wikidata (or for any
other knowledge base, for that matter).
3) As a researcher, as long as the data is in Wikidata, with or without a
reference, I'm already some steps ahead. If there is no reference, I have a
starting point to look for a reference for that specific value, and in that
process, I may find conflicting data with new references. For a project
that is still growing, these are opportunities, not blockers.
4) I hear a lot of sensitivity about referencing Wikidata claim values to
Wikipedia. I hear people's concerns (having loops in referencing mechanisms
is not good) but I do not consider the existence of Wikipedia references an
issue, and I certainly prefer a Wikipedia reference over no reference,
especially if the date on which the information was extracted is also
tracked somewhere in Wikidata. Telling the researcher that the data came
from Wikipedia gives him/her a head start on where to continue the
search.
5) I see a need to give the users of open data a chance to use data with
more knowledge and control. For example, if you are an app developer, you
should be able to figure out relatively easily what data in Wikidata you
can fully trust, and what data you may want to skip using in your app. At
the moment, some part of the community considers a value with a
non-Wikipedia reference that has been approved/monitored by a human as
trustworthy (this is not a written rule; I'm summarizing my current
understanding based on discussions with some Wikidata community members,
myself included :-). But, among other things, the reference in Wikidata may
itself not be trustworthy. We should surface to the end user how much trust
one should have in the values in Wikidata.
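To make point 5 a bit more concrete, here is a rough sketch of the kind of
check an app developer might run today. It is only an illustration under
some assumptions: that each statement is a dict shaped like the Wikidata
JSON export (a "references" list whose entries carry a "snaks" mapping keyed
by property ID), and that P143 ("imported from Wikimedia project") is what
marks a Wikipedia-sourced reference. The function names and the three labels
are made up for this example, and an "external" label says nothing about
whether the cited source is itself trustworthy.

```python
from collections import Counter

# Assumption: P143 ("imported from Wikimedia project") marks a reference
# that points back to a Wikipedia. Extend this set if other properties are
# used for the same purpose.
WIKIMEDIA_SOURCE_PROPS = {"P143"}


def classify_statement(statement):
    """Return 'unreferenced', 'wikipedia-only', or 'external' (hypothetical labels)."""
    references = statement.get("references") or []
    if not references:
        return "unreferenced"
    for reference in references:
        props = set(reference.get("snaks", {}))
        # A reference with at least one snak and no Wikimedia-source
        # property is counted as pointing outside Wikipedia.
        if props and not (props & WIKIMEDIA_SOURCE_PROPS):
            return "external"
    return "wikipedia-only"


def summarize(statements):
    """Count how many statements fall into each category."""
    return Counter(classify_statement(s) for s in statements)
```

An app could then, for instance, keep only the statements labelled
"external" and skip or flag the rest, depending on how much risk it is
willing to take.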
What is amazing is that there are many great things one can do with the
data that is being gathered in Wikidata. We should all work together to
improve that data, but we should also acknowledge that our attention is
split across many projects (this is definitely the case for me). As a
result, we will be seeing steady, gradual improvements in Wikidata rather
than sudden, very fast ones. We need to stay curious, excited, committed,
and patient. :-)
Disclaimer: These are my personal views about references in Wikidata, and
not necessarily the views of my team or the Wikimedia Foundation. :-)