Liam,
I am interested in anything demonstrating that the things I am concerned
about are not a problem.
Further comments interspersed below.
On Fri, Nov 27, 2015 at 12:51 PM, Liam Wyatt <liamwyatt(a)gmail.com> wrote:
On 27 November 2015 at 12:08, Andreas Kolbe <jayen466(a)gmail.com> wrote:
The Wikimedia movement has always had an important principle: that all
content should be traceable to a "reliable source". Throughout the first
decade of this movement and beyond, Wikimedia content has never been
considered a reliable source. For example, you can't use a Wikipedia
article as a reference in another Wikipedia article.
Another important principle has been the disclaimer: pointing out to people
that the data is anonymously crowdsourced, and that there is no guarantee
of reliability or fitness for use.
Both of these principles are now being jettisoned.
Wikipedia content is considered a reliable source in Wikidata...
<snip>
I agree that "reliable source" referencing and "crowdsourced content" are
indeed principles of our movement. However, I disagree that Wikidata is
"jettisoning" them. In fact, quite the contrary!
The purpose of the statement "imported from --> English Wikipedia" in the
"reference" field of a Wikidata item's statement is PRECISELY to indicate
to the user that this information has not been INDEPENDENTLY verified to a
reliable source and that Wikipedia is NOT considered a reliable source.
Furthermore, it provides a PROVENANCE of that information to help stop
people from circular referencing. That is - clearly stating that the
specific fact in Wikidata has come from Wikipedia helps to avoid the
structured-data equivalent of "citogenesis": https://xkcd.com/978/
If and when a person can provide a reliable reference for that same fact,
they are encouraged to add an actual reference. Note that the Wikidata
statements used for facts coming in from Wikipedia use the property
"imported from". This
is deliberately different from the property "reference URL" which is what
you would use when adding an actual reference to a third-party reliable
online source.
How does the presence of that information in Wikidata help if the Google
user just gets the info in the Knowledge Graph without any indication that
it comes from Wikidata? After all, CC0 specifically waives the right to
attribution that Wikipedia retains.[1][2] No re-user of Wikidata content is
required to say where the data came from, and they typically don't.
So, absent this information, don't you think it likely that users will
simply propagate information they find in Google and on other re-users'
sites? Rather than preventing citogenesis, I think it's citogenesis on
steroids, given that Google has far more users than any Wikimedia project.
This CC0, no-attribution arrangement may financially benefit Google,
because they can dispense with a source link that might lead users away
from their own site and their ads, but how does it benefit the public, or
indeed benefit Wikimedia? Are we all just working to make Google richer, or
are we working for the public?
Moreover, according to data on Wikimedia Labs[3], about half of all
statements in Wikidata have *no reference whatsoever*. That's *in addition*
to the third that are only referenced to a Wikipedia.
Yet all of this material is meant to form an input to the Google Knowledge
Graph, following Google's abandonment of Freebase in favour of
Wikidata.[4][5]
Furthermore, the fact that many statements in Wikidata are not given a
reference (yet) is not necessarily a "problem". For example - this
https://www.wikidata.org/wiki/Q21481859 is a Wikidata item for a scientific
publication with 2891 co-authors!! This is an extreme example, but it
demonstrates my point... None of those 2891 statements has a specific
reference listed for it, because all of them are self-evidently referenced
to the scientific publication itself. The same is true of the other
properties applied to this item (volume, publication date, title, page
number...). All of these could be "referenced" to the very first property
in the Wikidata item - the DOI of the scientific article:
http://www.sciencedirect.com/science/article/pii/S0370269312008581 This
item is not "less reliable" because it doesn't have the same footnote
repeated almost three thousand times, but if you merely look at statistics
of "unreferenced wikidata statements" it would APPEAR that it is very
poorly cited.
So, I think we need a more nuanced view of what "proper referencing" means
in the context of Wikidata.
I take your point, even though I am unsure what value this Wikidata listing
adds for the public, given that it merely reproduces details from the
publisher's page. Might we be reinventing the wheel? And if there is value
added for the public in some way that escapes me, surely it would not be
difficult to have the bot add the reference automatically when importing
the data from the publisher's page, thereby showing that it is referenced
and making it easier to spot when someone subsequently adds the name of his
classmate as a joke?
I'll add an extreme example of my own, from the opposite end of the
spectrum: for five months in 2014, Wikidata told the world that Franklin D.
Roosevelt was also known as "Adolf Hitler".[6]
If obvious unsourced vandalism lasts as long as that, I am not sanguine
about the likelihood of more subtle distortions being spotted in a timely
manner. Note that manipulation of Knowledge Graph content was reportedly a
problem with Freebase as well.[4]
[1] https://creativecommons.org/publicdomain/zero/1.0/
[2] https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attributio…
[3] https://tools.wmflabs.org/wikidata-todo/stats.php
[4] https://www.seroundtable.com/google-freebase-wikidata-knowledge-graph-19591…
[5] http://searchengineland.com/google-close-freebase-helped-feed-knowledge-gra…
[6] https://www.wikidata.org/w/index.php?title=Q8007&oldid=124603129
    https://www.wikidata.org/w/index.php?title=Q8007&diff=next&oldid=15…