Liam,
I am interested in anything demonstrating that the things I am concerned about are not a problem.
Further comments are interspersed below.
On Fri, Nov 27, 2015 at 12:51 PM, Liam Wyatt liamwyatt@gmail.com wrote:
On 27 November 2015 at 12:08, Andreas Kolbe jayen466@gmail.com wrote:
The Wikimedia movement has always had an important principle: that all content should be traceable to a "reliable source". Throughout the first decade of this movement and beyond, Wikimedia content has never been considered a reliable source. For example, you can't use a Wikipedia article as a reference in another Wikipedia article.
Another important principle has been the disclaimer: pointing out to people that the data is anonymously crowdsourced, and that there is no guarantee of reliability or fitness for use.
Both of these principles are now being jettisoned.
Wikipedia content is considered a reliable source in Wikidata...
<snip>
I agree that "reliable source" referencing and "crowdsourced content" are indeed principles of our movement. However, I disagree that Wikidata is "jettisoning" them. In fact, quite the contrary!
The purpose of the statement "imported from --> English Wikipedia" in the "reference" field of a Wikidata item's statement is PRECISELY to indicate to the user that this information has not been INDEPENDENTLY verified against a reliable source and that Wikipedia is NOT considered a reliable source. Furthermore, it provides a PROVENANCE for that information to help stop people from circular referencing. That is, clearly stating that the specific fact in Wikidata has come from Wikipedia helps to avoid the structured-data equivalent of "citogenesis" (https://xkcd.com/978/). If/when a person can provide a reliable reference for that same fact, they are encouraged to add an actual reference. Note that Wikidata statements for facts coming in from Wikipedia use the property "imported from". This is deliberately different from the property "reference URL", which is what you would use when adding an actual reference to a third-party reliable online source.
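To make the distinction between "imported from" and "reference URL" concrete, here is a minimal sketch of how a re-user could sort statements by the kind of reference they carry. It assumes statements shaped like Wikidata's JSON export, where each statement has an optional "references" list and each reference has a "snaks" mapping keyed by property ID. P143 ("imported from Wikimedia project"), P854 ("reference URL") and P248 ("stated in") are real Wikidata property IDs; the function names, the bucket labels and the sample data are hypothetical, and treating only P854 as an "actual reference" is a simplifying assumption.

```python
from collections import Counter

# Real Wikidata property IDs.
IMPORTED_FROM = "P143"   # "imported from Wikimedia project"
REFERENCE_URL = "P854"   # "reference URL"


def classify_statement(statement):
    """Bucket one statement by its references (hypothetical labels).

    Assumption: only a "reference URL" counts as an actual reference;
    real data also uses other sourcing properties such as "stated in".
    """
    references = statement.get("references", [])
    if not references:
        return "unreferenced"
    used = set()
    for ref in references:
        used.update(ref.get("snaks", {}).keys())
    if REFERENCE_URL in used:
        return "reference_url"
    if IMPORTED_FROM in used:
        return "imported_only"
    return "other_reference"


def tally(statements):
    """Count statements per bucket."""
    return Counter(classify_statement(s) for s in statements)


if __name__ == "__main__":
    # Hand-made sample, not real Wikidata content.
    sample = [
        {"references": []},
        {"references": [{"snaks": {"P143": [{}]}}]},
        {"references": [{"snaks": {"P143": [{}]}}, {"snaks": {"P854": [{}]}}]},
        {"references": [{"snaks": {"P248": [{}]}}]},
    ]
    print(dict(tally(sample)))
```

A count like this only says which kind of reference field is filled in; it says nothing about whether the underlying fact is correct, or whether a re-user ever shows that field to the reader.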
How does the presence of that information in Wikidata help if the Google user just gets the info in the Knowledge Graph without any indication that it comes from Wikidata? Because CC0 specifically waives the right to attribution that Wikipedia retains.[1][2] No re-user of Wikidata content is required to say where the data came from, and they typically don't.
So, absent this information, don't you think it likely that users will simply propagate information they find in Google and on other reusers' sites? Rather than preventing citogenesis, I think it's citogenesis on steroids, given that Google has far more users than any Wikimedia project.
This CC0, no-attribution arrangement may financially benefit Google, because they can dispense with a source link that might lead users away from their own site and their ads, but how does it benefit the public, or indeed benefit Wikimedia? Are we all just working to make Google richer, or are we working for the public?
Moreover, according to data on Wikimedia Labs[3], about half of all statements in Wikidata have *no reference whatsoever*. That's *in addition* to the third that are only referenced to a Wikipedia.
Yet all of this material is meant to form an input to the Google Knowledge Graph, following Google's abandonment of Freebase in favour of Wikidata.[4][5]
Furthermore, the fact that many statements in Wikidata are not given a reference (yet) is not necessarily a "problem". For example - this https://www.wikidata.org/wiki/Q21481859 is a Wikidata item for a scientific publication with 2891 co-authors!! This is an extreme example, but it demonstrates my point... None of those 2891 statements has a specific reference listed for it, because all of them are self-evidently referenced to the scientific publication itself. The same is true of the other properties applied to this item (volume, publication date, title, page number...). All of these could be "referenced" to the very first property in the Wikidata item - the DOI of the scientific article: http://www.sciencedirect.com/science/article/pii/S0370269312008581 This item is not "less reliable" because it doesn't have the same footnote repeated almost three thousand times, but if you merely look at statistics of "unreferenced wikidata statements" it would APPEAR that it is very poorly cited. So, I think we need a more nuanced view of what "proper referencing" means in the context of Wikidata.
I take your point, even though I am unsure what value this Wikidata listing adds for the public, given that it merely reproduces details from the publisher's page. Might we be reinventing the wheel? And if there is value added for the public in some way that escapes me, surely it would not be difficult to have the bot add the reference automatically when importing the data from the publisher's page, thereby showing that it is referenced and making it easier to spot when someone subsequently adds the name of his classmate as a joke?
I'll add an extreme example of my own, from the opposite end of the spectrum: for five months in 2014, Wikidata told the world that Franklin D. Roosevelt was also known as "Adolf Hitler".[6]
If obvious unsourced vandalism lasts as long as that, I am not sanguine about the likelihood of more subtle distortions being spotted in a timely manner. Note that manipulation of Knowledge Graph content was reportedly a problem with Freebase as well.[4]
[1] https://creativecommons.org/publicdomain/zero/1.0/
[2] https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution...
[3] https://tools.wmflabs.org/wikidata-todo/stats.php
[4] https://www.seroundtable.com/google-freebase-wikidata-knowledge-graph-19591....
[5] http://searchengineland.com/google-close-freebase-helped-feed-knowledge-grap...
[6] https://www.wikidata.org/w/index.php?title=Q8007&oldid=124603129
    https://www.wikidata.org/w/index.php?title=Q8007&diff=next&oldid=154...