Hi Markus,
On 1 December 2015 at 23:43, Markus Krötzsch <markus at semantic-mediawiki.org> wrote:
[I continue cross-posting for this reply, but it would make sense to return the thread to the Wikidata list where it started, so as to avoid partial discussions happening in many places.]
Apologies for the late reply.
While you indicated that you had crossposted this reply to Wikimedia-l, it didn't turn up in my inbox. I only saw it today, after Atlasowa pointed it out on the Signpost op-ed's talk page.[1]
On 27.11.2015 12:08, Andreas Kolbe wrote:
- Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance.
This prompted me to reply. I wanted to write an email that merely says:
"Really? Where did you get this from?" (Google using Wikidata content)
Multiple sources, including what appears to be your own research group's writing:[2]
---o0o---
In December 2013, Google announced that their own collaboratively edited knowledge base, Freebase, is to be discontinued in favour of Wikidata, which gives Wikidata a prominent role as an in[p]ut for Google Knowledge Graph. The research group Knowledge Systems https://ddll.inf.tu-dresden.de/web/Knowledge_Systems/en is working in close cooperation with the development team behind Wikidata, and provides, e.g., the regular Wikidata RDF-Exports.
---o0o---
But then I read the rest ... so here you go ...
Your email mixes up many things and effects, some of which are important issues (e.g., the fact that VIAF is not a primary data source that should be used in citations). Many other of your remarks I find very hard to take serious, including but not limited to the following:
- A rather bizarre connection between licensing models and accountability (as if it would make content more credible if you are legally required to say that you found it on Wikipedia, or even give a list of user names and IPs who contributed)
Both Freebase and Wikipedia have attribution licences. When Bing's Snapshot displays information drawn from Freebase or Wikipedia, it's indicated thus at the bottom of the infobox[3]:
---o0o---
Data from Freebase · Wikipedia
---o0o---
I take this as a token gesture to these sources' attribution licences.
Given the amount of space they have available, I would think most people would agree that this form of attribution is sufficient. You couldn't possibly expect them to list all contributors who have ever contributed to the lead of the Wikipedia article, for example, as the letter of the licence might require.
However, I think it's proper and important that those minimal attributions are there. And given Wikidata's CC0 licence, I don't expect re-users to continue attributing in this manner. This view is shared by Max Klein for example, who is quoted to that effect in the Signpost op-ed.[4]
- Some stories that I think you really just made up for the sake of argument (Denny alone has picked the Wikidata license?
Denny led the development team. There are multiple public instances and accounts of his having advocated this choice and convinced people of the wisdom of it, in Wikidata talk pages and elsewhere, including a recent post on the Wikidata mailing list.[5]
Interestingly, he originally said that this would mean there could be no imports from Wikipedia, and that there was in fact no intention to import data from Wikipedias (see op-ed).[6] He also said, higher up on that page, that this was "for starters", and that that decision could easily be changed later on by the community.[7]
Google displays Wikidata content?
See above. If Wikidata plays "a prominent role as an in[p]ut for Google Knowledge Graph" then I would expect there to be correspondences between Knowledge Graph and Wikidata content.
Bing is fuelled by Wikimedia?)
I spoke of "Wikimedia-fuelled search engines like Google and Bing" in the context of the Google Knowledge Graph and Bing's Snapshot/Satori equivalent.
We all know that in both cases, much of the content Google and Bing display in these infoboxes comes from Wikimedia projects (Wikipedia, Commons and now, apparently, Wikidata).
- Some disjointed remarks about the history of capitalism
- The assertion that content is worse just because the author who created it used a bot for editing
I spoke of "bot users mass-importing unreliable data". It's not the bot method that makes the data unreliable: they are unreliable to begin with (because they are unsourced, nobody verifies the source, etc.).
As I pointed out in this week's op-ed, of the top fifteen hoaxes in the English Wikipedia, six have active Wikidata items (or rather, had: they were deleted this morning, after the op-ed appeared).
This is what I mean by unreliable data.
- The idea that engineers want to build systems with bad data because they like the challenge of cleaning it up -- I mean: really! There is nothing one can even say to this.
Again, this is not quite what I was trying to convey. My impression is that the current community effort at Wikidata emphasises speed: hence the mass imports of data from Wikipedia, whether verifiable or not, contrary to original intentions, as represented by Denny's quote above.
As far as I can make out, present-day thinking among many Wikidatans is: let's get lots of data in fast even though we know some of it will be bad. Afterwards, we can then apply clever methods to check for inconsistencies and clean our data up -- which is a challenge people do seem to warm to. Meanwhile, others throw up their arms in dismay and say, "Stop! You're importing bad data."
Wouldn't you agree that this characterises some of the recent discussions on the Wikidata Project Chat page?
The two camps seem approximately evenly represented in the discussions I've seen. But while the one camp says "Stop!", the other camp continues importing. So in practice, the importers are getting their way.
- The complaint that Wikimedia employs too much engineering expertise and too little content expertise (when, in reality, it is a key principle of Wikimedia to keep out of content, and communities regularly complain WMF would still meddle too much).
Is it not obvious that I was talking about community practices rather than the actions of Wikimedia staff?
- All those convincing arguments you make against open, anonymous editing because of it being easy to manipulate (I've heard this from Wikipedia critics ten years ago; wonder what became of them)
Such criticisms are still regularly levelled at Wikipedia, in top-quality publications. If you really want, I can send you a literature list, but you could begin with this article in Newsweek.[8]
- And, finally, the culminating conspiracy theory of total control over political opinion, destroying all plurality by allowing only one viewpoint (not exactly what I observe on the Web ...) -- and topping this by blaming it all on the choice of a particular Creative Commons license for Wikidata! Really, you can't make this up.
The information provided by default to billions of search engine users *matters*. You can never prevent an individual from going to a website that espouses a different view, but you don't have to for that information to have a measurable effect.
Robert Epstein and Ronald E. Robertson recently published a paper on what they called "The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections".[9] It provides further detail.
Summing up: either this is an elaborate satire that tries to test how serious an answer you will get on a Wikimedia list, or you should *seriously* rethink what you wrote here, take back the things that are obviously bogus, and have a down-to-earth discussion about the topics you really care about (licenses and cyclic sourcing on Wikimedia projects, I guess; "capitalist companies controlling public media" should be discussed in another forum).
No satire was intended. I hope I have succeeded in making my points clearer.
Regards,
Andreas
[1] https://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2015-12-02/O...
[2] https://ddll.inf.tu-dresden.de/web/Wikidata/en
[3] http://www.bing.com/search?q=jerusalem&go=Submit&qs=n&form=QBLH&...
[4] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[5] https://lists.wikimedia.org/pipermail/wikidata/2015-December/007769.html
[6] https://archive.is/ZbV5A#selection-2997.0-3009.26
[7] https://archive.is/ZbV5A#selection-2755.308-2763.27
[8] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-busi...
[9] http://www.pnas.org/content/112/33/E4512.abstract