Hoi, "He who is without sin, throws the first stone". I read this article [1] in Wired and it seems to me that Wikipedians, English Wikipedians at that, have plenty to do to get their own house in order. The topic was quality, particularly in Wikidata, and it degenerated into a conversation that included the Kazakh Wikipedia, the potential to manipulate information and whatever.
I am happy to say that quality is an issue. It is an issue for all of us. However, I am firmly with Jane that once we have identified issues, we should come up with ways to make them manageable and/or identifiable. The confrontation of 'sources or die' is easy: DIE. That is not to say that sources are unimportant, but they hide too much and they too are often and easily manipulated.
When quality is at issue, concentrate on that subject and for a moment forget about secondary or tertiary caveats. If we can agree that our own efforts, positively applied, will help us improve quality, we have a way forward. There are micro and macro ways of improving quality. I give an example of both.
Psychiatry and stigma are woefully underdeveloped subjects. I have added one person and connected her to two awards, a book, a few organisations, people teaching at the University of Maastricht and several other people occupied in this field. I asked her for additional information to expand the field. This is a micro contribution and because of the links it has quality.
A German university is interested in using Wikidata and wants to connect its content to our content. They are happy to share their data and it is important to them that their data is sourced to them. We are talking and it may become a reality.
These are two ways of improving quality; one of them is explicitly about sourcing. To me it is less about them being a source than about them including their reputation at the same time. The info I added about "ervaringsdeskundigheid" is likely to be kept because it is well connected and at some choice points sources are all too easy to include. Another reason why it will stay is that my reputation is such that it is more than likely correct. Even that is not so much of a concern because as more data becomes available in Wikidata, possible errors will be found and corrected. (There are none as far as I am aware.)
The point of this all? Quality is a goal; it is something that you achieve by hard work. Wikipedia is a quality resource and it does have rough edges. Wikidata is immature, underdeveloped and in need of all the love and care it can get. Yes, there are secondary and tertiary concerns. But they should not divert our attention from what is our main concern: the improved quality that we can achieve only when we collaborate. At that, Wikidata has plenty to offer to Wikipedia already. In my opinion the easiest results are not so much in the info boxes but more in revitalising the red links and removing the many, many links that are plain wrong. Thanks, GerardM
[1] http://arstechnica.com/staff/2015/12/editorial-wikipedia-fails-as-an-encyclo...
On 8 December 2015 at 00:02, Andreas Kolbe jayen466@gmail.com wrote:
Hi Markus,
On 1 December 2015 at 23:43, Markus Krötzsch <markus at semantic-mediawiki.org> wrote:
[I continue cross-posting for this reply, but it would make sense to return the thread to the Wikidata list where it started, so as to avoid partial discussions happening in many places.]
Apologies for the late reply.
While you indicated that you had crossposted this reply to Wikimedia-l, it didn't turn up in my inbox. I only saw it today, after Atlasowa pointed it out on the Signpost op-ed's talk page.[1]
On 27.11.2015 12:08, Andreas Kolbe wrote:
- Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance.
This prompted me to reply. I wanted to write an email that merely says: "Really? Where did you get this from?" (Google using Wikidata content)
Multiple sources, including what appears to be your own research group's writing:[2]
---o0o---
In December 2013, Google announced that their own collaboratively edited knowledge base, Freebase, is to be discontinued in favour of Wikidata, which gives Wikidata a prominent role as an in[p]ut for Google Knowledge Graph. The research group Knowledge Systems https://ddll.inf.tu-dresden.de/web/Knowledge_Systems/en is working in close cooperation with the development team behind Wikidata, and provides, e.g., the regular Wikidata RDF-Exports.
---o0o---
But then I read the rest ... so here you go ...
Your email mixes up many things and effects, some of which are important issues (e.g., the fact that VIAF is not a primary data source that should be used in citations). Many of your other remarks I find very hard to take seriously, including but not limited to the following:
- A rather bizarre connection between licensing models and accountability (as if it would make content more credible if you are legally required to say that you found it on Wikipedia, or even give a list of user names and IPs who contributed)
Both Freebase and Wikipedia have attribution licences. When Bing's Snapshot displays information drawn from Freebase or Wikipedia, it's indicated thus at the bottom of the infobox[3]:
---o0o---
Data from Freebase · Wikipedia
---o0o---
I take this as a token gesture to these sources' attribution licences.
Given the amount of space they have available, I would think most people would agree that this form of attribution is sufficient. You couldn't possibly expect them to list all contributors who have ever contributed to the lead of the Wikipedia article, for example, as the letter of the licence might require.
However, I think it's proper and important that those minimal attributions are there. And given Wikidata's CC0 licence, I don't expect re-users to continue attributing in this manner. This view is shared by Max Klein for example, who is quoted to that effect in the Signpost op-ed.[4]
- Some stories that I think you really just made up for the sake of argument (Denny alone has picked the Wikidata license?
Denny led the development team. There are multiple public instances and accounts of his having advocated this choice and convinced people of the wisdom of it, in Wikidata talk pages and elsewhere, including a recent post on the Wikidata mailing list.[5]
Interestingly, he originally said that this would mean there could be no imports from Wikipedia, and that there was in fact no intention to import data from Wikipedias (see op-ed).[6] He also said, higher up on that page, that this was "for starters", and that that decision could easily be changed later on by the community.[7]
Google displays Wikidata content?
See above. If Wikidata plays "a prominent role as an in[p]ut for Google Knowledge Graph" then I would expect there to be correspondences between Knowledge Graph and Wikidata content.
Bing is fuelled by Wikimedia?)
I spoke of "Wikimedia-fuelled search engines like Google and Bing" in the context of the Google Knowledge Graph and Bing's Snapshot/Satori equivalent.
We all know that in both cases, much of the content Google and Bing display in these infoboxes comes from Wikimedia projects (Wikipedia, Commons and now, apparently, Wikidata).
- Some disjointed remarks about the history of capitalism
- The assertion that content is worse just because the author who created it used a bot for editing
I spoke of "bot users mass-importing unreliable data". It's not the bot method that makes the data unreliable: they are unreliable to begin with (because they are unsourced, nobody verifies the source, etc.).
As I pointed out in this week's op-ed, of the top fifteen hoaxes in the English Wikipedia, six have active Wikidata items (or rather, had: they were deleted this morning, after the op-ed appeared).
This is what I mean by unreliable data.
- The idea that engineers want to build systems with bad data because they like the challenge of cleaning it up -- I mean: really! There is nothing one can even say to this.
Again, this is not quite what I was trying to convey. My impression is that the current community effort at Wikidata emphasises speed: hence the mass imports of data from Wikipedia, whether verifiable or not, contrary to original intentions, as represented by Denny's quote above.
As far as I can make out, present-day thinking among many Wikidatans is: let's get lots of data in fast even though we know some of it will be bad. Afterwards, we can then apply clever methods to check for inconsistencies and clean our data up -- which is a challenge people do seem to warm to. Meanwhile, others throw up their arms in dismay and say, "Stop! You're importing bad data."
Wouldn't you agree that this characterises some of the recent discussions on the Wikidata Project Chat page?
The two camps seem approximately evenly represented in the discussions I've seen. But while the one camp says "Stop!", the other camp continues importing. So in practice, the importers are getting their way.
- The complaint that Wikimedia employs too much engineering expertise and too little content expertise (when, in reality, it is a key principle of Wikimedia to keep out of content, and communities regularly complain WMF would still meddle too much).
Is it not obvious that I was talking about community practices rather than the actions of Wikimedia staff?
- All those convincing arguments you make against open, anonymous editing because of it being easy to manipulate (I've heard this from Wikipedia critics ten years ago; wonder what became of them)
Such criticisms are still regularly levelled at Wikipedia, in top-quality publications. If you really want, I can send you a literature list, but you could begin with this article in Newsweek.[8]
- And, finally, the culminating conspiracy theory of total control over political opinion, destroying all plurality by allowing only one viewpoint (not exactly what I observe on the Web ...) -- and topping this by blaming it all on the choice of a particular Creative Commons license for Wikidata! Really, you can't make this up.
The information provided by default to billions of search engine users *matters*. You can never prevent an individual from going to a website that espouses a different view, but you don't have to for that information to have a measurable effect.
Robert Epstein and Ronald E. Robertson recently published a paper on what they called "The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections".[9] It provides further detail.
Summing up: either this is an elaborate satire that tries to test how serious an answer you will get on a Wikimedia list, or you should *seriously* rethink what you wrote here, take back the things that are obviously bogus, and have a down-to-earth discussion about the topics you really care about (licenses and cyclic sourcing on Wikimedia projects, I guess; "capitalist companies controlling public media" should be discussed in another forum).
No satire was intended. I hope I have succeeded in making my points clearer.
Regards,
Andreas
[1] https://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2015-12-02/O...
[2] https://ddll.inf.tu-dresden.de/web/Wikidata/en
[3] http://www.bing.com/search?q=jerusalem&go=Submit&qs=n&form=QBLH&...
[4] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[5] https://lists.wikimedia.org/pipermail/wikidata/2015-December/007769.html
[6] https://archive.is/ZbV5A#selection-2997.0-3009.26
[7] https://archive.is/ZbV5A#selection-2755.308-2763.27
[8] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-busi...
[9] http://www.pnas.org/content/112/33/E4512.abstract