Re: [Wikimedia-l] Quality issues

30 Dec 2015

Hoi,
"He who is without sin, throws the first stone". I read this article [1] in
Wired and it seems to me that Wikipedians, English Wikipedians at that, have
plenty to do to get their own house in order. The topic was quality,
particularly in Wikidata, and it degenerated into a conversation that included
the Kazakh Wikipedia, the potential to manipulate information and whatever.

I am happy to say that quality is an issue. It is an issue for all of us.
However, I am firmly with Jane that once we have identified issues, we
should come up with ways to make them manageable and/or
identifiable. The confrontation of 'sources or die' is easy: DIE. That is
not to say that sources are unimportant, but they hide too much and they too
are often and easily manipulated.

When quality is at issue, concentrate on that subject and for a moment
forget about secondary or tertiary caveats. If we can agree that our own
efforts, positively applied, will help us improve quality, we have a way
forward. There are micro and macro ways of improving quality. I give an
example of both.

Psychiatry and stigma are subjects woefully underdeveloped. I have added
one person and connected her to two awards, a book, a few organisations,
people teaching at the University of Maastricht and several other people
occupied in this field. I asked her for additional information to expand
the field. This is a micro contribution and because of the links it has
quality.

A German university is interested in using Wikidata and wants to connect its
content to our content. They are happy to share their data and it is
important to them that their data is sourced to them. We are talking and it
may become a reality.

These are two ways of improving quality; one of them is explicitly about
sourcing. To me it is less about them being a source than about them bringing
their reputation along at the same time. The info I added about
"ervaringsdeskundigheid" is likely to be kept because it is well connected
and at some choice points sources are all too easy to include. Another
reason why it will stay is that my reputation is such that it is more than
likely correct. Even that is not so much of a concern, because as more data
becomes available in Wikidata, possible errors will be found and corrected
(there are none as far as I am aware).

The point of all this? Quality is a goal; it is something that you achieve
by hard work. Wikipedia is a quality resource and it does have rough edges.
Wikidata is immature, underdeveloped and in need of all the love and care
it can get. Yes, there are secondary and tertiary concerns. But they should
not divert our attention from our main concern: the improved quality
that we can achieve only when we collaborate. On that score Wikidata already
has plenty to offer Wikipedia. In my opinion the easiest results are not so
much in the info boxes but more in revitalising the red links and removing
the many, many links that are plain wrong.
Thanks,
        GerardM

[1] http://arstechnica.com/staff/2015/12/editorial-wikipedia-fails-as-an-encycl…

On 8 December 2015 at 00:02, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...
  Hi Markus,

 On 1 December 2015 at 23:43, Markus Krötzsch
 <markus at semantic-mediawiki.org> wrote:

 [I continue cross-posting for this reply, but it would make sense to
 return the thread to the Wikidata list where it started, so as to avoid
 partial discussions happening in many places.]

 Apologies for the late reply.

 While you indicated that you had crossposted this reply to
 Wikimedia-l, it didn't turn up in my inbox. I only saw it today, after
 Atlasowa pointed it out on the Signpost op-ed's talk page.[1]

 On 27.11.2015 12:08, Andreas Kolbe wrote:
 > Wikipedia content is considered a reliable source in Wikidata, and
 > Wikidata content is used as a reliable source by Google, where it
 > appears without any indication of its provenance.

 > This prompted me to reply. I wanted to write an email that merely says:
 > "Really? Where did you get this from?" (Google using Wikidata content)

 Multiple sources, including what appears to be your own research
 group's writing:[2]

 ---o0o---

 In December 2013, Google announced that their own collaboratively
 edited knowledge base, Freebase, is to be discontinued in favour of
 Wikidata, which gives Wikidata a prominent role as an in[p]ut for
 Google Knowledge Graph. The research group Knowledge Systems
 <https://ddll.inf.tu-dresden.de/web/Knowledge_Systems/en> is working
 in close cooperation with the development team behind Wikidata, and
 provides, e.g., the regular Wikidata RDF-Exports.

 ---o0o---

  But then I read the rest ... so here you go ...

 Your email mixes up many things and effects, some of which are important
 issues (e.g., the fact that VIAF is not a primary data source that
 should be used in citations). Many other of your remarks I find very
 hard to take seriously, including but not limited to the following:

 * A rather bizarre connection between licensing models and
 accountability (as if it would make content more credible if you are
 legally required to say that you found it on Wikipedia, or even give a
 list of user names and IPs who contributed)

 Both Freebase and Wikipedia have attribution licences. When Bing's
 Snapshot displays information drawn from Freebase or Wikipedia, it's
 indicated thus at the bottom of the infobox[3]:

 ---o0o---

 Data from Freebase · Wikipedia

 ---o0o---

 I take this as a token gesture to these sources' attribution licences.

 Given the amount of space they have available, I would think most
 people would agree that this form of attribution is sufficient. You
 couldn't possibly expect them to list all contributors who have ever
 contributed to the lead of the Wikipedia article, for example, as the
 letter of the licence might require.

 However, I think it's proper and important that those minimal
 attributions are there. And given Wikidata's CC0 licence, I don't
 expect re-users to continue attributing in this manner. This view is
 shared by Max Klein for example, who is quoted to that effect in the
 Signpost op-ed.[4]

 > * Some stories that I think you really just made up for the sake of
 > argument (Denny alone has picked the Wikidata license?

 Denny led the development team. There are multiple public instances
 and accounts of his having advocated this choice and convinced people
 of the wisdom of it, in Wikidata talk pages and elsewhere, including a
 recent post on the Wikidata mailing list.[5]

 Interestingly, he originally said that this would mean there could be
 no imports from Wikipedia, and that there was in fact no intention to
 import data from Wikipedias (see op-ed).[6] He also said, higher up on
 that page, that this was "for starters", and that that decision could
 easily be changed later on by the community.[7]

  Google displays Wikidata content? 

 See above. If Wikidata plays "a prominent role as an in[p]ut for
 Google Knowledge Graph" then I would expect there to be
 correspondences between Knowledge Graph and Wikidata content.

  Bing is fuelled by Wikimedia?) 

 I spoke of "Wikimedia-fuelled search engines like Google and Bing" in
 the context of the Google Knowledge Graph and Bing's Snapshot/Satori
 equivalent.

 We all know that in both cases, much of the content Google and Bing
 display in these infoboxes comes from Wikimedia projects (Wikipedia,
 Commons and now, apparently, Wikidata).

 > * Some disjointed remarks about the history of capitalism
 > * The assertion that content is worse just because the author who
 > created it used a bot for editing

 I spoke of "bot users mass-importing unreliable data". It's not the
 bot method that makes the data unreliable: they are unreliable to
 begin with (because they are unsourced, nobody verifies the source,
 etc.).

 As I pointed out in this week's op-ed, of the top fifteen hoaxes in
 the English Wikipedia, six have active Wikidata items (or rather, had:
 they were deleted this morning, after the op-ed appeared).

 This is what I mean by unreliable data.

 > * The idea that engineers want to build systems with bad data because
 > they like the challenge of cleaning it up -- I mean: really! There is
 > nothing one can even say to this.

 Again, this is not quite what I was trying to convey. My impression is
 that the current community effort at Wikidata emphasises speed: hence
 the mass imports of data from Wikipedia, whether verifiable or not,
 contrary to original intentions, as represented by Denny's quote
 above.

 As far as I can make out, present-day thinking among many Wikidatans
 is: let's get lots of data in fast even though we know some of it will
 be bad. Afterwards, we can then apply clever methods to check for
 inconsistencies and clean our data up -- which is a challenge people
 do seem to warm to. Meanwhile, others throw up their arms in dismay
 and say, "Stop! You're importing bad data."

 Wouldn't you agree that this characterises some of the recent
 discussions on the Wikidata Project Chat page?

 The two camps seem approximately evenly represented in the discussions
 I've seen. But while the one camp says "Stop!", the other camp
 continues importing. So in practice, the importers are getting their
 way.

 > * The complaint that Wikimedia employs too much engineering expertise
 > and too little content expertise (when, in reality, it is a key
 > principle of Wikimedia to keep out of content, and communities
 > regularly complain WMF would still meddle too much).

 Is it not obvious that I was talking about community practices rather
 than the actions of Wikimedia staff?

 > * All those convincing arguments you make against open, anonymous
 > editing because of it being easy to manipulate (I've heard this from
 > Wikipedia critics ten years ago; wonder what became of them)

 Such criticisms are still regularly levelled at Wikipedia, in
 top-quality publications. If you really want, I can send you a
 literature list, but you could begin with this article in Newsweek.[8]

 > * And, finally, the culminating conspiracy theory of total control over
 > political opinion, destroying all plurality by allowing only one
 > viewpoint (not exactly what I observe on the Web ...) -- and topping
 > this by blaming it all on the choice of a particular Creative Commons
 > license for Wikidata! Really, you can't make this up.

 The information provided by default to billions of search engine users
 *matters*. You can never prevent an individual from going to a website
 that espouses a different view, but you don't have to for that
 information to have a measurable effect.

 Robert Epstein and Ronald E. Robertson recently published a paper on
 what they called "The search engine manipulation effect (SEME) and its
 possible impact on the outcomes of elections".[9] It provides further
 detail.

 > Summing up: either this is an elaborate satire that tries to test how
 > serious an answer you will get on a Wikimedia list, or you should
 > *seriously* rethink what you wrote here, take back the things that are
 > obviously bogus, and have a down-to-earth discussion about the topics
 > you really care about (licenses and cyclic sourcing on Wikimedia
 > projects, I guess; "capitalist companies controlling public media"
 > should be discussed in another forum).

 No satire was intended. I hope I have succeeded in making my points
 clearer.

 Regards,

 Andreas

 [1] https://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2015-12-02/…
 [2] https://ddll.inf.tu-dresden.de/web/Wikidata/en
 [3] http://www.bing.com/search?q=jerusalem&go=Submit&qs=n&form=QBLH…
 [4] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
 [5] https://lists.wikimedia.org/pipermail/wikidata/2015-December/007769.html
 [6] https://archive.is/ZbV5A#selection-2997.0-3009.26
 [7] https://archive.is/ZbV5A#selection-2755.308-2763.27
 [8] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-bus…
 [9] http://www.pnas.org/content/112/33/E4512.abstract
