Hi,
"He who is without sin throws the first stone". I read this article [1] in
Wired, and it seems to me that Wikipedians, English Wikipedians at that, have
plenty to do to get their own house in order. The topic was quality,
particularly in Wikidata, and it degenerated into a conversation that included
the Kazakh Wikipedia, the potential to manipulate information and whatever.
I am happy to say that quality is an issue. It is an issue for all of us.
However, I am firmly with Jane that once we have identified issues, we
should come up with ways to make them manageable and/or
identifiable. The confrontation of "sources or die" is easy: DIE. That is
not to say that sources are unimportant, but they hide too much and they too
are often and easily manipulated.
When quality is at issue, concentrate on that subject and for a moment
forget about secondary or tertiary caveats. If we can agree that our own
efforts, positively applied, will help us improve quality, we have a way
forward. There are micro and macro ways of improving quality. I give an
example of each.
Psychiatry and stigma are woefully underdeveloped subjects. I have added
one person and connected her to two awards, a book, a few organisations,
people teaching at the University of Maastricht and several other people
working in this field. I asked her for additional information to expand
the field. This is a micro contribution, and because of the links it has
quality.
A German university is interested in using Wikidata and wants to connect its
content to our content. They are happy to share their data, and it is
important to them that their data is attributed to them. We are talking and it
may become a reality.
These are two ways of improving quality; one of them is explicitly about
sourcing. To me the value lies less in their being a source than in their
bringing their reputation along with it. The info I added about
"ervaringsdeskundigheid" (expertise by experience) is likely to be kept
because it is well connected and at some choice points sources are all too
easy to include. Another reason why it will stay is that my reputation is
such that it is more than likely correct. Even that is not so much of a
concern, because as more data becomes available in Wikidata possible errors
will be found and corrected. (There are none as far as I am aware.)
The point of all this? Quality is a goal; it is something that you achieve
by hard work. Wikipedia is a quality resource and it does have rough edges.
Wikidata is immature, underdeveloped and in need of all the love and care
it can get. Yes, there are secondary and tertiary concerns. But they should
not divert our attention from our main concern: the improved quality
that we can achieve only when we collaborate. In that respect Wikidata
already has plenty to offer to Wikipedia. In my opinion the easiest results
are not so much in the info boxes but more in revitalising the red links and
removing the many, many links that are plain wrong.
Thanks,
GerardM
[1]
On 8 December 2015 at 00:02, Andreas Kolbe <jayen466(a)gmail.com> wrote:
Hi Markus,
On 1 December 2015 at 23:43, Markus Krötzsch <markus at
semantic-mediawiki.org> wrote:
> [I continue cross-posting for this reply, but it would make sense to
> return the thread to the Wikidata list where it started, so as to avoid
> partial discussions happening in many places.]
Apologies for the late reply.
While you indicated that you had crossposted this reply to
Wikimedia-l, it didn't turn up in my inbox. I only saw it today, after
Atlasowa pointed it out on the Signpost op-ed's talk page.[1]
> On 27.11.2015 12:08, Andreas Kolbe wrote:
>> Wikipedia content is considered a reliable source in Wikidata, and
>> Wikidata content is used as a reliable source by Google, where it
>> appears without any indication of its provenance.
>
> This prompted me to reply. I wanted to write an email that merely says:
> "Really? Where did you get this from?" (Google using Wikidata content)
Multiple sources, including what appears to be your own research
group's writing:[2]
---o0o---
In December 2013, Google announced that their own collaboratively
edited knowledge base, Freebase, is to be discontinued in favour of
Wikidata, which gives Wikidata a prominent role as an in[p]ut for
Google Knowledge Graph. The research group Knowledge Systems
<https://ddll.inf.tu-dresden.de/web/Knowledge_Systems/en> is working
in close cooperation with the development team behind Wikidata, and
provides, e.g., the regular Wikidata RDF-Exports.
---o0o---
> But then I read the rest ... so here you go ...
> Your email mixes up many things and effects, some of which are important
> issues (e.g., the fact that VIAF is not a primary data source that
> should be used in citations). Many other of your remarks I find very
> hard to take serious, including but not limited to the following:
> * A rather bizarre connection between licensing models and
> accountability (as if it would make content more credible if you are
> legally required to say that you found it on Wikipedia, or even give a
> list of user names and IPs who contributed)
Both Freebase and Wikipedia have attribution licences. When Bing's
Snapshot displays information drawn from Freebase or Wikipedia, it's
indicated thus at the bottom of the infobox[3]:
---o0o---
Data from Freebase · Wikipedia
---o0o---
I take this as a token gesture to these sources' attribution licences.
Given the amount of space they have available, I would think most
people would agree that this form of attribution is sufficient. You
couldn't possibly expect them to list all contributors who have ever
contributed to the lead of the Wikipedia article, for example, as the
letter of the licence might require.
However, I think it's proper and important that those minimal
attributions are there. And given Wikidata's CC0 licence, I don't
expect re-users to continue attributing in this manner. This view is
shared by Max Klein for example, who is quoted to that effect in the
Signpost op-ed.[4]
> * Some stories that I think you really just made up for the sake of
> argument (Denny alone has picked the Wikidata license?
Denny led the development team. There are multiple public instances
and accounts of his having advocated this choice and convinced people
of the wisdom of it, in Wikidata talk pages and elsewhere, including a
recent post on the Wikidata mailing list.[5]
Interestingly, he originally said that this would mean there could be
no imports from Wikipedia, and that there was in fact no intention to
import data from Wikipedias (see op-ed).[6] He also said, higher up on
that page, that this was "for starters", and that that decision could
easily be changed later on by the community.[7]
> Google displays Wikidata content?
See above. If Wikidata plays "a prominent role as an in[p]ut for
Google Knowledge Graph" then I would expect there to be
correspondences between Knowledge Graph and Wikidata content.
> Bing is fuelled by Wikimedia?)
I spoke of "Wikimedia-fuelled search engines like Google and Bing" in
the context of the Google Knowledge Graph and Bing's Snapshot/Satori
equivalent.
We all know that in both cases, much of the content Google and Bing
display in these infoboxes comes from Wikimedia projects (Wikipedia,
Commons and now, apparently, Wikidata).
> * Some disjointed remarks about the history of capitalism
> * The assertion that content is worse just because the author who
> created it used a bot for editing
I spoke of "bot users mass-importing unreliable data". It's not the
bot method that makes the data unreliable: they are unreliable to
begin with (because they are unsourced, nobody verifies the source,
etc.).
As I pointed out in this week's op-ed, of the top fifteen hoaxes in
the English Wikipedia, six have active Wikidata items (or rather, had:
they were deleted this morning, after the op-ed appeared).
This is what I mean by unreliable data.
> * The idea that engineers want to build systems with bad data because
> they like the challenge of cleaning it up -- I mean: really! There is
> nothing one can even say to this.
Again, this is not quite what I was trying to convey. My impression is
that the current community effort at Wikidata emphasises speed: hence
the mass imports of data from Wikipedia, whether verifiable or not,
contrary to original intentions, as represented by Denny's quote
above.
As far as I can make out, present-day thinking among many Wikidatans
is: let's get lots of data in fast even though we know some of it will
be bad. Afterwards, we can then apply clever methods to check for
inconsistencies and clean our data up -- which is a challenge people
do seem to warm to. Meanwhile, others throw up their arms in dismay
and say, "Stop! You're importing bad data."
Wouldn't you agree that this characterises some of the recent
discussions on the Wikidata Project Chat page?
The two camps seem approximately evenly represented in the discussions
I've seen. But while the one camp says "Stop!", the other camp
continues importing. So in practice, the importers are getting their
way.
> * The complaint that Wikimedia employs too much engineering expertise
> and too little content expertise (when, in reality, it is a key principle
> of Wikimedia to keep out of content, and communities regularly complain
> WMF would still meddle too much).
Is it not obvious that I was talking about community practices rather
than the actions of Wikimedia staff?
> * All those convincing arguments you make against open, anonymous
> editing because of it being easy to manipulate (I've heard this from
> Wikipedia critics ten years ago; wonder what became of them)
Such criticisms are still regularly levelled at Wikipedia, in
top-quality publications. If you really want, I can send you a
literature list, but you could begin with this article in Newsweek.[8]
> * And, finally, the culminating conspiracy theory of total control over
> political opinion, destroying all plurality by allowing only one
> viewpoint (not exactly what I observe on the Web ...) -- and topping this
> by blaming it all on the choice of a particular Creative Commons license
> for Wikidata! Really, you can't make this up.
The information provided by default to billions of search engine users
*matters*. You can never prevent an individual from going to a website
that espouses a different view, but you don't have to for that
information to have a measurable effect.
Robert Epstein and Ronald E. Robertson recently published a paper on
what they called "The search engine manipulation effect (SEME) and its
possible impact on the outcomes of elections".[9] It provides further
detail.
> Summing up: either this is an elaborate satire that tries to test how
> serious an answer you will get on a Wikimedia list, or you should
> *seriously* rethink what you wrote here, take back the things that are
> obviously bogus, and have a down-to-earth discussion about the topics you
> really care about (licenses and cyclic sourcing on Wikimedia projects, I
> guess; "capitalist companies controlling public media" should be
> discussed in another forum).
No satire was intended. I hope I have succeeded in making my points
clearer.
Regards,
Andreas
[1] https://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2015-12-02/…
[2] https://ddll.inf.tu-dresden.de/web/Wikidata/en
[3] http://www.bing.com/search?q=jerusalem&go=Submit&qs=n&form=QBLH…
[4] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[5] https://lists.wikimedia.org/pipermail/wikidata/2015-December/007769.html
[6] https://archive.is/ZbV5A#selection-2997.0-3009.26
[7] https://archive.is/ZbV5A#selection-2755.308-2763.27
[8] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-bus…
[9] http://www.pnas.org/content/112/33/E4512.abstract
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l(a)lists.wikimedia.org