On Sun, Nov 29, 2015 at 2:55 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
So identify an issue and it can be dealt with.
The fact an issue *can* be dealt with does not mean that it *will* be dealt with.
For example, in the post that opened this discussion a little over a week ago, you said:
"At Wikidata we often find issues with data imported from a Wikipedia. Lists have been produced with these issues on the Wikipedia involved and arguably they do present issues with the quality of Wikipedia or Wikidata for that matter. So far hardly anything resulted from such outreach."
These were your own words: "hardly anything resulted from such outreach." Wikimedia is three years into this project. If people produce lists of quality issues, that's great, but if nothing happens as a result, that's not so great.
An example of this is available in this very thread. Three days ago I mentioned the issues with the Grasulf II of Friuli entries on Reasonator and Wikidata. I didn't expect you or anyone else to fix them, and at the time of writing they haven't been.
You certainly could have fixed them -- you have made hundreds of edits on Wikidata since replying to that post of mine -- but you haven't. Adding new data is more satisfying than sourcing and improving an obscure entry. (If you're wondering why I didn't fix the entry myself, see the section "And to answer the obvious question …" in last month's Signpost op-ed.[1])
This problem is replicated across the Wikimedia universe. Wikimedia projects are run by volunteers. They work on what interests them, or whatever they have an investment in. Fixing old errors is not as appealing as importing 2 million items of new data (including tens or hundreds of thousands of erroneous ones), because fixing errors is slow work. It retards the growth of your edit count! You spend one hour researching a date, and all you get for that effort is one lousy edit in your contributions history. There are plenty of tasks allowing you to rack up 500 edits in 5 minutes. People seem to prefer those.
That is why Wikipedia has the familiar backlogs in areas like copyright infringement or AfC. Even warning templates indicating bias or other problematic content often sit for years without being addressed.
There is a systemic mismatch between data creation and data curation. There is a lot of energy for the former, and very little energy for the latter. That is why initiatives like the one started by WMF board member James Heilman and others, to have the English Wikipedia's medical articles peer-reviewed, are so important. They are small steps in the right direction.
When we are afraid about a Seigenthaler type of event based on Wikidata, rest assured there is plenty wrong in either Wikipedia or Wikidata that makes it possible for it to happen. The most important thing is to deal with it responsibly. Just being afraid will not help us in any way. Yes we need quality and quantity. As long as we make a best effort to improve our data, we will do well.
That's "eventualism". "Quality is terrible, but eventually it will be great, because ... we're all trying, and it's a wiki!" To me that sounds more like religious faith or magical thinking than empirical science.
Things being on a wiki does not guarantee quality; far from it.[2][3][4][5]
As to the Wikipedian in residence, that is his opinion. At the same time the article on ebola has been very important. It may not be science but it is certainly encyclopaedic. At the same time this Wikipedian in residence is involved, makes a positive contribution and while he may make mistakes he is part of the solution.
I am happy that you propose that work is to be done. What have you done but more importantly what are you going to do? For me there is "Number of edits: 2,088,923" https://www.wikidata.org/wiki/Special:Contributions/GerardM
I will do what I can to encourage Wikimedia Foundation board members and management to review the situation, in consultation with outside academics like those at the Oxford Internet Institute who are concerned about present developments, and to consider whether more stringent sourcing policies are required to assure the quality and traceability of data in the Wikidata corpus.
The public is the most important stakeholder in this, and should be informed and involved. If there are quality issues, the Wikimedia Foundation should be completely transparent about them in its public communications, neither minimising nor exaggerating the issues. Known problems and potential issues should be publicised as widely as possible in order to minimise the harm to society resulting from uncritical reuse of faulty data.
I have started to reach out to scholars and journalists, inviting them to review this thread as well as related materials, and form their own conclusions. I may write an op-ed about it in the Signpost, because I believe it's an important issue that deserves wider attention and debate.
As far as my own contributions are concerned, I am more inclined to boycott Wikidata.
Apart from all the issues discussed over the past few days, there is another aspect to my reluctance to contribute to Wikidata.
The Knowledge Graph is a major new Google feature. It adds value to Google's search engine results pages. It stops people from clicking through to other sources, including Wikipedia. The recent downturn in Wikipedia pageviews has been widely linked to the Knowledge Graph.
By ensuring that more people visit Google's ad-filled pages, and stay on them rather than clicking through to other sites, the Knowledge Graph is at least partly responsible for recent increases in Google's revenue, which currently stands at around $200 million a day.[6] (Income after expenses is about a third of that, i.e. $65 million.)
The development of Wikidata was co-funded by Google, which I understand donated 325,000 Euros (about $345,000) to that effort.[8] A little bit of arithmetic shows that, with Google's profits running at $65 million a day, it takes Google less than 8 minutes to earn that amount of money. Given how much Google stands to benefit from this development, it seems a paltry investment.
This set me thinking. If we assume that Wikipedia's and Wikidata's contribution to Google's annual revenue via the Knowledge Graph is just 1/365 – the revenue of one day per year – the monetary value of these projects to Google is still astronomical.
There have been around 2.5 billion edits to Wikimedia projects to date.[7] If Google chose to give one day's revenue each year to Wikimedia volunteers, as a thank-you, this would average out at about 200,000,000 / 2,500,000,000 = 8 cents per edit. Someone like Koavf, who's made 1.5 million edits[9], would stand to receive around $120,000 a year. Even my paltry 50,000 edits would net me about $4,000 a year. That's the value of free content.
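For anyone who wants to check the back-of-the-envelope arithmetic above, here it is as a short script. All inputs are the rough figures quoted in this thread, not audited numbers:

```python
# Back-of-the-envelope check of the figures quoted above.
# All inputs are rough estimates taken from this thread [6][7][8].

daily_revenue = 200_000_000        # Google's revenue per day, USD [6]
daily_profit = daily_revenue / 3   # income after expenses, roughly USD 65M

# Google's Wikidata donation measured against its daily profit
donation = 345_000                 # ~325,000 EUR in USD [8]
minutes_to_earn = donation / daily_profit * 24 * 60
print(f"Google earns the donation back in ~{minutes_to_earn:.1f} minutes")
# -> under 8 minutes

# Hypothetical thank-you: one day's revenue shared across all edits
total_edits = 2_500_000_000        # edits to Wikimedia projects to date [7]
per_edit = daily_revenue / total_edits
print(f"Value per edit: ${per_edit:.2f}")  # -> $0.08

print(f"1.5 million edits: ${1_500_000 * per_edit:,.0f} a year")  # ~$120,000
print(f"50,000 edits: ${50_000 * per_edit:,.0f} a year")          # ~$4,000
```

The exact figures matter less than the orders of magnitude: even under the deliberately conservative one-day-per-year assumption, the implied value per edit is non-trivial.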
And that's just Google. Other major players like Facebook and Bing profit, too.
Wikidata seems custom-made to benefit Google and Microsoft, at the expense of Wikipedia and other sites. Given my other commitments to Wikimedia projects, the limited number of hours in a day, and all the other concerns mentioned in this thread, I feel little inclined at present to further expand my volunteering in order to work for these multi-billion dollar corporations for free.
[1] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed
[2] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-busi...
[3] http://www.salon.com/2013/05/17/revenge_ego_and_the_corruption_of_wikipedia/
[4] https://www.washingtonpost.com/news/the-intersect/wp/2015/04/15/the-great-wi...
[5] http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-controv...
[6] https://investor.google.com/earnings/2015/Q2_google_earnings.html
[7] https://tools.wmflabs.org/wmcounter/
[8] https://www.wikimedia.de/wiki/Pressemitteilungen/PM_3_12_Wikidata_EN
[9] https://www.washingtonpost.com/news/the-intersect/wp/2015/07/22/you-dont-kno...