On Sun, Nov 29, 2015 at 2:55 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
So identify an issue and it can be dealt with.
The fact an issue *can* be dealt with does not mean that it *will* be dealt with.
For example, in the post that opened this discussion a little over a week ago, you said:
"At Wikidata we often find issues with data imported from a Wikipedia. Lists have been produced with these issues on the Wikipedia involved and arguably they do present issues with the quality of Wikipedia or Wikidata for that matter. So far hardly anything resulted from such outreach."
These were your own words: "hardly anything resulted from such outreach." Wikimedia is three years into this project. If people produce lists of quality issues, that's great, but if nothing happens as a result, that's not so great.
An example of this is available in this very thread. Three days ago I mentioned the issues with the Grasulf II of Friuli entries on Reasonator and Wikidata. I didn't expect you or anyone else to fix them, and at the time of writing they haven't been.
You certainly could have fixed them -- you have made hundreds of edits on Wikidata since replying to that post of mine -- but you haven't. Adding new data is more satisfying than sourcing and improving an obscure entry. (If you're wondering why I didn't fix the entry myself, see the section "And to answer the obvious question …" in last month's Signpost op-ed.[1])
This problem is replicated across the Wikimedia universe. Wikimedia projects are run by volunteers. They work on what interests them, or whatever they have an investment in. Fixing old errors is not as appealing as importing 2 million items of new data (including tens or hundreds of thousands of erroneous ones), because fixing errors is slow work. It retards the growth of your edit count! You spend one hour researching a date, and all you get for that effort is one lousy edit in your contributions history. There are plenty of tasks allowing you to rack up 500 edits in 5 minutes. People seem to prefer those.
That is why Wikipedia has the familiar backlogs in areas like copyright infringement or AfC. Even warning templates indicating bias or other problematic content often sit for years without being addressed.
There is a systemic mismatch between data creation and data curation. There is a lot of energy for the former, and very little energy for the latter. That is why initiatives like the one started by WMF board member James Heilman and others, to have the English Wikipedia's medical articles peer-reviewed, are so important. They are small steps in the right direction.
When we are afraid about a Seigenthaler type of event based on Wikidata, rest assured there is plenty wrong in either Wikipedia or Wikidata that makes it possible for it to happen. The most important thing is to deal with it responsibly. Just being afraid will not help us in any way. Yes we need quality and quantity. As long as we make a best effort to improve our data, we will do well.
That's "eventualism". "Quality is terrible, but eventually it will be great, because ... we're all trying, and it's a wiki!" To me that sounds more like religious faith or magical thinking than empirical science.
Things being on a wiki does not guarantee quality; far from it.[2][3][4][5]
As to the Wikipedian in residence, that is his opinion. At the same time the article on ebola has been very important. It may not be science but it is certainly encyclopaedic. At the same time this Wikipedian in residence is involved, makes a positive contribution and while he may make mistakes he is part of the solution.
I am happy that you propose that work is to be done. What have you done but more importantly what are you going to do? For me there is "Number of edits: 2,088,923" https://www.wikidata.org/wiki/Special:Contributions/GerardM
I will do what I can to encourage Wikimedia Foundation board members and management to review the situation, in consultation with outside academics like those at the Oxford Internet Institute who are concerned about present developments, and to consider whether more stringent sourcing policies are required to assure the quality and traceability of data in the Wikidata corpus.
The public is the most important stakeholder in this, and should be informed and involved. If there are quality issues, the Wikimedia Foundation should be completely transparent about them in its public communications, neither minimising nor exaggerating the issues. Known problems and potential issues should be publicised as widely as possible in order to minimise the harm to society resulting from uncritical reuse of faulty data.
I have started to reach out to scholars and journalists, inviting them to review this thread as well as related materials, and form their own conclusions. I may write an op-ed about it in the Signpost, because I believe it's an important issue that deserves wider attention and debate.
As far as my own contributions are concerned, I am more inclined to boycott Wikidata.
Apart from all the issues discussed over the past few days, there is another aspect to my reluctance to contribute to Wikidata.
The Knowledge Graph is a major new Google feature. It adds value to Google's search engine results pages. It stops people from clicking through to other sources, including Wikipedia. The recent downturn in Wikipedia pageviews has been widely linked to the Knowledge Graph.
By ensuring that more people visit Google's ad-filled pages, and stay on them rather than clicking through to other sites, the Knowledge Graph is at least partly responsible for recent increases in Google's revenue, which currently stands at around $200 million a day.[6] (Income after expenses is about a third of that, i.e. $65 million.)
The development of Wikidata was co-funded by Google, which I understand donated 325,000 Euros (about $345,000) to that effort.[8] A little bit of arithmetic shows that, with Google's profits running at $65 million a day, it takes Google less than 8 minutes to earn that amount of money. Given how much Google stands to benefit from this development, it seems a paltry investment.
This set me thinking. If we assume that Wikipedia's and Wikidata's contribution to Google's annual revenue via the Knowledge Graph is just 1/365 – the revenue of one day per year – the monetary value of these projects to Google is still astronomical.
There have been around 2.5 billion edits to Wikimedia projects to date.[7] If Google chose to give one day's revenue each year to Wikimedia volunteers, as a thank-you, this would average out at about 200,000,000 / 2,500,000,000 = 8 cents per edit. Someone like Koavf, who's made 1.5 million edits[9], would stand to receive around $120,000 a year. Even my paltry 50,000 edits would net me about $4,000 a year. That's the value of free content.
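For anyone who wants to check the back-of-the-envelope arithmetic above, here it is as a short script. All inputs are the rough figures quoted in this thread, not audited numbers:

```python
# Back-of-the-envelope check of the figures quoted above.
# All inputs are rough estimates taken from this thread [6][7][8].

daily_revenue = 200_000_000        # Google's revenue per day, USD [6]
daily_profit = daily_revenue / 3   # income after expenses, roughly USD 65M

# Google's Wikidata donation measured against its daily profit
donation = 345_000                 # ~325,000 EUR in USD [8]
minutes_to_earn = donation / daily_profit * 24 * 60
print(f"Google earns the donation back in ~{minutes_to_earn:.1f} minutes")
# -> under 8 minutes

# Hypothetical thank-you: one day's revenue shared across all edits
total_edits = 2_500_000_000        # edits to Wikimedia projects to date [7]
per_edit = daily_revenue / total_edits
print(f"Value per edit: ${per_edit:.2f}")  # -> $0.08

print(f"1.5 million edits: ${1_500_000 * per_edit:,.0f} a year")  # ~$120,000
print(f"50,000 edits: ${50_000 * per_edit:,.0f} a year")          # ~$4,000
```

The exact figures matter less than the orders of magnitude: even under the deliberately conservative one-day-per-year assumption, the implied value per edit is non-trivial.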
And that's just Google. Other major players like Facebook and Bing profit, too.
Wikidata seems custom-made to benefit Google and Microsoft, at the expense of Wikipedia and other sites. Given my other commitments to Wikimedia projects, the limited number of hours in a day, and all the other concerns mentioned in this thread, I feel little inclined at present to further expand my volunteering in order to work for these multi-billion dollar corporations for free.
[1] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed
[2] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-busi...
[3] http://www.salon.com/2013/05/17/revenge_ego_and_the_corruption_of_wikipedia/
[4] https://www.washingtonpost.com/news/the-intersect/wp/2015/04/15/the-great-wi...
[5] http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-controv...
[6] https://investor.google.com/earnings/2015/Q2_google_earnings.html
[7] https://tools.wmflabs.org/wmcounter/
[8] https://www.wikimedia.de/wiki/Pressemitteilungen/PM_3_12_Wikidata_EN
[9] https://www.washingtonpost.com/news/the-intersect/wp/2015/07/22/you-dont-kno...