I'm sending this to Wikimedia-l, Wikitech-l, and Research-l in case other people in
the Wikimedia movement or staff are interested in "big data" as it relates to
Wikimedia. I hope that those who are interested in discussions about WMF editor engagement
efforts, WMF fundraising, or WMF HR practices will also find this email worthwhile.
Feel free to skip straight to the links in the latter portion of this email if
you're already familiar with "big data" and its analysis and just
want to see what other people are writing about the subject.
* Introductory comments / my personal opinion
"Big data" refers to quantities of information that are so large that they
are difficult to analyze and may not be related internally in an obvious way. See
https://en.wikipedia.org/wiki/Big_data
I think that most of us would agree that moving much of an organization's information
into "the Cloud", and/or directing people to analyze massive quantities of
information, will not automatically result in better, or even good, decisions based on
that information. Also, I think that most of us would agree that bigger and/or more
accessible quantities of data do not necessarily imply that the data are more accurate
or more relevant for a particular purpose. Another concern is the possibility of unwelcome
intrusions into sensitive information, including the possibility of data breaches; imagine
the possible consequences if a hacker broke into supposedly secure databases held by
Facebook or the Securities and Exchange Commission.
We have an enormous quantity of data on Wikimedia projects, and many ways that we can
examine those data. As this Dilbert strip points out, context is important, and looking
at statistics devoid of their larger contexts can be problematic.
http://dilbert.com/strips/comic/1993-02-07/
Since data analysis is also something that Wikimedia does in the areas I mentioned
previously, I'm passing along a few links for those who may be interested in the
benefits and limitations of big data.
* Links:
From the Harvard Business Review
http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions/ar/1
From the New York Times
https://www.nytimes.com/2012/12/30/technology/big-data-is-great-but-dont-fo…
and
https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-wo…
From the Wall Street Journal. This may be especially
interesting to those who are participating in the discussions on Wikimedia-l regarding how
Wikimedia selects, pays, and manages its staff.
http://online.wsj.com/article/SB10000872396390443890304578006252019616768.h…
And from English Wikipedia (:
https://en.wikipedia.org/wiki/Big_data
and
https://en.wikipedia.org/wiki/Data_mining
and
https://en.wikipedia.org/wiki/Business_intelligence
Cheers,
Pine