I'm sending this to Wikimedia-l, Wikitech-l, and Research-l in case other people in the Wikimedia movement or staff are interested in "big data" as it relates to Wikimedia. I hope that those who are interested in discussions about WMF editor engagement efforts, WMF fundraising, or WMF HR practices will also find that this email interests them. Feel free to skip straight to the links in the latter portion of this email if you're already familiar with "big data" and its analysis and if you just want to see what other people are writing about the subject.
* Introductory comments / my personal opinion
"Big data" refers to quantities of information so large that they are difficult to analyze and may not be related internally in an obvious way. See https://en.wikipedia.org/wiki/Big_data
I think that most of us would agree that moving much of an organization's information into "the Cloud", and/or directing people to analyze massive quantities of information, will not automatically result in better, or even good, decisions based on that information. Also, I think that most of us would agree that bigger and/or more accessible quantities of data do not necessarily imply that the data are more accurate or more relevant for a particular purpose. Another concern is the possibility of unwelcome intrusions into sensitive information, including the possibility of data breaches; imagine the possible consequences if a hacker broke into supposedly secure databases held by Facebook or the Securities and Exchange Commission.
We have an enormous quantity of data on Wikimedia projects, and many ways that we can examine those data. As this Dilbert strip points out, context is important, and looking at statistics devoid of their larger contexts can be problematic. http://dilbert.com/strips/comic/1993-02-07/
Since data analysis is also something that Wikipedia does in the areas I mentioned previously, I'm passing along a few links for those who may be interested in the benefits and limitations of big data.
* Links:
From the Harvard Business Review
http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions/ar/1
From the New York Times
https://www.nytimes.com/2012/12/30/technology/big-data-is-great-but-dont-for... and https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-wor...
From the Wall Street Journal. This may be especially interesting to those who are participating in the discussions on Wikimedia-l regarding how Wikimedia selects, pays, and manages its staff.
http://online.wsj.com/article/SB10000872396390443890304578006252019616768.ht...
And from English Wikipedia (: https://en.wikipedia.org/wiki/Big_data and https://en.wikipedia.org/wiki/Data_mining and https://en.wikipedia.org/wiki/Business_intelligence
Cheers,
Pine
Hi Pine,
It might be because of the alcohol I've ingested these last days, but - what are you proposing exactly?
Happy new year,
strainu
2012/12/30, ENWP Pine deyntestiss@hotmail.com:
Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
Dear Pine,
thank you for sharing these links. I cannot read everything now, but one of these articles was also recommended by a friend: "Sure, Big Data Is Great. But So Is Intuition." <http://www.nytimes.com/2012/12/30/technology/big-data-is-great-but-dont-forget-intuition.html?_r=1&>, by Steve Lohr. It reminded me of a case in Brazil that avoided previous mistakes: a collaborative process of hiring <http://blog.wikimedia.org/2012/01/11/brazil-recruiting-and-partnership-with-the-community-moves-forward/> a consultant for the WMF programs in Brazil, i.e., the community was listened to. (See the full discussion that resulted in a better process here <http://comments.gmane.org/gmane.org.wikimedia.brazil/161>, although it still can improve.)
"It’s encouraging that thoughtful data scientists like Ms. Perlich and Ms. Schutt recognize the limits and shortcomings of the Big Data technology that they are building. Listening to the data is important, they say, but so is experience and intuition. After all, what is intuition at its best but large amounts of data of all kinds filtered through a human brain rather than a math model?"
As Alexandre Abdo pointed out <http://permalink.gmane.org/gmane.org.wikimedia.brazil/358> in this not so old discussion, we, the Brazilian community, were being presented with "consummated facts", and the community's experience and intuition were not being taken into account as much as they could have been, although I must say a lot of effort was made in this direction. I hope a lesson was /learned/ and that this can help the direction the organization is taking with its grantmaking and learnings. :)
This also reminds me that there is currently no mathematical model (maybe there never will be...) that explains the kind of system Wikimedia projects deal with, and sometimes lovely graphics and data interpretations are taken as scientific statements, regardless of their scientific underpinnings.
Have a good year,
Tom
On Sun, Dec 30, 2012 at 1:26 AM, ENWP Pine deyntestiss@hotmail.com wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l