On Thu, Aug 20, 2009 at 12:46 PM, Jimmy Wales <jwales@wikia-inc.com> wrote: [snip]
Greg, I think your email sounded a little negative at the start, but not so much further down. I think you would join me heartily in being super grateful for people doing this kind of analysis. Yes, some of it will be primitive and will suffer from the many difficulties. But data-driven decisionmaking is a great thing, particularly when we are cognizant of the limitations of the data we're using.
I just didn't want anyone to get the idea (and I'm sure I'm reading you right) that you were opposed to people doing research. :-)
Absolutely. No one who has done this kind of analysis could fail to appreciate the enormous amount of work that goes into producing even a couple of simple, seemingly "off the cuff" numbers from the mountain of data that is Wikipedia.
Making sure the numbers are accurate and meaningful, while also clearly explaining the process of generating them, is in and of itself a large amount of work, and my gratitude extends to anyone who contributes to those processes.
I've long been a loud proponent of data-driven decision making. So I'm absolutely not opposed to people doing research, but, just as you said, we need to be acutely aware of the limitations of the research. Weak data is clearly better than no data, but only when you are aware of how weak the data is. In other words, knowing what you don't know is often *the most critical* piece of information in any decision-making process.
In our eagerness to establish what we can and do know, it can be easy to forget how much we don't know. Some of the limitations that are all too obvious to researchers are far less obvious to people who have never personally done quantitative analysis on Wikipedia data, yet many of these people are the decision makers who must do something useful with the data. The casual language researchers use when writing for other researchers can magnify misunderstandings. My intent was merely to caution against the related risks.
I think the most impactful contributions available to researchers today lie less in the direct research itself and more in advancing the art of researching Wikipedia. The two go hand in hand, though: we can't advance the art if we don't do the research. The latter type of work is less sexy and not prone to generating headlines, but it will last and generate citations for a long time. Today's measurements of X will soon be forgotten as they are replaced by later analysis of the historical data using superior techniques.
That my tone was somewhat negative is due only to my extreme disappointment that our own discussion of recent measurements has been almost entirely devoid of critical analysis. Contributors patting themselves on the back and saying "I told you so!" seem to outnumber those suggesting the research might mean something else entirely, though perhaps that is my own bias speaking. To the extent that I'm wrong about that, I hope my comments were merely redundant; to the extent that I'm right, I hope my points will invite a nuanced understanding of the research and encourage people to seek out and expose potentially confounding variables and bad proxies, so that all our knowledge can be advanced.
If this stuff were easy it would all be done already. Wikipedia research is interesting because it is both hard and potentially meaningful. There is room and need for contributions from everyone.
Cheers!