On Thu, Aug 11, 2011 at 8:23 PM, MZMcBride z@mzmcbride.com wrote:
This is a neat idea. MediaWiki has some page view count support built in, but it's been disabled on Wikimedia wikis for pretty much forever. The reality is that MediaWiki isn't launched for the vast majority of requests. A user making an edit is obviously different, though. I think a database with per-day view support would make this feature somewhat feasible, in a JavaScript gadget or in a MediaWiki extension.
Oh, totally. The only place we can get meaningful data is from the squids, which is where Dario's data comes from, yes?
Sadly, anything we build that works with that architecture won't be so useful to other MediaWiki installations, at least on the backend. I can imagine an extension that displays the info to editors and fetches its stats from a slightly abstracted datasource, with some architecture whereby the stats could come from a variety of log-processing applications via a script or plugin. Then we could write a connector for our cache clusters, someone else could write one for theirs, and we'd still get to share one codebase for everything else.
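To make that a bit more concrete, here's a very rough sketch of the abstraction I have in mind. Every class and method name here is invented purely for illustration (and the real thing would presumably live in PHP rather than Python); it's just to show the connector idea:

    from abc import ABC, abstractmethod
    from datetime import date


    class PageViewSource(ABC):
        """Abstract datasource the display extension would query for per-day counts."""

        @abstractmethod
        def views(self, page_title: str, day: date) -> int:
            """Return the number of views for one page on one day."""


    class SquidLogSource(PageViewSource):
        """Connector for a setup like ours, where counts come from cache-cluster logs."""

        def __init__(self, aggregator):
            # 'aggregator' stands in for whatever job processes the squid logs
            self.aggregator = aggregator

        def views(self, page_title: str, day: date) -> int:
            return self.aggregator.daily_count(page_title, day)


    class LocalDatabaseSource(PageViewSource):
        """Connector for a small installation that just counts hits in its own DB."""

        def __init__(self, db):
            self.db = db

        def views(self, page_title: str, day: date) -> int:
            row = self.db.fetch_one(
                "SELECT views FROM page_views WHERE title = ? AND day = ?",
                (page_title, day),
            )
            return row[0] if row else 0

The extension itself would only ever talk to the abstract source, so the shared codebase stays the same no matter which connector a given wiki plugs in.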
Are there other specific projects that require this data? It will be much easier to make a case for accelerating development of the dataset if there are some clear examples of where it's needed, and especially if it can help to meet the current editor retention goals.
Heh. It's refreshing to hear this said aloud. Yes, if there were some way to tie page view stats to fundraising/editor retention/usability/the gender gap/the Global South, it'd be much simpler to get resources devoted to it. Without a doubt.
There are countless applications for this data, particularly as a means of measuring Wikipedia's impact. This data also provides a scale against which other articles and projects can be measured. In a vacuum, knowing that the English Wikipedia's article "John Doe" received 400 views per day on average in June means very little. When you can compare that figure to the average views per day of every other article on the English Wikipedia (or every other article on the German Wikipedia), you can begin doing real analysis work. Currently, this really isn't possible, and that's a Bad Thing.
Oh, totally. I can see a lot of really effective potential applications for this data. However, the link between view statistics and editor retention isn't necessarily immediately clear. At WMF, at least, the prevailing point of view is that readership is doing okay, and at the moment we're focused on other areas. Personally, I think readership numbers are an important piece of the puzzle and could be a direct motivator for editing. Furthermore, increasingly nuanced readership stats might be usable for that other perennial goal, fundraising (though I don't have specific ideas for this at the moment).
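And to MZMcBride's "John Doe" point above: here's a toy illustration (with numbers I made up on the spot) of the kind of comparison that only becomes possible once per-day averages exist for every article rather than just one:

    # A raw figure like "400 views/day" only becomes meaningful once you can
    # place it against the distribution of all articles' averages.
    # These per-article averages are invented for the sake of the example.
    daily_averages = {
        "John Doe": 400,
        "Article A": 12000,
        "Article B": 35,
        "Article C": 870,
        "Article D": 4,
    }

    def percentile_rank(article, averages):
        """Fraction of articles with a lower average daily view count."""
        target = averages[article]
        below = sum(1 for v in averages.values() if v < target)
        return below / len(averages)

    print(f"'John Doe' sits above {percentile_rank('John Doe', daily_averages):.0%} "
          "of the other articles in this sample")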
I wonder if maybe we could consolidate a couple of concrete proposals for features that are dependent on this data. That would help to highlight this as a bottleneck and clearly explain how solving this problem now will help contribute to meeting current goals.
My thinking is, if it's possible to make a good case for it, it should happen now. Even if WMF has a req out for a developer to build this, there's no reason to avoid consolidating the research and ideas in one place so that person can work more effectively. And if someone in the community were to start building it, even better! If we later bring on a dev to maintain it long-term, the two would just end up working closely together for a while, which would accelerate the new developer's learning process. As someone who's still on the steep part of that learning curve, I can attest that any and all information we can provide will get this feature out the door faster. :)
-Ian