The problem (as always) is that there is a difference between pages served (by the web server) and pages actually wanted and read by the user.
It would be interesting to have referrer statistics. I'm guessing that many Wikipedia pages are being referred by Google (and other general search engines). If so, people may just be clicking through a list of search results, which causes them to download a WP page but then immediately move on to the next search result because it isn't what they are looking for. I rather suspect the prominence of Facebook in the English Wikipedia results is due to this effect, as I often find myself on the Wikipedia page for Facebook instead of Facebook itself following a Google search. I think the use of mobile devices (with small screens) probably encourages this sort of behaviour.
Kerry
-----Original Message----- From: wiki-research-l-bounces@lists.wikimedia.org [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Andrew G. West Sent: Sunday, 30 December 2012 2:06 PM To: wiki-research-l@lists.wikimedia.org Subject: Re: [Wiki-research-l] 2012 top pageview list
The WMF aggregates them as (page,views) pairs on an hourly basis:
http://dumps.wikimedia.org/other/pagecounts-raw/
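For anyone unfamiliar with these dumps: each hourly file is (to my understanding) whitespace-delimited lines of project code, URL-encoded page title, view count, and bytes transferred. A minimal Python sketch for pulling the (page, views) pairs out of one hourly file (the file name in the example is hypothetical):

  import gzip

  # Parse one hourly pagecounts file into {page: views} for a given project.
  # Assumed line format: project, page title, views, bytes transferred.
  def parse_hourly(path, project="en"):
      counts = {}
      with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as f:
          for line in f:
              parts = line.split()
              if len(parts) != 4 or parts[0] != project:
                  continue
              title, views = parts[1], int(parts[2])
              counts[title] = counts.get(title, 0) + views
      return counts

  # Example (hypothetical file name):
  # hourly = parse_hourly("pagecounts-20121230-140000.gz")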
I've been parsing these and storing them in a queryable DB format (for en.wp exclusively, though I think the files are available for all projects) for about two years. If you want to maintain such a fine granularity, it can quickly become a terabyte-scale task that eats up a lot of processing time.
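To illustrate why the fine-grained approach balloons, here is a rough sketch of that kind of queryable store using SQLite; the schema and table name are my own assumptions, not Andrew's actual setup:

  import sqlite3

  # Load one hour of (page, views) pairs (e.g. from parse_hourly() above).
  def load_hour(db_path, hour_ts, counts):
      con = sqlite3.connect(db_path)
      con.execute("""CREATE TABLE IF NOT EXISTS pageviews (
                         hour TEXT, page TEXT, views INTEGER)""")
      con.executemany("INSERT INTO pageviews VALUES (?, ?, ?)",
                      ((hour_ts, page, v) for page, v in counts.items()))
      con.commit()
      con.close()

  # At one row per (page, hour), en.wp alone adds millions of rows per day,
  # which is where the terabyte-scale storage and processing cost comes from.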
If you're looking for coarser-granularity reports (like top views for a day, week, or month), a lot of efficient aggregation can be done.
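For instance, a top-N-per-day report can be produced by summing the hourly dictionaries in memory before anything touches a database (again assuming the parse_hourly() helper sketched above):

  from collections import Counter

  # Sum a day's worth of hourly files and return the n most-viewed pages.
  def top_n_for_day(hourly_paths, n=5000):
      totals = Counter()
      for path in hourly_paths:
          totals.update(parse_hourly(path))
      return totals.most_common(n)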
See also: http://en.wikipedia.org/wiki/Wikipedia:5000
Thanks, -AW
On 12/28/2012 07:28 PM, John Vandenberg wrote:
There is a steady stream of blogs and 'news' about these lists
https://encrypted.google.com/search?client=ubuntu&channel=fs&q=%22Se...nd%22&ie=utf-8&oe=utf-8#q=wikipedia+top+2012&hl=en&safe=off&client=ubuntu&tbo=d&channel=fs&tbm=nws&source=lnt&tbs=qdr:w&sa=X&psj=1&ei=GzjeUOPpAsfnrAeQk4DgCg&ved=0CB4QpwUoAw&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.1355534169,d.aWM&fp=4e60e761ee133369&bpcl=40096503&biw=1024&bih=539
How does a researcher go about obtaining access logs with user agents in order to answer some of these questions?