Reid Priedhorsky wrote:
Hi folks,
As I mentioned a week or two ago, I am trying to model historical article view rates of English Wikipedia. One of the sources of data for this is:
http://stats.wikimedia.org/EN/TablesUsagePageRequest.htm
I was wondering what the specific definition of "page request" is in this context, stated precisely enough that I can replicate the count on current data using the sampled access logs that I have.
The "definition" link on the above page says that a page is a URL "... that would be considered the actual page being requested, and not all of the individual items that make it up (such as graphics and audio clips). Some people call this metric page views or page impressions, and defaults to any URL that has an extension of .htm, .html or .cgi." This seems incongruent with the URLs used at Wikipedia, which generally do not end in any of those extensions.
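To make the mismatch concrete, here is a minimal sketch in Python. The first function applies only the extension rule quoted above. The second, `looks_like_wiki_page`, is a hypothetical heuristic of my own (the `/wiki/` and `/w/index.php` path patterns are assumptions about the URL layout, not the rule that produced the published figures), shown only for comparison.

```python
from urllib.parse import urlsplit, parse_qs
import posixpath

# Extensions the quoted definition counts as pages by default.
PAGE_EXTENSIONS = {".htm", ".html", ".cgi"}


def is_page_by_extension(url):
    """Apply the quoted default rule: the URL path ends in .htm, .html or .cgi."""
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()
    return ext in PAGE_EXTENSIONS


def looks_like_wiki_page(url):
    """Hypothetical heuristic (an assumption, not the original counting rule).

    Treats /wiki/<title> and /w/index.php?title=... (plain view only) as pages.
    """
    parts = urlsplit(url)
    if parts.path.startswith("/wiki/") and len(parts.path) > len("/wiki/"):
        return True
    if parts.path == "/w/index.php":
        query = parse_qs(parts.query)
        # A missing action parameter is treated as a plain view.
        return "title" in query and query.get("action", ["view"])[0] == "view"
    return False


if __name__ == "__main__":
    samples = [
        "http://example.org/index.html",
        "http://en.wikipedia.org/wiki/Main_Page",
        "http://en.wikipedia.org/w/index.php?title=Main_Page",
        "http://en.wikipedia.org/w/index.php?title=Main_Page&action=edit",
    ]
    for url in samples:
        print(url, is_page_by_extension(url), looks_like_wiki_page(url))
```

Under the extension rule alone, neither form of the Main_Page URL counts as a page, which is why I suspect some other rule was in effect.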
Any help would be greatly appreciated. Please let me know if you have any questions.
I believe that figure came from Webalizer. We did attempt to make it roughly a page view count at the time, and that's probably as much as we can say about it now. It's a bit late for a detailed error analysis.
We should probably also tell you about the reqstats data, if you haven't found it already:
http://hemlock.knams.wikimedia.org/~leon/stats/reqstats/reqstats-yearly.png
This is the raw request rate, including images. Mark Bergsma may be able to supply you with the source data. I don't know if we have any equivalent data from before April 2006.
-- Tim Starling