Tim Starling wrote:
Reid Priedhorsky wrote:
As I mentioned a week or two ago, I am trying to model historical
article view rates of English Wikipedia. One of the sources of data for
this is:
http://stats.wikimedia.org/EN/TablesUsagePageRequest.htm
I was wondering what the specific definition of "page request" is in
this context, in a way that I can replicate the count on current data
using the sampled access logs that I have.
I believe that figure came from Webalizer. We did attempt to make it
roughly a page view count at the time, and that's probably as much as we
can say about it now. It's a bit late for a detailed error analysis.
OK. So that means that nobody's got the regular expressions around any
more? Or have the URLs changed so that the old regexes are no longer
relevant?
I tried counting URLs of the two forms:
http://en.wikipedia.org/wiki/Pierre_Omidyar
http://en.wikipedia.org/w/index.php?title=Pierre_Omidyar
This yields 102 million article views per day on average over the past
month or so.
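
For anyone wanting to replicate that count, here is a minimal sketch of the matching in Python. It is not the original Webalizer configuration, which nobody seems to have any more. The names article_title and estimate_daily_views, the sampling_factor parameter, and the assumption that each log record has already been reduced to a URL string are all hypothetical. It also makes no attempt to exclude non-article namespaces or index.php requests that carry an action other than viewing.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Path form: /wiki/Pierre_Omidyar
WIKI_PATH = re.compile(r"^/wiki/(?P<title>.+)$")


def article_title(url, host="en.wikipedia.org"):
    """Return the requested title if the URL has one of the two forms, else None."""
    parts = urlsplit(url)
    if parts.netloc != host:
        return None
    match = WIKI_PATH.match(parts.path)
    if match:
        return match.group("title")
    # Query form: /w/index.php?title=Pierre_Omidyar
    if parts.path == "/w/index.php":
        titles = parse_qs(parts.query).get("title")
        if titles:
            return titles[0]
    return None


def estimate_daily_views(urls, sampling_factor, days):
    """Scale the number of matching sampled requests up to a per-day estimate.

    sampling_factor is an assumption: 100 would mean the log keeps
    one request in every 100.
    """
    hits = sum(1 for url in urls if article_title(url) is not None)
    return hits * sampling_factor / days
```

Parsing the URL first and then testing the path and query separately keeps the two forms apart, which is easier to audit than one large regex.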
When I graph log(Webalizer data) and (log(Alexa data) - c), where c is a
constant adjusted by hand to make the two lines match up pretty well,
then the 102 Mviews/day figure also matches the Alexa data quite well.
This is encouraging and suggests that I'm counting correctly. Does that
seem reasonable to you all as well?
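
The constant c above was adjusted by hand. As a sketch of one alternative, the least-squares choice of a constant offset between two log series is simply the mean of their pointwise differences. The function name fit_log_offset is hypothetical, and the sketch assumes both series are positive and sampled at the same dates.

```python
import math


def fit_log_offset(reference, other):
    """Return c such that log(other) - c best matches log(reference).

    For a constant offset, the least-squares solution is the mean of
    log(other) - log(reference) over all paired points.
    """
    if len(reference) != len(other) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    diffs = [math.log(o) - math.log(r) for r, o in zip(reference, other)]
    return sum(diffs) / len(diffs)
```

Since subtracting c in log space is the same as dividing by exp(c), the fit only says the two series share a shape. It does not by itself validate the absolute level of either one.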
We should probably also tell you about the reqstats data, if you haven't
found it already:
http://hemlock.knams.wikimedia.org/~leon/stats/reqstats/reqstats-yearly.png
This is the raw request rate, including images. Mark Bergsma may be able
to supply you with the source data. I don't know if we have any equivalent
data from before April 2006.
Yeah, I saw that. I'm not sure exactly what to make of it, since I'm
interested in article view rate, and that seemed to be all requests.
I would like to add that I really appreciate the ongoing help you all
have given. It's made our task here at UMN considerably easier and more
effective, and it means a lot to me.
Take care,
Reid