Hi there,

Although this doesn't answer your specific question, I thought I'd share that my observations from watching traffic patterns on some Wikimedia pages suggest that the classification of readers into bot, spider, or human carries some margin of error, though I don't know how large it is. That error may be worth considering as you analyze the traffic that interests you, especially if you have reason to believe it is statistically significant.
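For concreteness, the per-agent split I'm referring to is the one exposed by the Pageviews REST API's "aggregate" route. A minimal sketch of how it can be queried is below; the helper function name is mine, and the set of valid agent values ("all-agents", "user", "spider", "bot") is as I recall it from the API docs and may have changed:

```python
# Sketch: building a request URL for the Wikimedia Pageviews REST API,
# which splits daily pageview counts by agent type. Note this per-agent
# data only begins on 2015-07-01.

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"

def aggregate_url(project, agent, start, end,
                  access="all-access", granularity="daily"):
    """Build an aggregate-pageviews URL.

    agent is one of 'all-agents', 'user', 'spider', or 'bot';
    start/end are YYYYMMDD strings.
    """
    return f"{BASE}/{project}/{access}/{agent}/{granularity}/{start}/{end}"

# Example: human (non-bot) traffic to Chinese Wikipedia, early July 2015.
url = aggregate_url("zh.wikipedia.org", "user", "20150701", "20150704")
# The URL can then be fetched with any HTTP client; the JSON response
# contains an "items" list with one {"views": ...} entry per day.
```

Comparing the "user" series against "spider" and "bot" for the same window would give you a rough sense of how much of the total traffic the classifier attributes to automated agents.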

On Tue, Nov 13, 2018 at 2:41 PM Jennifer Pan <jp1@stanford.edu> wrote:

Hi there,

I'm an assistant professor in the Department of Communication at Stanford. My co-author, Molly Roberts (Political Science, UCSD), and I are working on a paper examining the effect of China's 2015 block of Chinese-language Wikipedia on pageviews, which builds on our previous work on censorship in China.

We are using the block in an interrupted time series design to measure the effect of censorship on Chinese users. Our main finding is that Chinese users were using Wikipedia to browse (starting at the home page), and that the block impaired users' ability to explore and encounter unexpected information. One question we have is whether the pageviews we observe are driven by bots and spiders. We know that the Wikimedia REST API provides this information going back to July 1, 2015. Since China's block of Wikipedia began on May 19, 2015, we are wondering whether there is pageview data by agent type for zh.wikipedia.org pages (all pages, or some subset such as the most popular) going back to May 2015 (specifically May 18-21, 2015). According to https://meta.wikimedia.org/wiki/Research:Timeline_of_Wikimedia_analytics,
pageview data is available in bulk starting on May 1, 2015, so we thought there was some chance this data exists.

Any suggestions would be greatly appreciated, and if this is not possible, please let us know.

Thank you!
Jennifer Pan

Analytics mailing list