Hi there,
I'm an assistant professor in the Department of Communication at Stanford. My co-author, Molly Roberts (Political Science, UCSD), and I are working on a paper examining the effect of China's 2015 block of Chinese-language Wikipedia on pageviews, which builds on our previous work on censorship in China.
We are using the block as the basis for an interrupted time series design to measure the effect of censorship on Chinese users. Our main finding is that Chinese users were using Wikipedia to browse (starting at the home page), and that the block affected users' ability to explore and encounter unexpected information. One question we have is whether the pageviews we observe are driven by bots and spiders. We know that the Wikimedia REST API provides this information going back to July 1, 2015. Since China's block of Wikipedia began on May 19, 2015, we are wondering whether there is pageview data by agent type for zh.wikipedia.org pages (all pages, or some subset such as the most popular) going back to May 2015, specifically May 18-21, 2015. The page https://meta.wikimedia.org/wiki/Research:Timeline_of_Wikimedia_analytics says that pageview data is available in bulk starting on May 1, 2015, so we thought there was some chance this data exists.
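For anyone reproducing this kind of query, here is a minimal Python sketch of how a per-article request to the Wikimedia REST API pageview endpoint might be assembled, with the agent type as one path segment. The base URL, the path layout, and the defaults (all-access, user, daily) are assumptions drawn from the public API documentation, not from this thread, and per_article_url is a hypothetical helper. The July 1, 2015 cutoff is the one stated in the message above, so a query for May 18-21, 2015 is rejected before any request would be sent.

```python
from datetime import date
from urllib.parse import quote

# Assumed base path of the Wikimedia REST API pageview endpoints.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"

# Earliest date with per-agent data, as stated in the thread.
REST_API_START = date(2015, 7, 1)


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageview URL (hypothetical helper).

    agent is a path segment: "user" and "spider" select one agent type,
    "all-agents" sums over every type.
    """
    if start < REST_API_START:
        raise ValueError(f"{start} predates per-agent data ({REST_API_START})")
    if end < start:
        raise ValueError("end must not be earlier than start")
    # Titles use underscores for spaces and are percent-encoded as UTF-8;
    # safe="" also encodes "/" and ":" so the title stays one path segment.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, "per-article", project, access, agent, title,
                     granularity, start.strftime("%Y%m%d"),
                     end.strftime("%Y%m%d")])


if __name__ == "__main__":
    # Assumed title of the zh.wikipedia.org main page; first week of coverage.
    print(per_article_url("zh.wikipedia.org", "Wikipedia:首页",
                          date(2015, 7, 1), date(2015, 7, 7)))
    # The block window itself falls before the REST API's coverage.
    try:
        per_article_url("zh.wikipedia.org", "Wikipedia:首页",
                        date(2015, 5, 18), date(2015, 5, 21))
    except ValueError as err:
        print("not queryable:", err)
```

Failing early on the date keeps the coverage gap explicit: whether agent-split data for May 2015 exists anywhere else is exactly the open question in this message.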
Any suggestions would be greatly appreciated, and if this is not possible, please let us know.
Thank you! Jennifer Pan
Hi there,
Although this doesn't answer your specific question, I thought I'd share that my observations from watching traffic patterns on some Wikimedia pages suggest that the classification of readers into bot, spider, or human has some margin of error, though I don't know how large it is. That margin of error might be worth considering as you analyze the traffic that interests you, especially if you have reason to believe it is large enough to affect your results.
Pine ( https://meta.wikimedia.org/wiki/User:Pine )
On Tue, Nov 13, 2018 at 2:41 PM Jennifer Pan jp1@stanford.edu wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Hi Jennifer,
Is this question also related to the topic we're discussing in a private email thread? I want to make sure I don't misunderstand your other question, and also that we don't duplicate work on the same question. It would be great if you could say whether answering that email will also address this question, or whether I'm missing something obvious (apologies in advance if that's the case).
Best, Leila
--
Leila Zia
Senior Research Scientist, Lead
Wikimedia Foundation
On Tue, Nov 13, 2018 at 6:41 AM Jennifer Pan jp1@stanford.edu wrote:
Hello,
> One question we have is whether the pageviews we observe are driven by
> bots and spiders. We know that the wikimedia rest api provides this information going back to July 1 2015.

Please keep in mind that these are only self-identified bots. There is probably about 1-5% of bot pageview traffic that gets wrongly labeled as "user"; a project is under way to label this traffic better as coming from bots.
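To get a rough sense of what the 1-5% figure means for a given count, here is a small Python sketch. It reads the figure as the share of "user"-labeled pageviews that actually come from bots that did not identify themselves, which is an interpretation rather than something the message spells out, and human_pageview_bounds is a hypothetical helper. Shares are kept as integer percentages so the arithmetic stays exact for whole-number counts.

```python
def human_pageview_bounds(user_labeled, low_pct=1, high_pct=5):
    """Return (lower, upper) bounds on genuine human pageviews.

    Assumes low_pct..high_pct percent of the views labeled "user" are
    really unidentified bots (an interpretation of the 1-5% estimate).
    """
    if not 0 <= low_pct <= high_pct < 100:
        raise ValueError("need 0 <= low_pct <= high_pct < 100")
    # The larger the hidden-bot share, the fewer human views remain.
    lower = user_labeled * (100 - high_pct) / 100
    upper = user_labeled * (100 - low_pct) / 100
    return lower, upper


if __name__ == "__main__":
    # Hypothetical daily "user" count, for illustration only.
    print(human_pageview_bounds(200_000))  # (190000.0, 198000.0)
```

If the mislabeled share stayed roughly constant across the block date, it would rescale pre- and post-block counts alike and leave a proportional (log-scale) effect estimate unchanged; if unidentified bot activity itself shifted when the block began, this simple bound would not capture that.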