Thanks, Taha, for pointing that out. I have added the note to the blog post and hope to start a conversation on what we can do to make the analysis and curation of Wikipedia traffic data more useful and meaningful for both research and policy.

BTW, the phenomenon of "sleep depth" for different languages in the paper by Taha Yasseri, Robert Sumi and János Kertész is very interesting. It provides insights into how Wikipedia labour is distributed across working hours and working days. To a certain extent, it shows us how the Wikipedia projects currently make use of the global "cognitive surplus". Virtual labour is still conditioned by the diverse working environments across the world, as the authors mention in the quote below:

"For example, the daily pattern of Asian languages (e.g., Japanese, Chinese and Korean) show higher activity during evenings and nights along with high level of activity at weekends. This can be related partly to the lengths of working hours in corresponding countries. This general image, which holds partially for Turkey and Russia and Israel too, could be in close relation with the high average working hours per day in those countries (more than 40 hours in all the mentioned cases, according to the dataset of The Organization for Economic Co-operation and Development:http://stats.oecd.org). Furthermore, among European countries, we also see the same tendency; in the countries with rather larger working times, edits are mostly done in later times in evenings."

Note also the difference in timeframes: my infographics, which are based on Wikimedia's Squid reports (not the original traffic data), show yearly changes. This is in contrast to Yasseri et al.'s analysis of circadian *daily* and *weekly* patterns. The two approaches have different angles and thus different needs from the Wikimedia Foundation for its traffic data.

Thus, we might want to share what has been done and what could be done with the current traffic data provided by the Wikimedia Foundation, while acknowledging the sensitivity of releasing traffic data.

Best,
han-teng liao



2014-05-16 23:50 GMT+08:00 Taha Yasseri <taha.yaseri@gmail.com>:
Very useful, Han-Teng, but one should note that the original data is about "the percentage of requesting IP addresses", excluding duplicates of a single IP address within the same day, and not, for example, the number of edits. These two can be very different depending on the dynamic/static IP address models used in different countries.
That explains the discrepancy between your results and our earlier analysis based on circadian patterns and edit timestamps.
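
To make the distinction concrete, here is a minimal sketch in Python. The records, country labels and function names are all made up for illustration and are not taken from the Squid reports or from our paper. It only shows how a share based on distinct (day, IP address) pairs can diverge from a share based on raw request or edit counts when one country has a heavy user on a static IP and another has a user whose dynamic IP changes during the day.

```python
from collections import defaultdict

# Hypothetical log records: (day, country, ip_address). All values are invented.
# Country "A": one static-IP user making four requests in a day.
# Country "B": one user whose dynamic IP shows up as two addresses.
requests = [
    ("2014-05-01", "A", "10.0.0.1"),
    ("2014-05-01", "A", "10.0.0.1"),
    ("2014-05-01", "A", "10.0.0.1"),
    ("2014-05-01", "A", "10.0.0.1"),
    ("2014-05-01", "B", "10.1.0.1"),
    ("2014-05-01", "B", "10.1.0.2"),
]


def share(counts):
    """Turn absolute counts per country into fractions of the total."""
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}


def request_share(records):
    """Share of raw requests (or edits) per country, duplicates included."""
    counts = defaultdict(int)
    for _day, country, _ip in records:
        counts[country] += 1
    return share(counts)


def unique_ip_share(records):
    """Share of distinct (day, IP) pairs per country, duplicates excluded."""
    seen = set()
    counts = defaultdict(int)
    for day, country, ip in records:
        if (day, ip) not in seen:
            seen.add((day, ip))
            counts[country] += 1
    return share(counts)


if __name__ == "__main__":
    print("by requests:  ", request_share(requests))
    print("by unique IPs:", unique_ip_share(requests))
```

In this toy case country "A" accounts for two thirds of the requests but only one third of the distinct IP addresses, so the two measures rank the countries in opposite order.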

Again, very interesting and well done.
Best,
Taha


On Fri, May 16, 2014 at 4:28 PM, h <hanteng@gmail.com> wrote:
Dear all, 

With the aim of comparing Wikipedia traffic report data (e.g. viewing versus editing, regional differences within a language version, etc.), I have made a few more interactive infographics which show the historical changes since late 2011. (Historical numbers are scraped from the past versions archived by the Internet Archive.)

For more, please follow the link below:

It has at least one nice interactive feature: a user can zoom and pan to view the chart easily with a mouse or touchpad. The SVG vector-based presentation ensures that the picture quality stays consistent when users zoom in to compare data points. (I haven't figured out how mpld3's HTML tooltips work for this project, though.)

It is also possible to extend the prototype with dynamic JSON objects so that the charts and tables can be updated automatically.

Any suggestions and comments are welcome.

Best,
han-teng liao


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




--
.t
