Hi Bruno,

Actually, I'm not going to answer your question directly; I'll leave that to others who have developed tools to parse the pagecount files. But while we're on the topic, I just wanted to point out the issue of redirects and title changes, which a good number of people who work with the viewership data overlook. If the title of a page is changed, the page's history is moved under the new title and the old title (normally) becomes a redirect. The viewership data, however, stays split between the two titles. So if you want to know, for example, the viewership of a page with current title B and old title A, you have to add up the views of both titles over the period under study. Just something to note... and sorry if you're already doing this!
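
As a minimal sketch of that bookkeeping, assume the pagecount rows have already been loaded as `(hour, title, views)` tuples with titles in one normalized form, and that the set of former titles is already known (how you obtain it is outside this sketch). The function name `combined_views` and the tuple layout are hypothetical:

```python
from collections import defaultdict
from datetime import datetime


def combined_views(records, titles, start, end):
    """Add up views over every title a page has had within [start, end).

    records: iterable of (hour, title, views) tuples (hypothetical layout).
    titles:  the current title plus any former titles that now redirect to it,
             all normalized the same way as the titles in `records`.
    Returns the combined total and a per-title breakdown.
    """
    per_title = defaultdict(int)
    for hour, title, views in records:
        if title in titles and start <= hour < end:
            per_title[title] += views
    return sum(per_title.values()), dict(per_title)


# Usage: page currently titled "B", previously titled "A".
records = [
    (datetime(2016, 7, 1, 0), "A", 10),
    (datetime(2016, 7, 1, 0), "B", 5),
    (datetime(2016, 7, 1, 1), "B", 7),
    (datetime(2016, 7, 2, 0), "A", 3),    # outside the study period
    (datetime(2016, 7, 1, 0), "C", 100),  # unrelated page
]
total, breakdown = combined_views(records, {"A", "B"},
                                  datetime(2016, 7, 1), datetime(2016, 7, 2))
print(total, breakdown)  # 22 {'A': 10, 'B': 12}
```

Keeping the per-title breakdown makes it easy to see how much of the traffic still arrives via the old title.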

Good luck,

On Thu, Jul 28, 2016 at 9:00 PM, Bruno Goncalves <bgoncalves@gmail.com> wrote:

I've been trying to match edit activity with pagecounts, but I've encountered a couple of problems. The amazing pagecounts dumps (https://dumps.wikimedia.org/other/pagecounts-raw/) use the page URL to identify individual pages:

      fr.b Special:Recherche/Achille_Baraguey_d%5C%27Hilliers 1 624

while the stub-meta-history uses the "raw" title:

    <title>Wikipedia:Community Portal</title>

so I need an easy way to map titles to URLs. I imagine there are some rules for how this "translation" is done, but my google-fu has failed to turn them up.
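
A minimal sketch of one way to do this mapping, assuming that each pagecounts-raw line holds project, title, request count and bytes, that the title is the (possibly percent-encoded) title segment of the requested URL with underscores for spaces, and that the dump's `<title>` uses spaces. The helper names are hypothetical. Because pagecount titles are recorded as requested rather than normalized, decoding them and comparing against the dump title is likely more robust than re-encoding dump titles:

```python
from urllib.parse import quote, unquote


def parse_pagecount_line(line):
    """Split a pagecounts-raw line into project, title, request count, bytes."""
    project, title, count, size = line.split()  # ValueError on malformed lines
    return project, title, int(count), int(size)


def url_title_to_dump_title(url_title):
    """Decode a pagecount title into the space-separated form used in <title>.

    Assumes the title is the (possibly percent-encoded) URL segment, with
    underscores standing in for spaces.
    """
    return unquote(url_title).replace("_", " ")


def dump_title_to_url_title(title):
    """Rough reverse mapping: spaces -> underscores, then percent-encode.

    The characters left unescaped (safe="/:") are an assumption; MediaWiki's
    own URL encoding differs in details, so prefer matching on decoded titles.
    """
    return quote(title.replace(" ", "_"), safe="/:")


line = "fr.b Special:Recherche/Achille_Baraguey_d%5C%27Hilliers 1 624"
project, title, views, size = parse_pagecount_line(line)
print(project, views, size)            # fr.b 1 624
print(url_title_to_dump_title(title))  # Special:Recherche/Achille Baraguey d\'Hilliers
print(dump_title_to_url_title("Wikipedia:Community Portal"))  # Wikipedia:Community_Portal
```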

Also, are the timestamps in the meta-history files in the same timezone as the one used in the pagecount filenames?

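The meta-history `<timestamp>` values end in `Z`, which denotes UTC; whether the pagecount filenames are also UTC is an assumption to verify. Under that assumption, a minimal sketch of putting both on a common footing could look like this. It also assumes filenames of the form `pagecounts-YYYYMMDD-HHMMSS.gz`, and the helper names are hypothetical:

```python
import re
from datetime import datetime, timezone

# Assumption: pagecount files are named like pagecounts-YYYYMMDD-HHMMSS.gz
PAGECOUNT_NAME = re.compile(r"pagecounts-(\d{8})-(\d{6})")


def dump_timestamp(ts):
    """Parse a <timestamp> value such as 2016-07-28T21:14:05Z (Z = UTC)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pagecount_file_time(filename):
    """Read the time encoded in a pagecount file name, treated as UTC (assumption)."""
    m = PAGECOUNT_NAME.search(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    naive = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
    return naive.replace(tzinfo=timezone.utc)


def hour_bucket(dt):
    """Truncate to the hour so edits can be joined with hourly pagecount files.

    Assumes a file stamped HH:00:00 covers the hour starting at HH:00; if it
    marks the end of the hour instead, the join shifts by one hour.
    """
    return dt.replace(minute=0, second=0, microsecond=0)


edit_time = dump_timestamp("2016-07-28T21:14:05Z")
file_time = pagecount_file_time("pagecounts-20160728-210000.gz")
print(hour_bucket(edit_time) == file_time)  # True
```

Working with timezone-aware datetimes means that if either source turns out not to be UTC, only the `tzinfo` passed in needs to change.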
Bruno Miguel Tavares Gonçalves, PhD
Homepage: www.bgoncalves.com
Email: bgoncalves@gmail.com

Wiki-research-l mailing list


==New Paper==
Taha Yasseri and Jonathan Bright
EPJ Data Science, 5:22 (2016).

Dr Taha Yasseri
Research Fellow in Computational Social Science, Oxford Internet Institute,
Research Fellow in Humanities and Social Sciences, Wolfson College,
University of Oxford,
Faculty Fellow, Alan Turing Institute for Data Science. 

Tel. +44-1865-287229
1 St. Giles
Oxford OX1 3JS