Hi,
On 01/05/2017 18:18, Nuria Ruiz wrote:
>> Are there issues with using the data from the IA?
> Since that data long predates our team's record-keeping of data issues, the answer is that we do not know. Maybe someone on this list can chip in, and we will add the answer to our dataset's known issues, which can be found here:
> https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw#Ev...
I should add that a handful of files from October 2011 [1] are incorrect: they are not compressed and appear to be HTML pages (they are also 92 KB instead of ~85 MB).
Again, the files from the Internet Archive [2] seem to be OK.
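For anyone who wants to scan other months for the same problem, here is a minimal sketch (not an official tool) that flags files matching the symptoms above: no gzip magic bytes, content that starts like an HTML page, or a size far below the expected ~85 MB. The 1 MB cutoff and the function name `check_pagecounts_file` are my own assumptions, chosen only for illustration.

```python
import os
import sys
from typing import List

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"
# Assumed cutoff: good hourly files are ~85 MB and the bad ones ~92 KB,
# so anything under 1 MB is treated as suspicious. Adjust as needed.
MIN_EXPECTED_BYTES = 1_000_000


def check_pagecounts_file(path: str) -> List[str]:
    """Return the problems found with a pagecounts-*.gz file (empty list if none)."""
    problems = []
    size = os.path.getsize(path)
    if size < MIN_EXPECTED_BYTES:
        problems.append(f"suspiciously small ({size} bytes)")
    # Only the first few hundred bytes are needed to check the header.
    with open(path, "rb") as f:
        head = f.read(512)
    if not head.startswith(GZIP_MAGIC):
        problems.append("not gzip-compressed")
        # The bad files look like HTML pages rather than compressed dumps.
        if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
            problems.append("looks like an HTML page")
    return problems


if __name__ == "__main__":
    for path in sys.argv[1:]:
        problems = check_pagecounts_file(path)
        print(f"{path}: {'OK' if not problems else '; '.join(problems)}")
```

Saved as, say, check_pagecounts.py (hypothetical name), it could be run as `python check_pagecounts.py pagecounts-20111008-*.gz`. It only inspects the header and size, so it will not catch a gzip file that is truncated or corrupted further in.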
Cristian
[1] https://dumps.wikimedia.org/other/pagecounts-raw/2011/2011-10/
    Specifically, the following:
    * pagecounts-20111008-180001.gz
    * pagecounts-20111008-190000.gz
    * pagecounts-20111008-200000.gz
    * pagecounts-20111008-210000.gz
    * pagecounts-20111008-220000.gz
[2] https://archive.org/details/wikipedia_visitor_stats_201110