Hi,
On 01/05/2017 18:18, Nuria Ruiz wrote:
>> are there issues with using the data from the IA?
>
> Since that long predates our team's record keeping of data issues, the
> answer is that we do not know. Maybe someone on this list can chip in,
> and we will add the answer to our dataset's known issues, which can be
> found here:
> https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw#E…

I should add that a handful of files in October 2011[1] are incorrect:
they are not compressed and appear to be HTML pages (they are also
92 KB each instead of ~85 MB).
Again, the files from the Internet Archive[2] seem to be OK.
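
In case it helps anyone checking a local copy, here is a minimal sketch
of one way to flag files like these. It only tests whether a file starts
with the two gzip magic bytes (0x1f 0x8b); an HTML page saved under a .gz
name will not. The function names and the glob pattern are my own
assumptions for illustration, not part of any Wikimedia tooling.

```python
from pathlib import Path

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def find_suspect_files(directory, pattern="pagecounts-*.gz"):
    """Return the sorted names of matching files that are not gzip streams.

    `pattern` is a hypothetical default based on the file names listed in
    this thread; adjust it to your local layout.
    """
    return sorted(
        p.name for p in Path(directory).glob(pattern) if not looks_like_gzip(p)
    )
```

Calling find_suspect_files("2011-10") on a local mirror of that month
would list any files that are not actually compressed. A file that passes
this check can still be truncated or corrupt, so this is a first filter
only.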
Cristian
[1]
https://dumps.wikimedia.org/other/pagecounts-raw/2011/2011-10/
Specifically, the following:
* pagecounts-20111008-180001.gz
* pagecounts-20111008-190000.gz
* pagecounts-20111008-200000.gz
* pagecounts-20111008-210000.gz
* pagecounts-20111008-220000.gz
[2]
https://archive.org/details/wikipedia_visitor_stats_201110