Hi,
On 01/05/2017 18:18, Nuria Ruiz wrote:
>> are there issues with using the data from the IA?
> Since that much predates our team record keeping of data issues the
> answer is that we do not know. Maybe someone in this list can chip in
> and we will add this answer to our dataset known issues which can be
> found here:
>
> https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw#Events_and_known_problems_since_2014-03-01
I should add that a handful of files from October 2011[1] are
incorrect: they are not compressed and appear to be HTML pages
(they are also 92 KB each instead of ~85 MB).
Again, the files from the Internet Archive[2] seem to be OK.
Cristian
[1] https://dumps.wikimedia.org/other/pagecounts-raw/2011/2011-10/
Specifically, the following:
* pagecounts-20111008-180001.gz
* pagecounts-20111008-190000.gz
* pagecounts-20111008-200000.gz
* pagecounts-20111008-210000.gz
* pagecounts-20111008-220000.gz
[2] https://archive.org/details/wikipedia_visitor_stats_201110
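
In case it helps anyone checking a local mirror, here is a minimal sketch of how files like the ones in [1] could be flagged. It only tests whether each file starts with the gzip magic bytes (0x1f 0x8b), which an uncompressed HTML page will not. The function names and the default "pagecounts-*.gz" pattern are my own assumptions for illustration, not part of any existing tool.

```python
from pathlib import Path

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def find_suspect_files(directory, pattern="pagecounts-*.gz"):
    """Return the sorted names of matching files that are not gzip data.

    `directory` is assumed to be a local copy of one month of dumps;
    `pattern` is a hypothetical default matching the names listed in [1].
    """
    return sorted(
        p.name for p in Path(directory).glob(pattern) if not is_gzip(p)
    )
```

Calling find_suspect_files() on a local copy of the 2011-10 directory should list any file that is not really gzip data. A file size check (tens of KB instead of tens of MB) would be a reasonable second filter.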
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics