Hi Cristian,
I do not recall what happened either, but maybe those files were removed on purpose because the data was corrupt or because of other issues.
Cheers!
On Fri, May 5, 2017 at 9:36 AM, Cristian Consonni cristian@balist.es wrote:
Hi,
On 01/05/2017 18:18, Nuria Ruiz wrote:
are there issues with using the data from the IA?
Since that long predates our team's record keeping of data issues, the answer is that we do not know. Maybe someone on this list can chip in, and we will add the answer to our dataset's known issues, which can be found here:
Data/Pagecounts-raw#Events_and_known_problems_since_2014-03-01
I should add that a handful of files in October 2011[1] are incorrect: they are not compressed and appear to be HTML pages (also, they are 92 KB files instead of ~85 MB).
Again, the files from Internet Archive[2] seem to be OK.
Cristian
[1] https://dumps.wikimedia.org/other/pagecounts-raw/2011/2011-10/ Specifically, the following:
- pagecounts-20111008-180001.gz
- pagecounts-20111008-190000.gz
- pagecounts-20111008-200000.gz
- pagecounts-20111008-210000.gz
- pagecounts-20111008-220000.gz
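
For anyone who wants to screen a local copy for the same symptoms (not gzip-compressed, HTML content, far too small), here is a minimal Python sketch. The function name and the 1 MB size threshold are my own assumptions, picked only because they sit between the ~92 KB and ~85 MB sizes mentioned above; this is not an official validation tool.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip stream
MIN_SIZE_BYTES = 1_000_000    # assumed threshold: above ~92 KB, far below ~85 MB


def check_pagecounts_file(path, min_size=MIN_SIZE_BYTES):
    """Return a list of problems found; an empty list means the file passed."""
    path = Path(path)
    problems = []

    # Read only the beginning of the file; that is enough for both checks.
    with path.open("rb") as fh:
        head = fh.read(512)

    if not head.startswith(GZIP_MAGIC):
        problems.append("not gzip-compressed")
        lowered = head.lower()
        if b"<html" in lowered or b"<!doctype" in lowered:
            problems.append("looks like an HTML page")

    if path.stat().st_size < min_size:
        problems.append("suspiciously small")

    return problems
```

Calling check_pagecounts_file() on each downloaded file and printing the non-empty results should flag files like the five listed above, assuming they still look the way I described.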
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics