On 11/18/2010 03:59 PM, emijrp wrote:
> How can we know if there are more corrupted files of old pageviews?
Since the files are gzipped, you can run
gzip -t
to test their integrity. I do it on my own archive; the script is on the toolserver but is not used regularly.
I just launched the script on all files since January 2010. It takes quite a while to run because it also computes a SHA1 fingerprint for each file (this does not directly help detect corruption, but it lets me compare the results of archiving on different servers).
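
For anyone who wants to run a similar check on their own copy, here is a rough sketch of the idea in Python. It is not the toolserver script; the function names and the "*.gz" pattern are made up for illustration. It reads each gzip stream to the end (roughly what "gzip -t" does) and computes a SHA1 of the compressed file so that copies on different servers can be compared.

```python
import gzip
import hashlib
import zlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 of the file's raw (still compressed) bytes."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole stream and discard it; False on any error."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches,
        # EOFError covers truncated files,
        # zlib.error covers damaged compressed data.
        return False


def check_archive(directory, pattern="*.gz"):
    """Map each matching file name to (is_intact, sha1_hex)."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = (gzip_is_intact(path), sha1_of_file(path))
    return results
```

Calling check_archive() on a directory of dumps returns a dictionary; any entry whose first value is False is a file worth downloading again, and the SHA1 values can be compared against the same listing produced on another server.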
Frédéric