2009/9/17 Erik Zachte <erikzachte(a)infodisiac.com>:
I think it is extremely important to keep these files for later analysis
by historians and others.
Mathias Schindler also keeps an archive, or at least did until April (Berlin
conference).
He even bought a dedicated external drive for it.
I collect files daily and merge 24 hourly files into one daily file.
That saves a lot on disk space and makes processing faster.
Titles with fewer than 10 requests per day are discarded, which also saves a
lot.
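
The daily merge described above could be sketched roughly as follows. This is
only an illustration, not Erik's actual script: it assumes each hourly line
has the form "project title count bytes" separated by whitespace, and the
function name merge_hourly and the min_requests parameter are made up for the
example.

```python
from collections import Counter


def merge_hourly(hourly_inputs, min_requests=10):
    """Sum per-title request counts over hourly inputs, then drop rare titles.

    hourly_inputs: iterable of iterables of lines (for example, 24 open files).
    Assumed line format (not confirmed by the thread): "project title count bytes".
    """
    totals = Counter()
    for lines in hourly_inputs:
        for line in lines:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip malformed lines
            try:
                count = int(parts[2])
            except ValueError:
                continue  # skip lines whose count is not a number
            totals[(parts[0], parts[1])] += count
    # Keep only titles that reach the daily threshold.
    return {key: n for key, n in totals.items() if n >= min_requests}


if __name__ == "__main__":
    hours = [
        ["en Foo 6 100", "en Bar 3 50"],
        ["en Foo 5 100", "en Bar 2 50"],
    ]
    print(merge_hourly(hours))
```

Applying the threshold only after summing all hours matters: a title with a
few requests in each hour can still pass the daily cutoff.
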
Careful, a recent analysis I did suggested that 15% of all page
requests for articles on Wikipedia are for topics requested less than
once per hour. There are a very large number of pages that rarely see
hits, but collectively the traffic to such topics is important. You
could end up biasing certain kinds of analysis if you always exclude
the rarely visited pages.
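
One way to check how much traffic a cutoff would throw away is to measure the
share of requests going to rare titles before discarding anything. A minimal
sketch, assuming daily totals are already available as a mapping from title
to request count; the name rare_share is hypothetical, and the default
threshold of 24 per day simply restates "less than once per hour":

```python
def rare_share(daily_totals, threshold=24):
    """Return the fraction of all requests that go to titles with fewer
    than `threshold` requests in the day (24 is about one per hour)."""
    total = sum(daily_totals.values())
    if total == 0:
        return 0.0
    rare = sum(n for n in daily_totals.values() if n < threshold)
    return rare / total


if __name__ == "__main__":
    # Toy numbers for illustration only, not real traffic data.
    sample = {"Main_Page": 90, "Obscure_A": 5, "Obscure_B": 5}
    print(rare_share(sample))
```
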
-Robert Rohde