2009/9/17 Erik Zachte erikzachte@infodisiac.com:
I think it is extremely important to keep these files for later analysis by historians and others.
Mathias Schindler also keeps an archive, or at least did until April (Berlin conference). He even bought a dedicated external drive for it.
I collect files daily and merge the 24 hourly files into one daily file. That saves a lot of disk space and makes processing faster. Titles with fewer than 10 requests per day are discarded, which also saves a lot.
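A minimal sketch of the daily merge described above, not the actual script. It assumes each hourly file holds space-separated lines of the form `project title count bytes`, which is an assumption about the layout. The function names and the `min_requests` parameter are hypothetical.

```python
from collections import Counter


def merge_hourly(hourly_line_iters):
    """Sum request counts per (project, title) across hourly inputs.

    Each element of hourly_line_iters is an iterable of text lines
    (an open file works). Assumed line layout: project title count bytes.
    """
    totals = Counter()
    for lines in hourly_line_iters:
        for line in lines:
            parts = line.split()
            # Skip lines that do not match the assumed layout.
            if len(parts) < 3 or not parts[2].isdigit():
                continue
            project, title, count = parts[0], parts[1], int(parts[2])
            totals[(project, title)] += count
    return totals


def apply_threshold(totals, min_requests=10):
    """Keep only titles with at least min_requests requests in the day."""
    return {key: n for key, n in totals.items() if n >= min_requests}
```

Applying the threshold after the merge, rather than per hourly file, matters: a title with 6 requests in one hour and 5 in another passes a daily cutoff of 10 but would fail it in either hour alone.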
Careful: a recent analysis I did suggested that 15% of all page requests for articles on Wikipedia are for topics requested less than once per hour. A very large number of pages rarely see hits, but collectively the traffic to such topics is important. You could end up biasing certain kinds of analysis if you always exclude the rarely visited pages.
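One way to check how much a cutoff costs before discarding anything is to measure the share of requests that falls below it. This is only a sketch: `discarded_share` and its arguments are hypothetical names, and the input is assumed to be a mapping from title to that day's request count.

```python
def discarded_share(daily_totals, min_requests=10):
    """Return the fraction of all requests that belong to titles
    with fewer than min_requests requests in the day."""
    total = sum(daily_totals.values())
    if total == 0:
        return 0.0
    dropped = sum(n for n in daily_totals.values() if n < min_requests)
    return dropped / total
```

Running this on the merged daily counts for a few different days, before the low-count titles are thrown away, would show whether the loss is negligible or large enough to bias later analysis.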
-Robert Rohde