Are visitor stats (as produced by Domas) safely archived somewhere, for example on the toolserver, where development projects can easily access them for analysis? I have made my own copies of the files (my plan was to use them, but I haven't started yet), but now I'm running out of disk space and urgently need to free some on that server.
I just deleted September 2009 (last 2 weeks) and that freed 9 GB.
The oldest file I have is pagecounts-20071209-180000.gz
Earlier, I wrote:
Are visitor stats (as produced by Domas) safely archived somewhere...?
As an experiment, I uploaded the files for December 2007 to the Internet Archive, http://www.archive.org/details/wikipedia_visitor_stats_200712
It was the first time I uploaded something to IA, and since it was neither audio nor video, it was put under "opensource books". Even though I have a 100 Mbit/s connection, the FTP upload only reached 2.5 Mbit/s (317 kB/s), and the entire upload took 12 hours.
Even though the pagecounts files (each covering one hour) are compressed, each one repeats largely the same dictionary (article titles), and I think the total could be compressed more efficiently (without loss of any information) if the files were unpacked and organized differently. I don't really have the time and energy to investigate this.
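To illustrate the reorganization idea, here is a minimal sketch, not a tested archival tool. It assumes each line of an hourly file has four space-separated fields (project, URL-encoded title, request count, bytes transferred) and that a given project/title pair appears at most once per hour; both are assumptions about the data, and all function names are made up for illustration. It stores each title once and lists its per-hour values next to it, so a general-purpose compressor no longer sees the title dictionary repeated in every file. It also holds everything in memory, so it would only suit a small batch.

```python
import gzip
from collections import defaultdict


def parse_line(line):
    """Return (project, title, count, size), or None for a malformed line.

    Assumed format: "project title count bytes", space-separated.
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    project, title, count, size = parts
    try:
        return project, title, int(count), int(size)
    except ValueError:
        return None


def read_lines(path):
    """Yield the text lines of one gzip-compressed hourly file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        yield from f


def merge_hours(hourly):
    """Merge (hour_label, lines) pairs into {(project, title): {hour: (count, size)}}.

    Assumes a project/title pair occurs at most once per hour;
    a repeated pair would overwrite the earlier value.
    """
    merged = defaultdict(dict)
    for hour, lines in hourly:
        for line in lines:
            record = parse_line(line)
            if record is None:
                continue  # skip lines that do not match the assumed format
            project, title, count, size = record
            merged[(project, title)][hour] = (count, size)
    return merged


def format_merged(merged):
    """Yield one line per title: "project title hour:count:size ..."."""
    for (project, title), hours in sorted(merged.items()):
        cells = " ".join(f"{h}:{c}:{s}" for h, (c, s) in sorted(hours.items()))
        yield f"{project} {title} {cells}"
```

With real files, the input could be built as `[(label, read_lines(path)) for label, path in ...]`, and the output of `format_merged` written to a single file and compressed once. Whether this actually beats the per-hour .gz files would have to be measured.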
Now I would feel less frustrated if these files were removed from my disk.
Should I continue to do this for the files for 2008, one batch per month? Or do you have any better ideas?
wikitech-l@lists.wikimedia.org