Are visitor stats (as produced by Domas) safely archived somewhere, for example on the toolserver, where development projects can easily access them for analysis? I have made my own copies of the files (I guess my plan was to use them, but this hasn't started yet), but now I'm running out of disk space and I urgently need to free some space on that server.
I just deleted September 2009 (last 2 weeks) and that freed 9 GB.
The oldest I have is pagecounts-20071209-180000.gz
Lars Aronsson wrote:
Are visitor stats (as produced by Domas) safely archived somewhere...?
They should be at /mnt/user-store/stats, at least from 1 October 2008.
Lars Aronsson wrote:
Are visitor stats (as produced by Domas) safely archived somewhere...?
As Platonides mentioned, they are in /mnt/user-store/stats on the toolserver; however, I would not call that "safely archived": one of my cron jobs simply copies them from Domas's server, and that's it.
At the moment, everything from 1 January 2009 onwards should be there (part of it disappeared at some point, but I managed to recover it).
However, this is definitely not a sustainable solution in the long run: the files currently take up 335 GB (out of 1.5 TB of total space).
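Since part of the 2009 data went missing once, it may help to check the mirror for holes. A minimal sketch, assuming one file per hour named like pagecounts-20071209-180000.gz (the pattern of the example filename in this thread); the function names are hypothetical:

```python
import re
from datetime import datetime, timedelta

# Assumed naming scheme, based on the example pagecounts-20071209-180000.gz:
# pagecounts-YYYYMMDD-HHMMSS.gz, one file per hour.
NAME_RE = re.compile(r"^pagecounts-(\d{8})-(\d{6})\.gz$")


def hour_of(filename):
    """Return the hour a pagecounts file covers, or None if the name does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
    # Truncate to the hour in case a file was written a few seconds late.
    return stamp.replace(minute=0, second=0)


def missing_hours(filenames, start, end):
    """List the hours in [start, end) for which no pagecounts file is present."""
    present = {h for h in map(hour_of, filenames) if h is not None}
    gaps = []
    hour = start
    while hour < end:
        if hour not in present:
            gaps.append(hour)
        hour += timedelta(hours=1)
    return gaps
```

Something like missing_hours(os.listdir("/mnt/user-store/stats"), datetime(2009, 1, 1), datetime(2009, 10, 1)) would then list the gaps, assuming the files sit directly in that directory.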
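To put these numbers side by side with Lars's figure of 9 GB for two weeks, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 1000 GB) and a constant daily volume, which is only an approximation:

```python
# Figures quoted in this thread; decimal units assumed (1 GB = 1000 MB, 1 TB = 1000 GB).
ARCHIVE_GB = 335.0
TOTAL_DISK_GB = 1500.0
TWO_WEEKS_GB = 9.0
HOURS_IN_TWO_WEEKS = 14 * 24


def disk_share(used_gb, total_gb):
    """Fraction of the disk taken by the archive."""
    return used_gb / total_gb


def per_hour_mb(gb, hours):
    """Average size of one hourly file, in MB."""
    return gb * 1000 / hours


def yearly_growth_gb(gb_per_two_weeks):
    """Extrapolated growth per year, assuming a constant daily volume."""
    return gb_per_two_weeks / 14 * 365


share = disk_share(ARCHIVE_GB, TOTAL_DISK_GB)            # about 22% of the disk
hourly = per_hour_mb(TWO_WEEKS_GB, HOURS_IN_TWO_WEEKS)   # about 27 MB per file
growth = yearly_growth_gb(TWO_WEEKS_GB)                  # about 235 GB per year
```

At roughly 235 GB of new data per year, the remaining space would not last many years without some form of aggregation.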
Erik Zachte stores archives of visitor stats in a better format, aggregating some of the older data and storing several days of data in one file. I started looking into these files earlier this year, planning to spend some time playing with the data. One of my ideas was to replicate the statistical data from the WMF stats server on the toolserver, and to do it "officially" rather than by copying files with a personal cron job. Unfortunately, "real life" took over and I did not manage to continue (and still can't). However, if there is any interest in improving the situation, I'd be glad to look into it as soon as I can.
I have cc'd Erik, who may have more to say.
Cheers,
Frédéric
Frédéric Schütz wrote:
At the moment, there should be everything starting from 1 January 2009 (although part of it disappeared at some point, but I managed to recover it).
I probably deleted it because it was using too much disk space. If you really need an entire year's worth of stats, you need to find a way to aggregate it so that it uses less disk space.
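One simple form of aggregation is to collapse 24 hourly files into a single daily file. A minimal sketch, assuming each line of a pagecounts file reads "project title count bytes" separated by spaces. It deliberately drops the hourly resolution and the bytes column, so it trades information for space. The function names are hypothetical:

```python
import gzip
from collections import Counter


def aggregate_hourly_files(paths):
    """Sum view counts per (project, title) over several hourly pagecounts files.

    Assumes each line reads "project title count bytes", separated by spaces.
    """
    totals = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                parts = line.split(" ")
                if len(parts) != 4:
                    continue  # skip malformed lines rather than abort the whole run
                project, title, count, _bytes = parts
                try:
                    totals[(project, title)] += int(count)
                except ValueError:
                    continue  # non-numeric count: skip the line
    return totals


def write_daily_file(totals, out_path):
    """Write one gzip-compressed "project title count" line per page, sorted."""
    with gzip.open(out_path, "wt", encoding="utf-8") as fh:
        for (project, title), count in sorted(totals.items()):
            fh.write(f"{project} {title} {count}\n")
```

Keeping hourly data for recent months and only daily totals for older ones would be one way to mix the two, similar in spirit to the older-data aggregation Frédéric describes.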
- river.
Earlier, I wrote:
Are visitor stats (as produced by Domas) safely archived somewhere...?
As an experiment, I uploaded the files for December 2007 to the Internet Archive, http://www.archive.org/details/wikipedia_visitor_stats_200712
It was the first time I uploaded something to IA, and since this was neither audio nor video, it was put under "opensource books". Even though I have a 100 Mbit/s connection, the FTP upload only reached 2.5 Mbit/s (317 kB/s), and the entire upload took 12 hours.
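As a rough check of these figures: 317 kB/s corresponds to about 2.54 Mbit/s, so the two rates agree. Twelve hours at about 2.5 Mbit/s implies on the order of 13.5 GB for the December 2007 batch. This is an estimate that assumes a constant rate, not a measured size. At the full 100 Mbit/s, the same volume would take about 18 minutes:

```python
def mbit_to_kbyte_per_s(mbit_per_s):
    """Convert megabits per second to kilobytes per second (decimal units)."""
    return mbit_per_s * 1_000_000 / 8 / 1000


def volume_gb(mbit_per_s, hours):
    """Data moved at a constant rate for the given number of hours, in GB."""
    return mbit_per_s * 1_000_000 / 8 * hours * 3600 / 1e9


def hours_needed(gb, mbit_per_s):
    """Time to move `gb` gigabytes at a constant rate, in hours."""
    return gb * 1e9 * 8 / (mbit_per_s * 1_000_000) / 3600


rate_kb = mbit_to_kbyte_per_s(2.5)   # 312.5 kB/s
batch_gb = volume_gb(2.5, 12)        # about 13.5 GB
full_speed_h = hours_needed(batch_gb, 100)  # about 0.3 h, i.e. 18 minutes
```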
Even though the pagecounts files (each covering one hour) are compressed, each one contains the same dictionary (article titles), so I think the total could be compressed more efficiently (without losing any information) if the files were unpacked and organized differently. I don't really have the time or energy to investigate this.
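One possible reorganization along these lines is to store every title once in a shared table and keep only (title id, count) pairs per hour. The sketch below is lossless and only shows the layout and its inverse. Whether it actually compresses better than per-file gzip has not been tested. The keys could be "project title" strings so that projects stay distinct, and the function names are hypothetical:

```python
import zlib


def to_shared_layout(hours):
    """Split per-hour (key, count) rows into one key table plus per-hour id/count columns.

    `hours` maps an hour label to a list of (key, count) pairs.
    Returns (keys, columns): keys[i] is a key, and columns[hour] is a list
    of (key_id, count) pairs referring into that table.
    """
    index = {}
    keys = []
    columns = {}
    for hour, rows in hours.items():
        col = []
        for key, count in rows:
            if key not in index:
                index[key] = len(keys)
                keys.append(key)
            col.append((index[key], count))
        columns[hour] = col
    return keys, columns


def from_shared_layout(keys, columns):
    """Inverse of to_shared_layout: rebuild the original per-hour rows."""
    return {hour: [(keys[i], c) for i, c in col] for hour, col in columns.items()}


def compressed_size(text):
    """Size of `text` after zlib compression, for comparing layouts on real data."""
    return len(zlib.compress(text.encode("utf-8"), 9))
```

Serializing both layouts from a month of real files and comparing compressed_size() would show whether the idea pays off.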
Now I would feel less frustrated about removing these files from my disk.
Should I continue to do this for the files for 2008, one batch per month? Or do you have any better ideas?
toolserver-l@lists.wikimedia.org