[Toolserver-l] Archive of visitor stats

Mathias Schindler mathias.schindler at wikimedia.de
Sun Sep 20 15:06:18 UTC 2009


2009/9/18 Erik Zachte <erikzachte at infodisiac.com>:
> I think it is extremely important to keep these files for later analysis by
> historians and others.
>
> Mathias Schindler also keeps an archive, or at least did until April (Berlin
> conference).
> He even bought a dedicated external drive for it.

Right now, I have one copy of all the files from December 2007 to
April 2009, stored on a single hard drive. I haven't done any integrity
checks beyond some initial tests. The dataset has gaps from periods when
the service that produces the files was not working. In some cases there
is just an empty .gz file; in others, no file was produced at all.
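A minimal sketch of the kind of integrity check mentioned above, in Python. It assumes one file per hour with names like pagecounts-YYYYMMDD-HHMMSS.gz; that pattern and the function names are hypothetical, so adjust them to the real file names. It reports hours with no file, plus files that are zero bytes, fail to decompress, or decompress to nothing.

```python
import gzip
import os
import re
import zlib
from datetime import datetime, timedelta

# Hypothetical naming scheme: pagecounts-YYYYMMDD-HHMMSS.gz, one file per hour.
NAME_RE = re.compile(r"^pagecounts-(\d{8})-(\d{2})\d{4}\.gz$")


def hour_key(name):
    """Map a file name to its hour (datetime), or None if it doesn't match."""
    m = NAME_RE.match(name)
    if not m:
        return None
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H")


def gzip_ok(path):
    """True if the file is non-empty, decompresses fully, and yields data."""
    if os.path.getsize(path) == 0:
        return False  # zero-byte placeholder file
    total = 0
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(1 << 20)  # read in 1 MiB chunks
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, zlib.error):
        return False  # bad header, truncated stream, or corrupt data
    return total > 0  # a valid gzip with no payload also counts as "empty"


def check_archive(directory, start, end):
    """Return (missing_hours, bad_files) for the hourly range [start, end)."""
    present = {}
    for name in os.listdir(directory):
        key = hour_key(name)
        if key is not None:
            present[key] = os.path.join(directory, name)
    missing, bad = [], []
    t = start
    while t < end:
        path = present.get(t)
        if path is None:
            missing.append(t)
        elif not gzip_ok(path):
            bad.append(path)
        t += timedelta(hours=1)
    return missing, bad
```

Running check_archive over each month would give a list of gaps and broken files that could be re-fetched or documented alongside the archive.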

In my spare time, I will try to copy the files from May onward to this
hard drive until it is full.

The situation is rather uncomfortable for me, since I am in no way able
to guarantee the integrity and safety of these files over a longer time
frame. While I might continue downloading and "storing" the files, I
would be extremely happy to hear that the full and unabridged set of
files is available a) to anyone, b) for an indefinite time span, c) free
of charge, and d) with some backup and data integrity checks in place.

Speaking of wish lists, a web-accessible service for working with the
data would be nice. We know for sure that journalists, and hopefully
other audiences as well, like the data, the numbers and the resulting
shiny graphs.

Mathias
