2009/9/18 Erik Zachte <erikzachte@infodisiac.com>:
I think it is extremely important to keep these files for later analysis by historians and others.
Mathias Schindler also keeps an archive, or at least did until April (Berlin conference). He even bought a dedicated external drive for it.
Right now, I have a single copy of all the files from December 2007 to April 2009 on a single hard drive. I haven't done any integrity checks beyond some initial tests. The dataset has gaps from periods when the service producing the files was not working: in some cases there is just an empty .gz file, in others no file was produced at all.
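A minimal sketch of the kind of integrity check described above, in Python. It assumes one file per hour, all in a single directory and named like pagecounts-YYYYMMDD-HH0000.gz; both the naming scheme and the layout are assumptions, not something confirmed in this thread. For a given date range it lists expected hours with no file, files that are empty, and files that cannot be read as gzip:

```python
import gzip
from datetime import datetime, timedelta
from pathlib import Path

# ASSUMPTION: hourly files named like pagecounts-20090401-130000.gz
NAME_FORMAT = "pagecounts-%Y%m%d-%H0000.gz"


def check_archive(directory, start, end):
    """Walk every hour from start to end (inclusive) and classify the expected file.

    Returns three lists of filenames: missing, empty, corrupt.
    """
    root = Path(directory)
    missing, empty, corrupt = [], [], []
    hour = start
    while hour <= end:
        name = hour.strftime(NAME_FORMAT)
        path = root / name
        if not path.exists():
            missing.append(name)
        else:
            try:
                if path.stat().st_size == 0:
                    # A zero-byte file, not even a gzip header.
                    empty.append(name)
                else:
                    # A valid gzip stream that decompresses to nothing also
                    # counts as empty. Only the first byte is read, so a file
                    # truncated later on can still pass this quick check.
                    with gzip.open(path, "rb") as fh:
                        if not fh.read(1):
                            empty.append(name)
            except (OSError, EOFError):
                # Not a gzip file at all, or an unreadable one.
                corrupt.append(name)
        hour += timedelta(hours=1)
    return missing, empty, corrupt
```

For example, check_archive("/mnt/archive", datetime(2007, 12, 1), datetime(2009, 4, 30, 23)) would scan the December 2007 to April 2009 range; the directory path here is hypothetical.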
In my spare time, I will try to add the files from May onward to this hard drive until it is full.
The situation is rather uncomfortable for me, since I am in no way able to guarantee the integrity and safety of these files over a longer time frame. While I might continue downloading and "storing" the files, I would be extremely happy to hear that the full and unabridged set of files is available a) to anyone, b) for an indefinite time span, c) free of charge, and d) with some backup and data integrity checks in place.
Speaking of wish lists, a web-accessible service for working with the data would be nice. We know for sure that journalists, and hopefully other groups as well, like the data, the numbers and the resulting shiny graphs.
Mathias
Could you upload them to a public file host such as rapidshare.com, or seed them via BitTorrent? That way people could easily make their own backups.
Marco
On Sun, Sep 20, 2009 at 5:06 PM, Mathias Schindler < mathias.schindler@wikimedia.de> wrote:
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette