Frédéric Schütz wrote:
I had started reorganizing the files earlier this year but did not finish; apart from a few leftover directories, there should be no duplicate files. In the meantime, I have restarted a few wget processes in order to get the files.
As I write this, there are still some duplicates for 2011-11 and 2011-12 and for some projectcounts files, but that's not important. What I do want to ask is whether you are planning to run at least a daily wget for the files. I think that is necessary for any reliable tool that makes use of them, as I'm planning to do (currently they stop at Feb 14, 21 h).
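To make the gap check concrete, here is a rough sketch of how the missing hours could be listed from a directory listing. It assumes the hourly files are named like pagecounts-YYYYMMDD-HHMMSS.gz (the seconds part may vary); the function names are only illustrative, not an existing tool.

```python
import re
from datetime import datetime, timedelta

# Assumed filename pattern: pagecounts-YYYYMMDD-HHMMSS.gz
# (minutes and seconds are ignored, since they may differ between files).
NAME_RE = re.compile(r"pagecounts-(\d{8})-(\d{2})\d{4}\.gz$")


def hours_present(filenames):
    """Return the set of hours (as datetimes) covered by the given filenames."""
    hours = set()
    for name in filenames:
        match = NAME_RE.search(name)
        if match:
            stamp = match.group(1) + match.group(2)
            hours.add(datetime.strptime(stamp, "%Y%m%d%H"))
    return hours


def missing_hours(filenames, start, end):
    """List every hour from start to end (inclusive) that has no file."""
    present = hours_present(filenames)
    missing = []
    current = start
    while current <= end:
        if current not in present:
            missing.append(current)
        current += timedelta(hours=1)
    return missing
```

A daily cron job could call missing_hours() on the local listing and pass only the missing hours to wget, instead of re-fetching whole months.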
An MMP is a good idea, and I can look into it. However, the most pressing problem, as I see it, is space, which will become tight very quickly.
Currently there are 828 GB free; hopefully the administration team has enough time to sort this out. As for the MMP, I volunteer again :).
And the second thing to do would be to produce daily/monthly/whatever summary files from these raw files, as Erik Zachte does -- there should be no reason to use the raw files.
[ >> Why not just pull his copies? ]
If some summary files already exist, which I don't know, I agree the easiest way would be to download them as well. If not, I think someone here could/should take on the task, using a format plain enough for any language or script to parse. Again, it could be me if necessary.
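As a starting point for such summaries, here is a minimal sketch that adds up the hourly counts for one day. It assumes each line of a raw file has four whitespace-separated fields (project, page title, request count, bytes) and that the files are gzip-compressed; the function names and the three-column output format are just my suggestion, not an agreed format.

```python
import gzip
from collections import Counter


def add_hourly_counts(lines, totals):
    """Add the request counts from one hourly file's lines into totals."""
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        project, title, count, _size = parts
        try:
            totals[(project, title)] += int(count)
        except ValueError:
            continue  # skip lines whose count is not a number
    return totals


def summarize_day(paths):
    """Sum the counts over all hourly .gz files given in paths."""
    totals = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            add_hourly_counts(handle, totals)
    return totals


def format_summary(totals):
    """Render totals as plain 'project title count' lines, sorted by key."""
    return [
        f"{project} {title} {count}"
        for (project, title), count in sorted(totals.items())
    ]
```

One space-separated line per page keeps the output trivially parseable from a shell script or any language. Monthly files could be built the same way from the daily ones.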
** José Emilio Mori Recio - http://es.wikipedia.org/wiki/User:-jem- ** * IT Administrator of the Archdiocese of Valladolid * ** Administrator (bibliotecario) on Spanish Wikipedia - Promoter of Wikimedia España **