Frédéric Schütz wrote:
I had started reorganizing the files earlier this year but did not
finish; apart from a few leftover directories, there should be no duplicate
files. In the meantime, I have restarted a few wget processes in order
to get the files.
As I write this, there are still some duplicates for 2011-11 and
2011-12 and for some projectcounts files, but that's not important. What I
do want to ask is whether you are planning to run at least a daily wget for
the files, which I think is necessary for any reliable tool that
makes use of them, as I'm planning to do (currently they stop
at Feb 14, 21 h).
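To make the daily catch-up idea concrete, here is a minimal sketch that lists the hourly files a scheduled job would still need to fetch after the last one downloaded. The function name `expected_files`, the `pagecounts-YYYYMMDD-HH0000.gz` naming pattern and the example dates are my assumptions for illustration; the real file names may differ (for instance in the seconds field), so treat it as a starting point only.

```python
from datetime import datetime, timedelta


def expected_files(last_fetched, now, prefix="pagecounts"):
    """Return the hourly file names expected after last_fetched, up to now.

    Assumption: one file per hour, named <prefix>-YYYYMMDD-HH0000.gz.
    """
    names = []
    # Start with the hour following the last file we already have.
    t = last_fetched.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    while t <= now:
        names.append(f"{prefix}-{t:%Y%m%d-%H}0000.gz")
        t += timedelta(hours=1)
    return names


if __name__ == "__main__":
    # Illustrative dates only; the year is an assumption.
    for name in expected_files(datetime(2012, 2, 14, 21), datetime(2012, 2, 15, 0)):
        print(name)
```

A daily cron job could feed this list to wget, so that a missed run is caught up automatically the next day.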
An MMP is a good idea, and I can look into it. However, the most pressing
problem, as I see it, is space, which will become tight very quickly.
Currently there are 828 GB free; hopefully the administration team has
enough time to deal with this problem. As for the MMP, I offer my help
again :).
And the second thing to do would be to produce daily/monthly/whatever
summary files from these raw files, as Erik Zachte does -- there should
be no reason to use the raw files.
[ >> Why not just pull his copies? ]
If there are already some summary files, which I don't know, I agree the
easiest way would be to download them as well. If not, I think someone
here could/should do the task, using a format as plain as possible so that
any language or script can parse it. Again, it could be me if necessary.
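As a rough idea of what such a summary step could look like, the sketch below adds up hourly counts into one total per project and page, and writes them as plain tab-separated lines. I am assuming each raw line has four space-separated fields (project, page title, request count, bytes); the names `summarize` and `format_summary` are hypothetical, and this is not how Erik Zachte's files are produced.

```python
from collections import Counter


def summarize(hourly_files):
    """Sum request counts per (project, title) over several hourly inputs.

    Each input is an iterable of text lines, assumed to look like:
        project page_title count bytes
    """
    totals = Counter()
    for lines in hourly_files:
        for line in lines:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip malformed lines
            project, title, count, _size = parts
            try:
                totals[(project, title)] += int(count)
            except ValueError:
                continue  # skip lines whose count is not a number
    return totals


def format_summary(totals):
    """Render totals as sorted, tab-separated 'project<TAB>title<TAB>count' lines."""
    return [f"{p}\t{t}\t{n}" for (p, t), n in sorted(totals.items())]


if __name__ == "__main__":
    hour1 = ["es Valladolid 3 12345", "en Main_Page 10 99999"]
    hour2 = ["es Valladolid 2 5000", "garbage line"]
    print("\n".join(format_summary(summarize([hour1, hour2]))))
```

Tab-separated text keeps the output easy to read from any language; in practice the inputs would be opened gzip files rather than in-memory lists.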
** José Emilio Mori Recio - http://es.wikipedia.org/wiki/User:-jem- **
* IT Administrator of the Archdiocese of Valladolid *
** Administrator (bibliotecario) of Spanish Wikipedia - Promoter of Wikimedia España **