Dear all,
Pagecounts files from http://dumps.wikimedia.org/other/pagecounts-raw/ used to be downloaded automatically by a script into /mnt/user-store/stats until the end of last year. What happened to that script? The latest pagecounts I can find there are from 20111231. I could write a similar script, but I do not think I have permission to write to /mnt/user-store/stats. Also, I found some pagecounts stored in /mnt/user-store/johang; I think it would be better to store them all in one place so that everyone can use them (unless johang has a good reason to keep them in his own folder).
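For what it's worth, a minimal sketch in Python of what I have in mind; the URL layout (YYYY/YYYY-MM/pagecounts-YYYYMMDD-HH0000.gz) is what the site appears to use, but please check it before relying on this, and the destination directory is of course the one I cannot write to yet:

#!/usr/bin/env python3
"""Hypothetical sketch: mirror one day of pagecounts-raw files.

Assumptions: the URL layout above, and a writable destination
directory. Not an existing Toolserver script.
"""
import os
from urllib.error import HTTPError
from urllib.request import urlopen

BASE = "http://dumps.wikimedia.org/other/pagecounts-raw"
DEST = "/mnt/user-store/stats"  # assumed writable; adjust as needed

def fetch_day(year, month, day):
    for hour in range(24):
        name = f"pagecounts-{year:04d}{month:02d}{day:02d}-{hour:02d}0000.gz"
        url = f"{BASE}/{year}/{year}-{month:02d}/{name}"
        path = os.path.join(DEST, name)
        if os.path.exists(path):
            continue  # already mirrored, skip
        try:
            with urlopen(url) as remote, open(path, "wb") as out:
                out.write(remote.read())
        except HTTPError:
            pass  # some hours are missing or carry a seconds offset in the name

if __name__ == "__main__":
    fetch_day(2012, 1, 1)

Run daily from cron with yesterday's date and it should keep the directory up to date.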
Best wishes,
alkamid
+1.
I too need the new page stats.
I'm also interested in this. As Adam said, it should be no problem to wget the new stats daily from dumps.wikimedia.org, but we would need a writable subdirectory in /mnt/user-store to do this. I don't know if the owner of the current one, schutz, is reading the list... Maybe the best solution is an MMP (multi-maintainer project), as has been done with the dumps; that would also help to keep the directory clean of duplicates (which is not the case currently) and to document it on the wiki (again, as with the dumps). Or maybe it could be integrated into the dumps MMP. I volunteer for whatever is needed.
--- Shaurabh Bharti ...
I too need the new page stats.
--- Adam Klimont adamkli@gmail.com ...
Pagecounts files from http://dumps.wikimedia.org/other/pagecounts-raw/ used to be downloaded by a script automatically into /mnt/user-store/stats (...) I do not think I have permissions to write in /mnt/user-store/stats. (...)
** José Emilio Mori Recio - http://es.wikipedia.org/wiki/User:-jem- ** * IT Administrator of the Archdiocese of Valladolid * ** Administrator of the Spanish Wikipedia - Promoter of Wikimedia España **
I am reading the list :-)
I had started reorganizing the files earlier this year but did not finish; apart from a few leftover directories, there should be no duplicate files. In the meantime, I have restarted a few wget processes in order to get the missing files.
An MMP is a good idea, and I can look into it. However, the most pressing problem, as I see it, is disk space, which will become tight very quickly.
And the second thing to do would be to produce daily/monthly/whatever summary files from these raw files, as Erik Zachte does -- then there should be no reason to use the raw files directly.
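As a starting point, here is a rough sketch of a daily aggregation, assuming the usual raw line format (project, title, views, bytes; space-separated, one record per line); the paths and file names are placeholders only:

#!/usr/bin/env python3
"""Hypothetical sketch: collapse 24 hourly pagecounts files into one
daily summary. Note: a full day holds tens of millions of distinct
titles, so a real job would probably sort and merge on disk rather
than keep everything in memory as done here.
"""
import glob
import gzip
from collections import defaultdict

def summarize_day(pattern, out_path):
    totals = defaultdict(int)
    for fname in sorted(glob.glob(pattern)):
        with gzip.open(fname, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                parts = line.split()
                if len(parts) != 4:
                    continue  # skip malformed lines
                project, title, views = parts[0], parts[1], parts[2]
                if views.isdigit():
                    totals[(project, title)] += int(views)
    # Plain tab-separated output: project, title, total daily views
    with open(out_path, "w", encoding="utf-8") as out:
        for (project, title), views in sorted(totals.items()):
            out.write(f"{project}\t{title}\t{views}\n")

if __name__ == "__main__":
    summarize_day("/mnt/user-store/stats/pagecounts-20120101-*.gz",
                  "/mnt/user-store/stats/daily-20120101.txt")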
Frédéric
On 02/14/2012 02:03 PM, José Emilio Mori Recio wrote:
I'm also interested in this. As Adam said, it should be no problem to wget the new stats daily from dumps.wikimedia.org, but we would need a writable subdirectory in /mnt/user-store to do this. (...) Maybe the best solution is an MMP, as has been done with the dumps (...) I volunteer for whatever is needed.
On Tue, 14 Feb 2012 at 22:39 +0100, Frédéric Schütz wrote:
And the second thing to do would be to produce daily/monthly/whatever summary files from these raw files, as Erik Zachte does -- then there should be no reason to use the raw files directly.
Why not just pull his copies?
Ariel
Frédéric Schütz wrote:
I had started reorganizing the files earlier this year but did not finish; apart from a few leftover directories, there should be no duplicate files. In the meantime, I have restarted a few wget processes in order to get the missing files.
As I write this, there are still some duplicates for 2011-11 and 2011-12 and for some projectcounts files, but that is not important. What I do want to ask is whether you are planning to run at least a daily wget for the files, which I think is necessary for any reliable tool that would make use of them, as I plan to do (currently the downloaded files stop at Feb 14, 21:00).
An MMP is a good idea, and I can look into it. However, the most pressing problem, as I see it, is disk space, which will become tight very quickly.
Currently there are 828 GB free; hopefully the administration team has enough time to deal with this. As for the MMP, I volunteer again :).
And the second thing to do would be to produce daily/monthly/whatever summary files from these raw files, as Erik Zachte does -- then there should be no reason to use the raw files directly.
>> Why not just pull his copies?
If some summary files already exist (I don't know whether they do), I agree the easiest way would be to download them as well. If not, I think someone here could/should take on the task, using a format as plain as possible so that any language or script can parse it. Again, it could be me if necessary.
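To illustrate what I mean by a plain format: with a simple tab-separated file (hypothetical name and columns, matching nothing that exists on the Toolserver yet), any consumer stays trivial:

#!/usr/bin/env python3
"""Hypothetical sketch: consuming a plain tab-separated daily summary
with columns project, title, views. File name and layout are
assumptions from this thread, not an existing file."""

def views_for(path, project, title):
    with open(path, encoding="utf-8") as f:
        for line in f:
            p, t, v = line.rstrip("\n").split("\t")
            if p == project and t == title:
                return int(v)
    return 0

print(views_for("/mnt/user-store/stats/daily-20120101.txt", "en", "Main_Page"))

The same file is just as easy to read with awk, Perl or a shell one-liner, which is the point of keeping it plain.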
** José Emilio Mori Recio - http://es.wikipedia.org/wiki/User:-jem- ** * IT Administrator of the Archdiocese of Valladolid * ** Administrator of the Spanish Wikipedia - Promoter of Wikimedia España **