New wikistats reports have been published today, for the first time since May 2008. The reports have been generated on the new wikistats server Bayes, which has been operational for a few weeks. The dump process itself had been restarted some weeks earlier; new dumps are now available for all 700+ wiki projects (with the English Wikipedia as the usual exception). From now on the wikistats reports will be updated much more frequently. Processing of each new dump starts soon after the dump becomes available, and the results will be stored in intermediate files. Updated reports will be published once a week.
Much more on this at http://infodisiac.com/blog/2008/12/wikistats-is-back/
Happy holidays everyone.
Erik Zachte
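The workflow Erik describes (analyze each dump soon after it appears, keep the results in intermediate files, publish reports once a week) can be illustrated with a small sketch. This is not the actual wikistats code: the JSON state file, the `(project, dump_date)` keys and the `analyze` callback are hypothetical stand-ins chosen only to show the "process once, cache, publish on a schedule" idea.

```python
import json
import os
from datetime import date, timedelta


def process_new_dumps(dumps, state_path, analyze):
    """Analyze each dump at most once, keeping results in an intermediate file.

    dumps: iterable of (project, dump_date) string pairs (hypothetical format).
    analyze: hypothetical callable (project, dump_date) -> dict of statistics.
    Returns the keys of the dumps analyzed during this call.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)  # results stored by earlier runs
    newly_processed = []
    for project, dump_date in dumps:
        key = f"{project}/{dump_date}"
        if key in state:
            continue  # already analyzed; reuse the stored result
        state[key] = analyze(project, dump_date)
        newly_processed.append(key)
    with open(state_path, "w") as f:
        json.dump(state, f)  # persist the intermediate results
    return newly_processed


def publish_due(last_published: date, today: date, interval_days: int = 7) -> bool:
    """True when at least interval_days have passed since the last report."""
    return today - last_published >= timedelta(days=interval_days)
```

Separating the per-dump analysis from the weekly publishing step means a dump only has to be processed once, however many reports later draw on its results.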
Thank you Erik!
For the "Page Views" data on some projects, the May figures look unusually low compared with June; could it be that the May data doesn't cover a complete month for some projects?
http://stats.wikimedia.org/wikisource/EN/TablesPageViewsMonthly.htm
http://stats.wikimedia.org/wikiquote/EN/TablesPageViewsMonthly.htm
http://stats.wikimedia.org/wikinews/EN/TablesPageViewsMonthly.htm
http://stats.wikimedia.org/wikibooks/EN/TablesPageViewsMonthly.htm
-- John
On 12/25/08, Erik Zachte erikzachte@infodisiac.com wrote:
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
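John's question can be checked with simple arithmetic: a month that was only partly counted shows up as a per-day average far below that of a neighbouring month. A minimal sketch, assuming the monthly total is known and, if available, the number of days actually counted; the 0.5 threshold and the sample figures in the usage note are hypothetical, not taken from the wikistats tables.

```python
import calendar


def daily_average(total_views, year, month, days_covered=None):
    """Average page views per day.

    days_covered defaults to the full calendar month; pass a smaller
    number when only part of the month was actually counted.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    days = days_in_month if days_covered is None else days_covered
    if not 0 < days <= days_in_month:
        raise ValueError("days_covered must be between 1 and the month length")
    return total_views / days


def looks_incomplete(total_views, year, month, reference_daily_avg, threshold=0.5):
    """Flag a month whose per-day average, computed as if the whole month
    had been counted, falls below threshold times a reference average."""
    return daily_average(total_views, year, month) < threshold * reference_daily_avg
```

For example, a hypothetical May total of 1,000 views against a June average of 100 views per day gives about 32 per day for May, which would be flagged; if only 10 days of May were counted, `daily_average(1000, 2008, 5, days_covered=10)` recovers the same 100 per day as June.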
Nice work Erik!
I am still quite shocked at the amount of time the English Wikipedia takes to dump, especially since we seem to have close links to folks who work at MySQL. To me it seems that one of two things must be the case:
1. Wikipedia has outgrown MySQL, in the sense that while we can put data in, we cannot get it all back out.
2. Despite aggressive hardware purchases over the years, the correct hardware has still not been purchased.
I wonder which of these is the case. Presumably #2?
Cheers, Brian
Hoi, it is neither. It has been said repeatedly that the process of a straightforward back up is something that is done on a regular basis. This backup, however, includes a lot of information that we do not allow in the data export made available to the public. So regardless of which database is used, special-purpose software is needed to provide the required functionality.
This functionality needs more redesign and programming. It is a process that affects the usability of the English-language Wikipedia and as such may benefit from the Stanton gift; then again, it does not affect usability for people new to Wikipedia. Thanks, GerardM
2008/12/25 Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, It is not one either. It has been said repeatedly that the process of a straightforward back up is something that is done on a regular basis.
No, it hasn't.
Hoi, On that note ... http://hardware.slashdot.org/article.pl?sid=09/01/02/1546214 Thanks, GerardM
2009/1/1 geni geniice@gmail.com
2008/12/25 Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, It is not one either. It has been said repeatedly that the process of a straightforward back up is something that is done on a regular basis.
No it hasn't
-- geni
On 12/24/08 3:31 PM, Brian wrote:
I am still quite shocked at the amount of time the English Wikipedia takes to dump, especially since we seem to have close links to folks who work at MySQL. To me it seems that one of two things must be the case:
1. Wikipedia has outgrown MySQL, in the sense that while we can put data in, we cannot get it all back out.
2. Despite aggressive hardware purchases over the years, the correct hardware has still not been purchased.
I wonder which of these is the case. Presumably #2?
3. The current data dump process doesn't scale to en.wikipedia's current size, and is being retooled to run in parallel to handle the case better. When this is complete, it'll be announced.
It's not a MySQL issue -- the issue is in pulling out all the raw compressed data, decompressing it, ordering it, and recompressing it into something small enough for people to download and make use of.
-- brion
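The dump job Brion describes (pull out the compressed revision text, decompress it, put it in order, recompress it) and the plan to run it in parallel can be illustrated with a toy version. This is a sketch under stated assumptions, not the MediaWiki dump code: the `(page_id, rev_id, zlib blob)` record format, the split into contiguous page-ID ranges and the function names are all hypothetical, and threads stand in for the separate processes or machines a real job would use.

```python
import bz2
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record format: (page_id, rev_id, zlib-compressed revision text)


def build_chunk(records):
    """Decompress, order and recompress one chunk of revisions."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))  # page, then revision
    lines = []
    for page_id, rev_id, blob in ordered:
        text = zlib.decompress(blob).decode("utf-8")
        lines.append(f"{page_id}\t{rev_id}\t{text}\n")
    return bz2.compress("".join(lines).encode("utf-8"))


def parallel_dump(records, num_chunks=4):
    """Build chunks concurrently and concatenate their bz2 streams."""
    page_ids = sorted({r[0] for r in records})
    if not page_ids:
        return b""
    # Contiguous page-ID ranges keep the concatenated output globally ordered.
    chunk_of = {pid: i * num_chunks // len(page_ids) for i, pid in enumerate(page_ids)}
    chunks = [[] for _ in range(num_chunks)]
    for rec in records:
        chunks[chunk_of[rec[0]]].append(rec)
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        parts = list(pool.map(build_chunk, chunks))  # map keeps input order
    return b"".join(parts)
```

Because each chunk covers a contiguous range of page IDs and `Executor.map` returns results in input order, the concatenated bz2 streams decompress to one ordered file; Python's `bz2.decompress` accepts such multi-stream input.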
Thank you Erik!
Woo!! Thank you, belatedly, for my New Year's dose of infodisiac. --SJ