[Foundation-l] Wikistats is back

Brion Vibber brion at wikimedia.org
Mon Jan 5 20:10:26 UTC 2009


On 12/24/08 3:31 PM, Brian wrote:
> I am still quite shocked at the amount of time the English Wikipedia takes
> to dump, especially since we seem to have close links to folks who work at
> MySQL. To me it seems that one of two things must be the case:
>
> 1. Wikipedia has outgrown MySQL, in the sense that, while we can put data
> in, we cannot get it all back out.
> 2. Despite aggressive hardware purchases over the years, the correct
> hardware has still not been purchased.
>
> I wonder which of these is the case. Presumably #2 ?

3. The current data dump process doesn't scale to en.wikipedia's current 
size, and is being retooled to run in parallel to handle that size better 
(a rough sketch of the idea is below). When this is complete, it'll be 
announced.
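Very roughly, "in parallel" means splitting the page ID space into chunks
and running each chunk as an independent dump job. The following is only a
sketch of that idea, not the actual dump code; dump_page_range() is a
hypothetical stand-in for the real per-chunk export:

    from multiprocessing import Pool

    def dump_page_range(bounds):
        # Hypothetical per-chunk job: export pages with
        # start_id <= page_id < end_id into a bzip2 file, return its path.
        start_id, end_id = bounds
        out_path = "pages-%09d-%09d.xml.bz2" % (start_id, end_id)
        # ... the actual export of that page range would go here ...
        return out_path

    def parallel_dump(max_page_id, chunk_size=500000, workers=8):
        # Split the page ID space into fixed-size chunks and hand each
        # chunk to a worker process.
        ranges = [(i, min(i + chunk_size, max_page_id + 1))
                  for i in range(1, max_page_id + 1, chunk_size)]
        with Pool(workers) as pool:
            # Each chunk is an independent job, so a failed chunk can be
            # re-run without restarting a weeks-long dump from scratch.
            return pool.map(dump_page_range, ranges)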

It's not a MySQL issue; the issue is in pulling out all the raw 
compressed data, decompressing it, ordering it, and recompressing it 
into something small enough for people to download and make use of.
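In spirit the work looks like the sketch below. It is only an illustration:
fetch_revisions() is a hypothetical generator, the blobs are assumed to be
zlib-compressed, the output element is a simplified stand-in for the real
dump XML, and sorting everything in memory is exactly the part that doesn't
scale at en.wikipedia's size.

    import bz2
    import zlib

    def write_dump(revisions, out_path):
        # revisions: iterable of (page_id, rev_id, compressed_blob) tuples,
        # in whatever order the storage backend hands them back.
        # Putting everything into page/revision order before recompressing
        # is the expensive step.
        ordered = sorted(revisions, key=lambda r: (r[0], r[1]))
        with bz2.open(out_path, "wt", encoding="utf-8") as out:
            for page_id, rev_id, blob in ordered:
                text = zlib.decompress(blob).decode("utf-8")
                out.write('<revision page="%d" id="%d">%s</revision>\n'
                          % (page_id, rev_id, text))

In practice a streaming merge would replace the in-memory sort; the sketch
is just meant to show where the time goes.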

-- brion


