[Foundation-l] Wikistats is back
Brion Vibber
brion at wikimedia.org
Mon Jan 5 20:10:26 UTC 2009
On 12/24/08 3:31 PM, Brian wrote:
> I am still quite shocked at the amount of time the English Wikipedia takes
> to dump, especially since we seem to have close links to folks who work at
> MySQL. To me it seems that one of two things must be the case:
>
> 1. Wikipedia has outgrown MySQL, in the sense that, while we can put data
> in, we cannot get it all back out.
> 2. Despite aggressive hardware purchases over the years, the correct
> hardware has still not been purchased.
>
> I wonder which of these is the case. Presumably #2 ?
3. The current data dump process doesn't scale to en.wikipedia's current
size, and is being retooled to run in parallel so it handles that scale
better. When this is complete, it'll be announced.
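
For illustration only, here is a minimal sketch (in Python, and not the
actual Wikimedia dump code) of what "run in parallel" could mean: split
the page-ID space into ranges, have each worker write its own compressed
chunk, and stitch or publish the chunks afterwards. fetch_revisions() is
a hypothetical stand-in for whatever actually pulls revision text out of
storage.

    import bz2
    from multiprocessing import Pool

    def fetch_revisions(start_id, end_id):
        # Hypothetical placeholder for whatever pulls revision text out
        # of storage; here it just yields a dummy <page> element per ID.
        for page_id in range(start_id, end_id):
            yield "<page><id>%d</id></page>\n" % page_id

    def dump_page_range(start_id, end_id):
        # One worker: export pages [start_id, end_id) to its own
        # bz2-compressed XML chunk and return the chunk's filename.
        out = "pages-%09d-%09d.xml.bz2" % (start_id, end_id)
        with bz2.open(out, "wt", encoding="utf-8") as f:
            for page_xml in fetch_revisions(start_id, end_id):
                f.write(page_xml)
        return out

    def parallel_dump(max_page_id, chunk=100000, workers=8):
        # Split the page-ID space into fixed-size ranges and let a pool
        # of workers compress the chunks concurrently.
        ranges = [(i, min(i + chunk, max_page_id))
                  for i in range(0, max_page_id, chunk)]
        with Pool(workers) as pool:
            return pool.starmap(dump_page_range, ranges)
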
It's not a MySQL issue -- the issue is pulling out all the raw
compressed data, decompressing it, ordering it, and recompressing it
into something small enough for people to download and make use of.
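
To make that cost concrete, here is a hedged illustration (again Python,
not the real dump tooling): even with the database out of the picture,
producing a new dump from existing compressed text means streaming every
byte through a decompressor and then a compressor, and the compression
side is typically far slower than the decompression side.

    import bz2
    import shutil

    def recompress(src_path, dst_path, level=9):
        # Stream-decompress one bz2 file and re-write it as another.
        # Done serially over terabytes of revision text, the compression
        # step is where nearly all of the wall-clock time goes.
        with bz2.open(src_path, "rb") as src, \
             bz2.open(dst_path, "wb", compresslevel=level) as dst:
            shutil.copyfileobj(src, dst, length=1024 * 1024)
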
-- brion