Tomasz Finc wrote:
I've started drafting some new ideas at
http://wikitech.wikimedia.org/view/Data_dump_redesign
of the various problems that we're facing and what kind of job management
we can put around it. We're taking this on as a full "should have been
done 2 years ago" project and I'm going to be shepherding this along.
Right now I'm collecting stats about the throughput of the components to
see how much in parallel this could be farmed out in a job management
system.
This is a large project that has some distinct problem areas that we'll
be isolating and welcoming help on.
--tomasz
Quite interesting. Can the images at office.wikimedia.org be moved
somewhere public?
Decompression takes as long as compression with bzip2
I think bzip2 decompression is *faster* than compression; see
http://tukaani.org/lzma/benchmarks
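
For anyone who wants to check this on their own hardware, here is a rough
sketch using Python's standard bz2 module. The helper name time_bzip2 and
the synthetic sample text are my own assumptions for illustration; a real
test should feed it a slice of an actual dump, since timings depend heavily
on the data and the machine.

```python
import bz2
import time


def time_bzip2(data: bytes, level: int = 9):
    """Return (compress_seconds, decompress_seconds, compressed_size).

    Hypothetical helper: times one in-memory bzip2 round trip.
    """
    start = time.perf_counter()
    compressed = bz2.compress(data, compresslevel=level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = bz2.decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    # Guard against measuring a broken round trip.
    if restored != data:
        raise ValueError("round trip did not reproduce the input")
    return compress_seconds, decompress_seconds, len(compressed)


if __name__ == "__main__":
    # Synthetic stand-in for dump content (assumption, not real dump data).
    sample = b"<page><title>Example</title><text>lorem ipsum</text></page>\n" * 200000
    c, d, size = time_bzip2(sample)
    print(f"input bytes:      {len(sample)}")
    print(f"compressed bytes: {size}")
    print(f"compress:   {c:.3f} s")
    print(f"decompress: {d:.3f} s")
```

Running it a few times on different inputs should show whether the two
timings are really comparable for our kind of data.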
Let me know if I can help with anything.