On 3/25/09 10:08 AM, Christian Storm wrote:
Thanks to everyone who got the enwiki dumps going again! Should we expect more regular dumps now? What was the final solution of fixing this?
Lots of love and upkeep by everyone :)
But really it needs to be more automated and parallelised so that we can spot issues faster, check for inconsistencies, and finish quicker.
Brion and I have met about this and we've even brought it into the Wikimedia dev meetings to brainstorm how the system could change for the better.
I've started drafting some new ideas at http://wikitech.wikimedia.org/view/Data_dump_redesign
covering the various problems that we're facing and what kind of job management we can put around it. We're taking this on as a full "should have been done 2 years ago" project and I'm going to be shepherding this along.
Right now I'm collecting stats about the throughput of the components to see how much of this could be farmed out in parallel to a job management system.
This is a large project with some distinct problem areas that we'll be isolating, and we'd welcome help on them.
--tomasz