On Wed, Feb 25, 2009 at 3:58 AM, Brion Vibber <brion@wikimedia.org> wrote:
Quick update on dump status:
- Dumps are back up and running on srv31, the old dump batch host.
Please note that unlike the wiki sites themselves, dump activity is *not* considered time-critical -- there is no emergency requirement to get them running as soon as possible.
Getting dumps running again after a few days is nearly as good as getting them running again immediately. Yes, it sucks when it takes longer than we'd like. No, it's not the end of the world.
- Dump runner redesign is in progress.
I've chatted a bit with Tim in the past on rearranging the architecture of the dump system to allow for horizontal scaling, which will make the big history dumps much, much faster by distributing the work across multiple CPUs or hosts, where it's currently limited to a single thread per wiki.
We seem to be in agreement on the basic arch, and Tomasz is now in charge of making this happen; he'll be poking at infrastructure for this over the next few days -- using his past experience with distributed index build systems at Amazon to guide his research -- and will report to y'all later this week with some more concrete details.
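For illustration only, the general idea of splitting one wiki's history dump across several workers might be sketched roughly as below. This is not the actual dump runner design, which has not been posted yet; the function names (split_page_ranges, dump_range, dump_wiki), the choice to partition by page ID range, and the placeholder output format are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_page_ranges(first_id, last_id, chunk_size):
    """Split [first_id, last_id] into consecutive inclusive page-ID ranges."""
    return [
        (start, min(start + chunk_size - 1, last_id))
        for start in range(first_id, last_id + 1, chunk_size)
    ]


def dump_range(page_range):
    """Hypothetical stand-in for dumping the history of one page-ID range.

    A real worker would read revisions from the database and write XML;
    here it only returns one placeholder line per page.
    """
    start, end = page_range
    return [f'<page id="{page_id}">' for page_id in range(start, end + 1)]


def dump_wiki(first_id, last_id, chunk_size, executor):
    """Dump one wiki by handing each page-ID range to a separate worker."""
    ranges = split_page_ranges(first_id, last_id, chunk_size)
    # executor.map yields results in input order, so the pieces can be
    # concatenated back together in page-ID order.
    chunks = executor.map(dump_range, ranges)
    return [line for chunk in chunks for line in chunk]


if __name__ == "__main__":
    # Threads keep the sketch simple; for CPU-bound work a
    # ProcessPoolExecutor (or separate hosts) would take their place.
    with ThreadPoolExecutor(max_workers=4) as pool:
        lines = dump_wiki(1, 10, 3, pool)
    print(len(lines), "pages dumped")
```

The point of the sketch is only that the ranges are independent, so the number of workers (or hosts) can grow without changing how the final dump is assembled.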
Has the dumper been tweaked to remove all hidden revisions, including hidden usernames recently fixed in bug 17792?
-- John Vandenberg