Quick update on dump status:
* Dumps are back up and running on srv31, the old dump batch host.
Please note that unlike the wiki sites themselves, dump activity is
*not* considered time-critical -- there is no emergency requirement to
get the dumps running again as soon as possible.
Getting dumps running again after a few days is nearly as good as
getting them running again immediately. Yes, it sucks when it takes
longer than we'd like. No, it's not the end of the world.
* Dump runner redesign is in progress.
I've chatted a bit with Tim in the past about rearranging the
architecture of the dump system to allow for horizontal scaling. This
will make the big history dumps much, much faster by distributing the
work across multiple CPUs or hosts, where it's currently limited to a
single thread per wiki.
We seem to be in agreement on the basic arch, and Tomasz is now in
charge of making this happen; he'll be poking at infrastructure for this
over the next few days -- using his past experience with distributed
index build systems at Amazon to guide his research -- and will report
to y'all later this week with some more concrete details.
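To give a feel for the shape we have in mind, here's a toy sketch in
Python -- not the actual runner; dump_page_range() and the page-ID
split are made-up placeholders. The point is just the shape: carve one
wiki's page-ID space into chunks and hand the chunks to separate
workers, instead of one thread grinding through the whole wiki.

    from multiprocessing import Pool

    def dump_page_range(job):
        """Dump one slice of a wiki's history to its own file (stub)."""
        wiki, start_id, end_id = job
        outfile = "%s-history-%09d-%09d.xml" % (wiki, start_id, end_id)
        # ... fetch revisions for pages in [start_id, end_id) and write
        # them to outfile; the slices get stitched together afterwards ...
        return outfile

    def parallel_dump(wiki, max_page_id, workers=8):
        """Split [1, max_page_id] into roughly equal chunks and dump
        each chunk in its own worker process."""
        chunk = max_page_id // workers + 1
        jobs = [(wiki, lo, min(lo + chunk, max_page_id + 1))
                for lo in range(1, max_page_id + 1, chunk)]
        with Pool(workers) as pool:
            return pool.map(dump_page_range, jobs)

    if __name__ == "__main__":
        # arbitrary numbers, just for the example
        print(parallel_dump("enwiki", 25000000))

The same idea scales across hosts instead of processes once there's a
job queue in front of it; that's the part Tomasz will be researching.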
* Dump format changes are in progress.
Robert Rohde's proof-of-concept code for diff-based dumps is in our
SVN and available for testing.
We'll be looking at the possibility of integrating this to see what
the effect on dump performance is; currently performance and
reliability are our primary concerns, rather than output file size,
but the two can intersect, since the bzip2 data compression is a time
factor.
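As a quick back-of-the-envelope illustration of that intersection (not
a benchmark of the real pipeline, and the sample text is synthetic):

    import bz2, time

    # Synthetic stand-in for revision text; real wikitext compresses
    # less dramatically, but the *time* scaling is the point here.
    sample = b"== Example ==\nSome revision content goes here.\n" * 20000

    for factor in (1, 2, 4):
        data = sample * factor
        t0 = time.time()
        out = bz2.compress(data)
        print("%.1f MB in -> %.2f s, %d bytes out"
              % (len(data) / 1e6, time.time() - t0, len(out)))

bzip2 time grows with the amount of uncompressed data fed into it, so
anything that shrinks the dump text, like diffs, also shrinks the
compression bill.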
This will be pushed back until later if we don't see an immediate
generation-speed improvement, but it's very much a desired project,
since it will make the full-history dump files much smaller.
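For anyone who hasn't poked at the code yet, the gist of the
diff-based format is roughly this -- a toy sketch of mine, not
Robert's actual implementation; see SVN for the real thing:

    import difflib

    def to_diff_form(revisions):
        """revisions: full texts of one page's revisions, oldest first.
        Keep the first revision in full; store each later one as a
        unified diff against its parent. Consecutive revisions usually
        differ by only a few lines, so the full-history text shrinks
        dramatically even before bzip2 touches it."""
        out = [revisions[0]]
        for prev, curr in zip(revisions, revisions[1:]):
            delta = difflib.unified_diff(prev.splitlines(True),
                                         curr.splitlines(True))
            out.append("".join(delta))
        return out

Reconstructing a given revision from such a dump means replaying the
diffs in order, which is one reason we want to measure the performance
effects before committing to it.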
-- brion