I see that the latest dump of the English Wikipedia failed (I mean the dump of all the page histories). As part of some other work I am doing, I have efficient code that can "take apart" a dump into its individual component pages, and from that it would be possible to fashion code that "stitches together" various partial dumps. This would make it possible to break up a single dump into multiple, shorter processes, in which, for example, one dumps only one month's or one week's worth of revisions to the English Wikipedia. Breaking up the dump process would increase the probability that each of the smaller dumps succeeds. Once all the partial dumps are available, one could then launch the stitching process, which would produce a single dump, removing duplicate revisions.
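To make this a bit more concrete, below is a rough Python sketch of the kind of stitching tool I have in mind. It is only an illustration, not working code for production use: it assumes the partial dumps are in the usual pages-meta-history XML export format, it keeps the merged revisions in memory (which of course would not scale to the full English Wikipedia history), and it writes out only a stripped-down <page>/<revision> structure, without the siteinfo header.

#!/usr/bin/env python3
# Rough sketch of a dump-stitching tool (not a finished implementation).
# It merges several partial pages-meta-history XML dumps into a single
# one, keeping each revision only once.  Simplifying assumptions:
#   - all merged revisions fit in memory (a real tool would instead
#     work page by page over sorted partial dumps);
#   - the siteinfo header and per-page metadata other than title/id
#     are omitted from the output.

import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def local(tag):
    """Strip the XML namespace that the export format puts on tags."""
    return tag.rsplit('}', 1)[-1]


def pages(dump_file):
    """Stream (page_id, page_element) pairs out of one partial dump."""
    for _, elem in ET.iterparse(dump_file, events=('end',)):
        if local(elem.tag) == 'page':
            page_id = None
            for child in elem:
                if local(child.tag) == 'id':
                    page_id = child.text
                    break
            yield page_id, elem
            elem.clear()  # release the page's subtree as we go


def stitch(dump_files, out):
    """Merge the given partial dumps, dropping duplicate revisions."""
    titles = {}     # page_id -> page title
    revisions = {}  # page_id -> {revision_id: serialized <revision>}
    for dump_file in dump_files:
        for page_id, page in pages(dump_file):
            revs = revisions.setdefault(page_id, {})
            for child in page:
                tag = local(child.tag)
                if tag == 'title':
                    titles.setdefault(page_id, child.text)
                elif tag == 'revision':
                    rev_id = None
                    for sub in child:
                        if local(sub.tag) == 'id':
                            rev_id = sub.text
                            break
                    if rev_id not in revs:  # duplicates are skipped here
                        revs[rev_id] = ET.tostring(child, encoding='unicode')
    out.write('<mediawiki>\n')
    for page_id, revs in revisions.items():
        out.write('  <page>\n')
        out.write('    <title>%s</title>\n' % escape(titles.get(page_id, '')))
        out.write('    <id>%s</id>\n' % page_id)
        for rev_id in sorted(revs, key=int):
            out.write(revs[rev_id])
        out.write('  </page>\n')
    out.write('</mediawiki>\n')


if __name__ == '__main__':
    # e.g.  python stitch.py partial1.xml partial2.xml > merged.xml
    stitch(sys.argv[1:], sys.stdout)

The point of the sketch is simply that deduplication is cheap once you key revisions on their revision id; a real tool would add the proper export header, stream its output, and avoid holding everything in memory.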
At UCSC, where I work, there are various Master's students looking for projects... and some may be interested in doing work that is concretely useful to Wikipedia. Should I try to get them interested in writing a proper dump-stitching tool, and some code to do partial dumps?
Can Brion or Tim give us more detail on why the dumps are failing? Are they already doing partial dumps? Is there already a dump stitching tool? Is there anything that could be done to help the process? I could help by looking for database students in search of a project and giving them my code as a starting point...
Best,
Luca