Hi, I'm Yuvi, a student looking forward to working with MediaWiki via this year's GSoC.
I want to work on something dump-related, and have been bugging apergos (Ariel) for a while now. One idea that popped into my head is moving the dump process to another language (say C# or Java, or, to be very macho, C++ or C). This should give the dump process quite a speed boost (the profiling I did[1] seems to indicate that the DB is not the bottleneck, though I might be wrong), and it could also be done in a way that makes running distributed dumps easier and more elegant.
So, thoughts on this? Is 'Move Dumping Process to another language' a good idea at all?
P.S. I'm just looking for ideas, so if you have specific improvements to the dumping process in mind, please respond with those too. I already have DistributedBZip2 and Incremental Dumps in mind as well :)
[1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=5303
Thanks :)