Hi, I'm Yuvi, a student looking forward to working with MediaWiki via
this year's GSoC.
I want to work on something dump-related, and have been bugging
apergos (Ariel) for a while now. One of the things that popped into
my head is moving the dump process to another language (say, C#, or
Java, or be very macho and do C++ or C). This should give the dump
process quite a bit of a speed boost (the profiling I did[1] seems to
indicate that the DB is not the bottleneck, though I might be wrong),
and can also be done in a way that makes running distributed dumps
easier/more elegant.
So, thoughts on this? Is 'Move Dumping Process to another language' a
good idea at all?
P.S. I'm just looking out for ideas, so if you have specific
improvements to the dumping process in mind, please respond with those
too. I already have DistributedBZip2 and Incremental Dumps in mind as
well :)
[1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=5303
Thanks :)
--
Yuvi Panda T
http://yuvi.in/