On Wed, Jul 3, 2013 at 7:49 AM, Erik Zachte ezachte@wikimedia.org wrote:
it will now be a command line application that outputs the data as
uncompressed XML, in the same format as current dumps.
That will help a great deal. But I assume your application will be for Linux only? If so, it would help to still generate the current compressed dumps as a post-processing step and store them online for download.
One of the reasons for XML dumps is platform independence, both on the producer side (we had ever-evolving SQL dumps earlier) and on the consumer side (not everyone uses Linux).
Platform independence is one of the main reasons I wrote the old 'mwdumper' processing tool in Java. Something in C/C++ should be portable as well, though; it mainly means a slightly more complicated build system... and then we actually have to distribute binaries if we're not user-hostile. :)
-- brion