Hi,

After a month of work on my GSoC project, Incremental Dumps [1], I think I now have something worth sharing and discussing, though it's still far from complete.

Currently, the code can read a pages-history XML dump and create from it the various kinds of dumps (pages/stub, current/history) in the new format.
It can also convert a dump in the new format back to XML.

The XML output is almost identical to the existing XML dumps, but there are some differences [2].
The new format now also has a detailed specification [3]. It describes only the current version; the format is still in flux and can change daily.

If you want, you can also try running the code [4].
It's not production-quality yet (e.g. it doesn't report errors properly), but it should work.
Compilation instructions are in the README file.

Any comments or questions are welcome.

Petr Onderka
User:Svick

[1]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps
[2]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format/XML_output
[3]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format/Specification
[4]: https://github.com/wikimedia/operations-dumps-incremental/tree/gsoc