Hi,
After a month of work on my GSoC project, Incremental Dumps [1], I think I
now have something worth sharing and talking about, though it's still far
from complete.
Right now, the code can read a pages-history XML dump and create the
various kinds of dumps (pages/stub, current/history) in the new format from
it.
It can then convert a dump in the new format back to XML.
The XML output is almost the same as the existing XML dumps, but there are
some differences [2].
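
To make the four kinds of dumps concrete, here is a rough Python sketch of how
"current" and "stub" variants could be derived from pages-history XML. It is
not the project's code and it does not touch the new format at all: the
function names (read_pages, make_dump), the dictionary layout, and the tiny
sample document are all made up for illustration. The sample also leaves out
the XML namespace that real dumps declare, so the tag names would need
adjusting for real input.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample; real dumps have more fields
# and declare an XML namespace on the root element.
SAMPLE = """<mediawiki>
  <page>
    <title>Example</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <timestamp>2013-07-01T00:00:00Z</timestamp>
      <text>first</text>
    </revision>
    <revision>
      <id>11</id>
      <timestamp>2013-07-02T00:00:00Z</timestamp>
      <text>second</text>
    </revision>
  </page>
</mediawiki>"""


def read_pages(source):
    """Stream pages from a pages-history style XML file, one page at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "page":
            continue
        revisions = [
            {
                "id": int(rev.findtext("id")),
                "timestamp": rev.findtext("timestamp"),
                "text": rev.findtext("text"),
            }
            for rev in elem.findall("revision")
        ]
        yield {
            "title": elem.findtext("title"),
            "id": int(elem.findtext("id")),
            "revisions": revisions,
        }
        elem.clear()  # free the page's subtree once it has been copied out


def make_dump(pages, current_only=False, stub=False):
    """Derive one of the four kinds of dumps from full-history pages.

    current_only=True keeps only the latest revision of each page;
    stub=True drops the revision text and keeps the metadata.
    """
    result = []
    for page in pages:
        revisions = page["revisions"]
        if current_only and revisions:
            # Timestamps in the same ISO 8601 form compare correctly as strings.
            revisions = [max(revisions, key=lambda r: r["timestamp"])]
        if stub:
            revisions = [
                {k: v for k, v in rev.items() if k != "text"}
                for rev in revisions
            ]
        result.append({**page, "revisions": revisions})
    return result


pages = list(read_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
pages_history = make_dump(pages)
pages_current = make_dump(pages, current_only=True)
stub_history = make_dump(pages, stub=True)
stub_current = make_dump(pages, current_only=True, stub=True)
```

The point is only that all four variants can be produced from a single pass
over the history data; how the actual code stores them is described in the
specification.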
The new format also now has a detailed specification [3]. It describes the
current version only; the format is still in flux and can change daily.
If you want, you can also try running the code [4].
It's not production-quality yet (e.g. it doesn't report errors properly),
but it should work.
Compilation instructions are in the README file.
Any comments or questions are welcome.
Petr Onderka
User:Svick
[1]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps
[2]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format/XML_…
[3]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format/Spec…
[4]: https://github.com/wikimedia/operations-dumps-incremental/tree/gsoc