How are you dealing with extensibility?
We need to be able to extend the format. The fields of data we need to
export change over time (just look at the changelog for our export's XSD
file https://www.mediawiki.org/xml/export-0.7.xsd).
Here are some things in that XML format that are missing from the incremental format:
- Redirect info
- Upload info
- Log items
- LiquidThreads support
And there's something I don't think we've considered supporting even in our
current export format: ContentHandler. Its metadata is missing from our
dumps, and its data format differs somewhat from what our text dumps have
traditionally expected.
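One common way to get this kind of extensibility in a binary format is tag-length-value (TLV) records. A reader skips any tag it doesn't recognize, so fields like redirect info or ContentHandler metadata can be added later without breaking older readers. This is a minimal sketch only. The tag numbers and the 1-byte tag / 4-byte length layout are hypothetical and are not part of the proposed incremental format:

```python
import struct

# Hypothetical tag numbers, for illustration only.
TAG_TITLE = 1
TAG_TEXT = 2
TAG_CONTENT_MODEL = 3  # e.g. a field added in a later format version

HEADER = "<BI"  # hypothetical layout: 1-byte tag, 4-byte little-endian length
HEADER_SIZE = struct.calcsize(HEADER)  # 5 bytes, no padding with "<"


def encode_record(fields):
    """Encode a dict {tag: bytes} as a sequence of TLV entries."""
    out = bytearray()
    for tag, value in fields.items():
        out += struct.pack(HEADER, tag, len(value))
        out += value
    return bytes(out)


def decode_record(data, known_tags):
    """Decode TLV entries, silently skipping tags this reader doesn't know."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, length = struct.unpack_from(HEADER, data, pos)
        pos += HEADER_SIZE
        if pos + length > len(data):
            raise ValueError("truncated record")
        value = data[pos:pos + length]
        pos += length
        if tag in known_tags:
            fields[tag] = value
    return fields


# A "new" writer emits a content-model field that an "old" reader ignores.
record = encode_record({
    TAG_TITLE: b"Main Page",
    TAG_TEXT: b"Hello",
    TAG_CONTENT_MODEL: b"wikitext",
})
print(decode_record(record, known_tags={TAG_TITLE, TAG_TEXT}))
# → {1: b'Main Page', 2: b'Hello'}
```

The length prefix is what makes skipping possible: an old reader never needs to understand a new field's contents, only how many bytes to jump over.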
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
On Mon, 01 Jul 2013 07:00:23 -0700, Petr Onderka
<gsvick-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> For my GSoC project Incremental data dumps [1], I'm creating a new file
> format to replace Wikimedia's XML data dumps.
> A sketch of how I imagine the file format to look like is at
> http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format.
>
> What do you think? Does it make sense? Would it work for your use case?
> Any comments or suggestions are welcome.
>
> Petr Onderka
> [[User:Svick]]
>
> [1]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dump