I spend a lot of time processing the XML dumps that this will affect. I
just wanted to chime in to say that this change makes sense to me and it
won't affect my work.
-Aaron
On Thu, Oct 23, 2014 at 9:06 AM, Daniel Kinzler <daniel(a)brightbyte.de> wrote:
tl;dr:
In the XML dumps, I want to change
<text> <sha1> <model> <format>
to
<model> <format> <text> <sha1>
However, this is a breaking change to our XML schema.
See https://bugzilla.wikimedia.org/show_bug.cgi?id=72417
Background:
While trying to fix bug 72361, I ran into an issue with our current XML dump
format:
The <model> and <format> tags are placed *after* the <text> tag.
This means that we don't know how to handle the text when we process XML
events in a stream - we'd have to buffer the text, wait until we know model
and format, and then process it. A pain.
The current order has no deeper meaning - it is, indeed, my own fault: I
didn't think this through when adding these tags. I propose to change the
order of the tags now, to make stream processing easier.
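To illustrate the difference for a streaming consumer, here is a minimal
Python sketch (not MediaWiki code). It assumes simplified <revision>
fragments with made-up values and no XML namespace; real dumps declare an
export namespace, so the tag names would need to be qualified. The function
name handled_without_buffering is hypothetical. It reports whether <model>
and <format> are already known at the moment <text> has been read:

```python
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragments: real dumps carry a namespace
# and many more elements per revision.
OLD_ORDER = """<revision>
  <text>Hello</text>
  <sha1>abc</sha1>
  <model>wikitext</model>
  <format>text/x-wiki</format>
</revision>"""

NEW_ORDER = """<revision>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text>Hello</text>
  <sha1>abc</sha1>
</revision>"""


def handled_without_buffering(xml_string):
    """Return True if model and format are known when <text> ends."""
    model = fmt = None
    source = io.BytesIO(xml_string.encode("utf-8"))
    # "end" events fire in document order as each element is closed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "model":
            model = elem.text
        elif elem.tag == "format":
            fmt = elem.text
        elif elem.tag == "text":
            # The text is complete here; can we already dispatch on it?
            return model is not None and fmt is not None
    return False


print(handled_without_buffering(OLD_ORDER))
print(handled_without_buffering(NEW_ORDER))
```

With the current order the consumer reaches the end of <text> before it has
seen <model> or <format>, so it would have to hold on to the text; with the
proposed order it can act on the text immediately.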
That would technically be a breaking change to the dump format, incompatible
with <https://www.mediawiki.org/xml/export-0.8.xsd> and export-0.9.xsd. I
doubt however that any consumers rely on the current placement of <model> and
<format>, as it is extremely inconvenient (compare bug 72361), but you never
know.
I propose to release a new XSD version 0.10 with the order changed, and
mention it in the release notes. Should be fine.
Any objections?
-- daniel
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l