I am in Berlin today and got my answers firsthand, so for the record,
here they are:
On 01.09.2014 16:07, Markus Krötzsch wrote:
Some questions on the new dump options. I noticed that the XML dump
files use exactly the same content model and format for the new model as
they used for the old. This is not so great as it reduces the utility of
the <model> information greatly if the same model is used for
incompatible content. I am now trying to find a way to write code that
supports both old and new dumps. Hence my questions:
(1) The most recent full dump that is available contains the old format.
The most recent current dump that is available contains the new format.
Is it possible that a single dump contains both formats?
No, the dump-creating code transforms all content into the appropriate
JSON during export. The data you see in a dump is always in the format
generated by the version of the code that was current when the dump
file was created, and hence all revisions are in the same format.
Currently, the XML-based revision dumps use different code for this than
the code used in JSON dumps and API. In the near future, this will be
(2a) If the answer to (1) is no: what are/will be the first (or last)
full/current/daily dump files that use the new format?
I did not get an answer to this question. However, since each file is
certain to be in a single format, a viable strategy is to try parsing
with the new format first and, if that fails, to try parsing with the
old format. As soon as the old format succeeds even once, the whole
remaining file should be parsed in the old format.
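
For illustration, the fallback strategy described above could be
sketched in Python roughly as follows. This is only a sketch under
assumptions: parse_revisions, toy_parse_new and toy_parse_old are
hypothetical names, and the two toy parsers merely check for a
top-level "id" or "entity" key as a stand-in for real format
detection. Real parsers for the two formats would take their place.

```python
import json
from typing import Callable, Iterable, Iterator

Parser = Callable[[str], dict]


def parse_revisions(texts: Iterable[str],
                    parse_new: Parser,
                    parse_old: Parser) -> Iterator[dict]:
    """Try the new format first; once the old format succeeds,
    use it for the rest of the file."""
    use_old = False
    for text in texts:
        if use_old:
            yield parse_old(text)
            continue
        try:
            doc = parse_new(text)
        except ValueError:
            # New-format parsing failed, so try the old format. If
            # this fails as well, the error propagates to the caller.
            doc = parse_old(text)
            # The old format worked once, and each file is in a
            # single format, so stick with it from here on.
            use_old = True
        yield doc


# Toy stand-in parsers (assumptions for illustration only).
def toy_parse_new(text: str) -> dict:
    doc = json.loads(text)
    if "id" not in doc:
        raise ValueError("not in the assumed new format")
    return {"format": "new", "id": doc["id"]}


def toy_parse_old(text: str) -> dict:
    doc = json.loads(text)
    if "entity" not in doc:
        raise ValueError("not in the assumed old format")
    return {"format": "old", "id": doc["entity"]}
```

The parsers are passed in as arguments so that the switching logic
stays independent of the details of either format. Both are expected
to signal failure by raising ValueError, which also covers malformed
JSON reported by json.loads.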
(2b) If the answer to (1) is yes: what is the revision number at which
the change was made (i.e., what is the largest revision number that is
still in the old format)?