Hi again,
I am in Berlin today and got my answers first-hand, so for the record, here they are:
On 01.09.2014 16:07, Markus Krötzsch wrote:
Hi,
Some questions on the new dump options. I noticed that the XML dump files use exactly the same content model and format for the new model as they used for the old. This is unfortunate, since the <model> information is much less useful if the same model is used for incompatible content. I am now trying to find a way to write code that supports both old and new dumps. Hence my questions:
(1) The most recent full dump that is available contains the old format. The most recent current dump that is available contains the new format. Is it possible that a single dump contains both formats?
No, the dump-creating code transforms all content into the appropriate JSON during export. The data you see in a dump is always in the format generated by the code that was current when the dump file was created, and hence all revisions in one file are in the same format.
Currently, the XML-based revision dumps use different code for this than the JSON dumps and the API do. In the near future, this will be unified.
(2a) If the answer to (1) is no: what are/will be the first (or last) full/current/daily dump files that use the new format?
I did not get an answer to this question. However, since each file is certain to be in a single format, a viable strategy is to parse with the new format first and, if that fails, to retry with the old format. As soon as the old format succeeds even once, the whole remaining file should be parsed in the old format.
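For anyone who wants to implement this fallback, here is a minimal sketch in Python. It is only an illustration of the strategy described above, not code from any existing dump-processing library. The names parse_revisions, parse_new, parse_old and FormatError are hypothetical, and the sketch assumes that each parser raises FormatError when the revision text is not in the format it expects.

```python
from typing import Any, Callable, Iterable, Iterator


class FormatError(Exception):
    """Hypothetical error a parser raises when the text is not in its format."""


def parse_revisions(
    texts: Iterable[Any],
    parse_new: Callable[[Any], Any],  # hypothetical parser for the new format
    parse_old: Callable[[Any], Any],  # hypothetical parser for the old format
) -> Iterator[Any]:
    """Parse all revisions of one dump file, falling back to the old format.

    The new format is tried first. Once the old parser has succeeded on a
    revision that the new parser rejected, the old parser is used for the
    rest of the file, since one file is always in a single format.
    """
    use_old = False
    for text in texts:
        if use_old:
            doc = parse_old(text)
        else:
            try:
                doc = parse_new(text)
            except FormatError:
                # Raises FormatError again if the text fits neither format.
                doc = parse_old(text)
                use_old = True
        yield doc
```

The switch is one-way on purpose: after the first successful fallback, the new parser is never called again, which avoids paying for a failed parse attempt on every remaining revision.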
(2b) If the answer to (1) is yes: what is the revision number at which the change was made (i.e., what is the largest revision number that is still in the old format)?
Not applicable.
Markus