Re: [Wikidata-tech] Questions on JSON dumps and format

2 Sep 2014


Hi again,
I am in Berlin today and got my answers first-hand, so for the record,
here they are:
On 01.09.2014 16:07, Markus Krötzsch wrote:
...
Hi,
Some questions on the new dump options. I noticed that the XML dump
files declare exactly the same content model and format for the new
format as they did for the old one. This greatly reduces the utility of
the <model> information, since the same model is now used for
incompatible content. I am trying to find a way to write code that
supports both old and new dumps. Hence my questions:
(1) The most recent full dump that is available contains the old format.
The most recent current dump that is available contains the new format.
Is it possible that a single dump contains both formats?
No, the dump-creating code transforms all content into the appropriate 
JSON during export. The data you see in dumps is always in the format 
that is generated by the most recent code that was used when the dump 
file was created, and hence all revisions are in the same format.
Currently, the XML-based revision dumps use different code for this than 
the code used in JSON dumps and API. In the near future, this will be 
unified.
...
(2a) If the answer to (1) is no: what are/will be the first (or last)
full/current/daily dump files that use the new format?
I did not get an answer to this question, but since it is certain that 
each file is in a single format, a viable strategy is to parse with the 
new format first; if there are errors, try parsing with the old format; 
if this succeeds even once, the whole remaining file should be parsed in 
the old format.
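For illustration, the fallback strategy described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: `parse_dump`, `toy_parse_new` and `toy_parse_old` are hypothetical names, the two toy parsers are stand-ins that do not reflect the actual old or new serialization, and a real parser would need to signal failure in some comparable way (here, by raising `ValueError`).

```python
import json
from typing import Any, Callable, Iterable, Iterator

Parser = Callable[[str], Any]


def parse_dump(revisions: Iterable[str],
               parse_new: Parser,
               parse_old: Parser) -> Iterator[Any]:
    """Parse revisions with the new format, falling back to the old one.

    Since each dump file is in a single format, the first successful
    old-format parse switches the rest of the file to the old parser.
    """
    use_old = False
    for text in revisions:
        if use_old:
            yield parse_old(text)
            continue
        try:
            result = parse_new(text)
        except ValueError:
            # New format failed: try the old one. If this also fails,
            # the error propagates, because neither format matched.
            result = parse_old(text)
            use_old = True  # succeeded once, so stay with the old format
        yield result


# Toy stand-in parsers (assumption: NOT the real Wikidata formats).
def toy_parse_new(text: str) -> Any:
    data = json.loads(text)
    if "id" not in data:
        raise ValueError("not in the (toy) new format")
    return ("new", data["id"])


def toy_parse_old(text: str) -> Any:
    data = json.loads(text)
    if "entity" not in data:
        raise ValueError("not in the (toy) old format")
    return ("old", data["entity"])


if __name__ == "__main__":
    old_dump = ['{"entity": "q1"}', '{"entity": "q2"}']
    for parsed in parse_dump(old_dump, toy_parse_new, toy_parse_old):
        print(parsed)
```

The flag makes the switch permanent, so the new-format parser is attempted at most once on an old-format file and is not retried for every revision.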
...
(2b) If the answer to (1) is yes: what is the revision number at which
the change was made (i.e., what is the largest revision number that is
still in the old format)?
Not applicable.
Markus
