The JSON dumps are the preferred format if you want to process the entity data. From the JSON dumps, you get all current entities (items and properties) in the canonical format, for further processing.
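For example, here's a rough sketch in Python of how one might stream entities out of the dump (assuming the usual layout of the all.json dump: a single JSON array, bz2-compressed, one entity object per line; the file name is just a placeholder):

    import bz2
    import json

    def iter_entities(path):
        # Stream entities without loading the whole array into memory:
        # skip the enclosing brackets, strip the trailing comma per line.
        with bz2.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line in ("[", "]") or not line:
                    continue
                yield json.loads(line.rstrip(","))

    for entity in iter_entities("wikidata-all.json.bz2"):
        labels = entity.get("labels", {})
        print(entity["id"], labels.get("en", {}).get("value"))
        break  # just show the first entity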
The XML dumps are an "opaque" exchange format for MediaWiki page content. They are designed to allow content from pages in one wiki to be imported into another wiki(*), including old revisions. They can also be used for backups, since they provide a future-proof way to store your wiki's content. But the format of the page content in the XML dumps is not strictly specified: it can be wikitext, or JSON data, or anything else. The JSON you find embedded in the XML dumps of Wikidata may or may not be compatible with the format of the JSON dumps, and is subject to change without notice. It's not designed for processing by third parties.
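If you do need to look at the XML dumps, the content model of each revision is declared in the dump itself. A minimal sketch in Python (assuming an uncompressed pages dump and the export-0.10 schema namespace, which may differ between dump runs):

    import xml.etree.ElementTree as ET

    NS = "{http://www.mediawiki.org/xml/export-0.10/}"

    def iter_models(path):
        # Stream pages and report each page's declared content model,
        # e.g. "wikitext" for ordinary pages, "wikibase-item" for items.
        for _, elem in ET.iterparse(path):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                model = elem.findtext(NS + "revision/" + NS + "model")
                yield title, model
                elem.clear()  # free memory as we go

    for title, model in iter_models("wikidatawiki-pages-articles.xml"):
        print(title, model)
        break  # just show the first page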
Wikidata XML dumps will continue to be generated for all pages, including history, as is done for all Wikimedia projects. However, this process often breaks due to the large size of these dumps. If you want to process Wikidata items, you should use the JSON dumps.
HTH -- daniel
(*) This is usually disabled for Wikibase entities, to avoid ID conflicts.
On 14.06.2015 at 03:38, gnosygnu wrote:
According to http://www.wikidata.org/wiki/Wikidata:Database_download, the JSON dump is listed as the recommended dump format. Also, at the time of writing, the JSON dump has been generated regularly every week, whereas the XML dump has been delayed for 2+ months.
Going forward, will both dumps continue to be supported? Or will the XML dump be phased out and only the JSON dump remain? Or are these plans still to be determined based on upcoming changes to the dumping infrastructure as per https://phabricator.wikimedia.org/T88728?
If the JSON dump is to be the sole data format, is there any way to address the following omissions?