Hi. Currently, the dump service offers two different dumps for Wikidata:
* XML:
http://dumps.wikimedia.org/wikidatawiki/latest/
* JSON:
http://dumps.wikimedia.org/wikidatawiki/entities/
According to
http://www.wikidata.org/wiki/Wikidata:Database_download,
the JSON dump is listed as the recommended dump format. Also, at the
time of writing, the JSON dump has been generated regularly every
week, whereas the XML dump has been delayed for 2+ months.
Going forward, will both dumps continue to be supported? Or will the
XML dump be phased out and only the JSON dump remain? Or are these
plans still to be determined based on upcoming changes to the dumping
infrastructure as per
https://phabricator.wikimedia.org/T88728?
If the JSON dump is to be the sole data format, is there any way to
address the following omissions?
* '''Non-JSON pages not available''': The JSON dump only provides
JSON content-type pages in the main and property namespaces. Pages in other
namespaces are not available, including the Main Page. For example,
here are the per-namespace counts from the 2015-03-30 dump:
id    name          count
----  ------------  -----
4     Wikidata      10280
8     MediaWiki      2244
10    Template       4701
12    Help            779
14    Category       3073
828   Module          175
1198  Translations  83524
* '''Page metadata not available''': For the JSON pages, the
page_touched and page_id fields are not available.
* '''Other tables not provided''': Other database tables, notably
categorylinks and page_props, are not included.
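For anyone who wants to check per-namespace counts like the ones in the first point, here is a minimal sketch of how they could be tallied from an XML dump. It assumes the usual MediaWiki XML export layout, where each <page> element carries an <ns> child holding the namespace id. The function names and the tiny sample document are made up for illustration and are not taken from any real dump.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace-uri}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def count_pages_by_namespace(stream):
    """Tally <page> elements by the value of their <ns> child."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) == "page":
            for child in elem:
                if local_name(child.tag) == "ns":
                    counts[int(child.text)] += 1
                    break
            elem.clear()  # drop the page's children; real dumps are large
    return counts


# Made-up sample standing in for a real dump file (assumption, not real data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Q1</title><ns>0</ns><id>1</id></page>
  <page><title>Wikidata:Main Page</title><ns>4</ns><id>2</id></page>
  <page><title>Help:Contents</title><ns>12</ns><id>3</id></page>
  <page><title>Wikidata:Glossary</title><ns>4</ns><id>4</id></page>
</mediawiki>"""

counts = count_pages_by_namespace(io.BytesIO(SAMPLE))
for ns_id, n in sorted(counts.items()):
    print(ns_id, n)
```

For a real dump, the same function could presumably be given a decompressing stream such as one returned by bz2.open(path, "rb") instead of the in-memory sample.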
Thanks in advance for any information.