Thanks for the clarification. So it looks like the JSON dumps were
designed to hold only entity data. I guess they were never meant to
include other MediaWiki data, such as other namespaces, page table
metadata, other tables, etc.
I guess my main question is: are there any plans to phase out the XML
dump? I assume no, but given the recent problems with the XML dump
process, I just wanted to make sure.
Thanks again for the help.
On Sun, Jun 14, 2015 at 5:12 AM, Daniel Kinzler
<daniel.kinzler(a)wikimedia.de> wrote:
The JSON dump is the preferred format if you want to process the entity
data. From the JSON dumps, you can get the current entities (items and
properties) in the current canonical format, for further processing.
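
As a rough illustration of that kind of processing, here is a minimal Python sketch. It assumes the dump is laid out as one large JSON array with one entity object per line, each followed by a comma, which lets you read it line by line instead of loading the whole file. The function name iter_entities and the two-entity sample are made up for illustration and are not part of any Wikidata tooling.

```python
import io
import json
from typing import IO, Iterator


def iter_entities(lines: IO[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a JSON dump.

    Assumption: the file is a JSON array with one entity per line,
    so the "[" and "]" lines and trailing commas can be skipped.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


# Hand-made sample standing in for a real dump file.
SAMPLE = '[\n{"id": "Q1", "type": "item"},\n{"id": "P31", "type": "property"}\n]\n'

for entity in iter_entities(io.StringIO(SAMPLE)):
    print(entity["id"], entity["type"])
```

For a real compressed dump, the same function should work on a file opened in text mode, for example with gzip.open(path, "rt", encoding="utf-8").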
The XML dumps are an "opaque" exchange format for MediaWiki page content. They
are designed to allow content from pages in one wiki to be imported into another
wiki(*), including old revisions. They can also be used for backups, since they
provide a future-proof way to store your wiki's content. But the format of the
page content in the XML dumps is not strictly specified. It can be wikitext, or
JSON data, or whatever. The JSON you find embedded in the XML dumps of Wikidata
may or may not be compatible with the format in the JSON dump, and is subject to
change without notice. It's not designed for processing by 3rd parties.
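
To make the "it can be wikitext, or JSON data" point concrete, here is a small Python sketch that walks an export file and reports what kind of content each revision carries. It assumes each <revision> element has <model> and <text> children, as in recent MediaWiki export schema versions. The helper names and the two-page sample are invented for illustration. The sketch only inspects the content type; it does not treat the embedded JSON as a stable format.

```python
import io
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source):
    """Yield (title, content model, text) for every revision.

    Assumption: <title> precedes <revision> inside each <page>, and
    <revision> has <model> and <text> children.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            fields = {local(child.tag): child.text for child in elem}
            yield title, fields.get("model"), fields.get("text")
        elif name == "page":
            elem.clear()  # free memory once the whole page is done


# Hand-made sample, not taken from a real dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Q1</title><revision><model>wikibase-item</model>
<text>{"id": "Q1"}</text></revision></page>
<page><title>Help:Contents</title><revision><model>wikitext</model>
<text>Some ''wikitext''.</text></revision></page>
</mediawiki>"""

for title, model, text in iter_revisions(io.BytesIO(SAMPLE)):
    print(title, model, len(text or ""))
```

The content model tells a consumer which pages hold entity data and which hold ordinary wikitext. How the text itself is laid out remains an internal detail.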
Wikidata XML dumps will be generated for all pages, including history, as is
done for all Wikimedia projects. However, this process often breaks, due to
the large size of these dumps. If you want to process Wikidata items, you should
use the JSON dumps.
HTH
-- daniel
(*) this is usually disabled for wikibase entities, to avoid ID conflicts.
On 14.06.2015 at 03:38, gnosygnu wrote:
According to
http://www.wikidata.org/wiki/Wikidata:Database_download,
the JSON dump is listed as the recommended dump format. Also, at the
time of writing, the JSON dump has been generating regularly every
week whereas the XML dump has been delayed for 2+ months.
Going forward, will both dumps continue to be supported? Or will the
XML dump be phased out and only the JSON dump remain? Or are these
plans still to be determined based on upcoming changes to the dumping
infrastructure as per
https://phabricator.wikimedia.org/T88728?
If the JSON dump is to be the sole data format, is there any way to
address the following omissions?
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata