On 04.05.2013 19:13, Jona Christopher Sahnwaldt wrote:
We will produce a DBpedia release pretty soon, I
don't think we can
wait for the "real" dumps. The inter-language links are an important
part of DBpedia, so we have to extract data from almost all Wikidata
items. I don't think it's sensible to make ~10 million calls to the
API to download the external JSON format, so we will have to use the
XML dumps and thus the internal format.
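The XML-dump route described above could look roughly like this: stream the pages of the dump and parse the internal JSON serialization embedded in each revision's text. A minimal sketch, assuming the 2013-era internal format where sitelinks live under a "links" key mapping site ids to page titles (the element names and JSON keys here are assumptions, not a confirmed schema):

```python
# Sketch: pull language links out of a Wikidata XML dump, where each
# <page> body holds the internal JSON serialization. The "links" key
# (site -> title) is an assumption based on the internal format.
import json
import xml.etree.ElementTree as ET

def extract_sitelinks(xml_text):
    """Yield (item_title, {site: page_title}) for each item page."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title")
        body = page.findtext("./revision/text")
        if not body:
            continue
        data = json.loads(body)
        links = data.get("links", {})
        if links:
            yield title, links

# Tiny inline sample standing in for a real dump file.
SAMPLE = """<mediawiki>
  <page>
    <title>Q64</title>
    <revision>
      <text>{"links": {"enwiki": "Berlin", "dewiki": "Berlin"}}</text>
    </revision>
  </page>
</mediawiki>"""

for item, links in extract_sitelinks(SAMPLE):
    print(item, links)
```

For a real multi-gigabyte dump you would swap `ET.fromstring` for `ET.iterparse` and clear elements as you go, but the parsing step is the same.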
Oh, if it's just the language links, this isn't an issue: there's a
table for them in the database, and we'll soon be providing a separate dump of
that table at http://dumps.wikimedia.org/wikidatawiki/
If it's not there when you need it, just ask us for a dump of the sitelinks
table (technically, wb_items_per_site), and we'll get you one.
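Once a dump of that table is loaded into a database, getting all links for an item is a single query. A sketch using an in-memory SQLite stand-in; the column names (ips_item_id, ips_site_id, ips_site_page) follow the Wikibase schema as I understand it and should be treated as assumptions:

```python
import sqlite3

# Sketch: query the sitelinks table (wb_items_per_site) after loading
# its dump into a database. Column names are assumed from the
# Wikibase schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE wb_items_per_site (
    ips_item_id INTEGER, ips_site_id TEXT, ips_site_page TEXT)""")
conn.executemany(
    "INSERT INTO wb_items_per_site VALUES (?, ?, ?)",
    [(64, "enwiki", "Berlin"), (64, "dewiki", "Berlin")])

# All language links for item Q64:
links = dict(conn.execute(
    "SELECT ips_site_id, ips_site_page FROM wb_items_per_site "
    "WHERE ips_item_id = ?", (64,)))
print(links)
```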
But I think it's not a big
deal that it's not that stable: we parse the JSON into an AST anyway.
It just means that we will have to use a more abstract AST, which I
was planning to do anyway. As long as the semantics of the internal
format remain more or less the same - it will contain the labels,
the language links, the properties, etc. - it's no big deal if the
syntax changes, even if it's not JSON anymore.
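The "more abstract AST" idea above might be sketched like this: downstream code works against a neutral item record, and only one parser function knows the dump's concrete syntax, so a format change means rewriting that one function. The field and key names here ("label", "links") are illustrative assumptions, not the actual DBpedia extractor code:

```python
# Sketch: an abstract item record insulating downstream code from the
# dump's concrete serialization. Key names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Item:
    id: str
    labels: dict = field(default_factory=dict)     # lang -> label
    sitelinks: dict = field(default_factory=dict)  # site -> title

def parse_internal(item_id, data):
    """Map the internal JSON layout onto the abstract Item record."""
    return Item(
        id=item_id,
        labels=data.get("label", {}),
        sitelinks=data.get("links", {}),
    )

raw = {"label": {"en": "Berlin"}, "links": {"enwiki": "Berlin"}}
item = parse_internal("Q64", raw)
print(item.labels["en"], item.sitelinks["enwiki"])
```

Supporting the external JSON format later would then just mean adding a second parse function that produces the same Item.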
Yes, if you want the labels and properties in addition to the links, you'll have
to do that for now. But I'm working on the "real" data dumps.