Hi Daniel,
I started working on the DBpedia release and just wanted to check what the current status of the Wikidata dumps is. I saw that RDF data and RDF URIs like http://www.wikidata.org/entity/Q1 are already available. Cool! Do you think there will be RDF dumps soon, i.e. in the next few weeks?
If not, could you guys prepare a dump of the sitelinks table, as you suggested below? If it's not too much effort, it would be cool if you could generate CSV or a similarly simple format. We won't load the data into a database; we only extract it, so we would have to write a parser for the SQL INSERT statements. CSV would be much simpler.
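To make it concrete, here is roughly what we'd run on our side - just a sketch, and the tab-separated (item id, site id, page title) layout is only my assumption, not the actual schema of the dump:

import csv

def read_sitelinks(path):
    """Read a sitelinks dump, assuming one tab-separated row per link:
    item id, site id, page title (the column layout is a guess)."""
    links = {}  # item id -> {site id: page title}
    with open(path, encoding="utf-8", newline="") as f:
        for item_id, site_id, page_title in csv.reader(f, delimiter="\t"):
            links.setdefault(item_id, {})[site_id] = page_title
    return links

# e.g. read_sitelinks("sitelinks.tsv")["Q1"]["enwiki"] might give "Universe"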
Thanks a lot for your help!
Christopher
On 4 May 2013 23:36, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
On 04.05.2013 19:13, Jona Christopher Sahnwaldt wrote:
We will produce a DBpedia release pretty soon, I don't think we can wait for the "real" dumps. The inter-language links are an important part of DBpedia, so we have to extract data from almost all Wikidata items. I don't think it's sensible to make ~10 million calls to the API to download the external JSON format, so we will have to use the XML dumps and thus the internal format.
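For what it's worth, the extraction would look something like the sketch below. The "links" key is just a placeholder for however the internal JSON actually names the sitelinks, since I don't know the exact structure, and the dump schema version may differ:

import json
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.8/}"  # dump schema version may differ

def iter_item_sitelinks(dump_path):
    """Stream the pages XML dump and yield (item id, sitelinks) pairs.
    'links' is a placeholder key; the real internal format may differ."""
    for _, page in ET.iterparse(dump_path):
        if page.tag != NS + "page":
            continue
        title = page.findtext(NS + "title") or ""               # e.g. "Q1"
        text = page.findtext(NS + "revision/" + NS + "text") or ""
        if title.startswith("Q"):
            try:
                data = json.loads(text)
            except ValueError:
                data = {}
            yield title, data.get("links", {})
        page.clear()  # keep memory bounded over ~10 million pages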
Oh, if it's just the language links, this isn't an issue: there's an additional table for them in the database, and we'll soon be providing a separate dump of that table at http://dumps.wikimedia.org/wikidatawiki/
If it's not there when you need it, just ask us for a dump of the sitelinks table (technically, wb_items_per_site), and we'll get you one.
But I think it's not a big deal that it's not that stable: we parse the JSON into an AST anyway. It just means that we will have to use a more abstract AST, which I was planning to do anyway. As long as the semantics of the internal format remain more or less the same - it will contain the labels, the language links, the properties, etc. - it's no big deal if the syntax changes, even if it's not JSON anymore.
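Something like this is what I mean by a more abstract AST - only a sketch, and the field names are ours, not anything from Wikidata:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ItemNode:
    """Syntax-independent view of one item: the extractors only ever see
    this structure, regardless of how the dump serializes it."""
    item_id: str                                              # e.g. "Q1"
    labels: Dict[str, str] = field(default_factory=dict)      # language -> label
    sitelinks: Dict[str, str] = field(default_factory=dict)   # site id -> page title
    claims: Dict[str, List] = field(default_factory=dict)     # property id -> values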
Yes, if you want the labels and properties in addition to the links, you'll have to do that for now. But I'm working on the "real" data dumps.
-- daniel