Hi Joachim,
Here is a short program that solves your problem:
https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/exampl...
It is in Java, so you need that (and Maven) to run it, but that's the only technical challenge ;-). You can run the program in various ways, as described in the README:
https://github.com/Wikidata/Wikidata-Toolkit-Examples
The program I wrote puts everything into a CSV file, but you can of course also write RDF triples instead, or any other format you prefer. The code should be easy to modify.
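If you want to adapt it yourself, the core of such a program is just an EntityDocumentProcessor. Below is a rough sketch for illustration only, not the code from the linked example: the GND property P227, the choice of the enwiki sitelink, and the naive CSV quoting are assumptions you may want to change.

import java.io.PrintStream;

import org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.PropertyDocument;
import org.wikidata.wdtk.datamodel.interfaces.SiteLink;
import org.wikidata.wdtk.datamodel.interfaces.Snak;
import org.wikidata.wdtk.datamodel.interfaces.Statement;
import org.wikidata.wdtk.datamodel.interfaces.StatementGroup;
import org.wikidata.wdtk.datamodel.interfaces.StringValue;
import org.wikidata.wdtk.datamodel.interfaces.ValueSnak;

/**
 * Sketch of a processor that writes item ID, GND ID (P227), and the
 * English Wikipedia title for every item that has a GND identifier.
 */
public class GndMappingProcessor implements EntityDocumentProcessor {

    private final PrintStream out;

    public GndMappingProcessor(PrintStream out) {
        this.out = out;
        out.println("item,gndId,enwikiTitle"); // CSV header
    }

    @Override
    public void processItemDocument(ItemDocument itemDocument) {
        for (StatementGroup sg : itemDocument.getStatementGroups()) {
            // P227 is the GND ID property on Wikidata
            if (!"P227".equals(sg.getProperty().getId())) {
                continue;
            }
            // usually one GND ID, but an item can have several statements
            for (Statement s : sg.getStatements()) {
                Snak mainSnak = s.getClaim().getMainSnak();
                if (mainSnak instanceof ValueSnak
                        && ((ValueSnak) mainSnak).getValue() instanceof StringValue) {
                    String gndId =
                        ((StringValue) ((ValueSnak) mainSnak).getValue()).getString();
                    SiteLink enwiki = itemDocument.getSiteLinks().get("enwiki");
                    String title = (enwiki != null) ? enwiki.getPageTitle() : "";
                    // naive CSV quoting; fine for Q-ids, GND ids and most titles
                    out.println(itemDocument.getItemId().getId()
                            + ",\"" + gndId + "\",\"" + title + "\"");
                }
            }
        }
    }

    @Override
    public void processPropertyDocument(PropertyDocument propertyDocument) {
        // property documents are not needed for this task
    }
}

Plugging a processor like this into the dump download and iteration is what the helper code in the examples project is for, so you should not have to write any of that part yourself.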
On a first run, the tool will download the current Wikidata dump, which takes a while (it's about 6 GB), but after that you can find and serialise all results in less than half an hour (a processing rate of around 10K items/second). A regular laptop is enough to run it.
Cheers,
Markus
On 11.02.2016 01:34, Stas Malyshev wrote:
Hi!
I am trying to extract all mappings from Wikidata to the GND authority file, along with the corresponding Wikipedia pages, expecting roughly 500,000 to 1m triples as a result.
As a starting note, I don't think extracting 1M triples is the best way to use the query service. If you need to do processing that returns such big result sets - in the millions - maybe processing the dump, e.g. with the Wikidata Toolkit at https://github.com/Wikidata/Wikidata-Toolkit, would be a better idea?
However, with various calls I get far fewer triples (about 2,000 to 10,000). The output seems to be truncated in the middle of a statement, e.g.
It may be some kind of timeout because of the quantity of data being sent. How long does such a request take?