Here is a short program that solves your problem:
It is in Java, so you need that (and Maven) to run it, but that's the
only technical challenge ;-). You can run the program in various ways,
as described in the README:
The program I wrote puts everything into a CSV file, but you can of
course also write RDF triples if you prefer, or any other format you
wish. The code should be easy to modify.
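If you change the output format, one detail worth keeping is proper CSV quoting, since Wikipedia page titles can contain commas or quotes. A minimal sketch of the escaping logic (the helper names and the column layout - item id, GND id, page title - are my own illustration, not necessarily what the program uses):

```java
import java.util.List;

public class CsvRow {

    /** Quotes a field per RFC 4180 if it contains a comma, quote, or newline. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Joins one output row, e.g. item id, GND id, Wikipedia page title. */
    static String row(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }
}
```

With this, a title like "Berlin, Germany" round-trips safely through any standard CSV reader.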
On a first run, the tool will download the current Wikidata dump, which
takes a while (it's about 6 GB), but after this you can find and serialise
all results in less than half an hour (at a processing rate of around
10K items/second). A regular laptop is enough to run it.
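Processing the dump line by line is what makes this fast: the JSON dump contains one entity per line, so even a plain text scan gets you a quick estimate of the result size before running the full extraction. A crude stdlib-only sketch (this textual pre-filter is my own illustration, not how the program itself works - it is not a JSON parser):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GndLineCounter {

    /**
     * Counts dump lines mentioning P227 (the GND identifier property).
     * Works on the gzipped JSON dump, one entity per line; a textual
     * match is only a rough upper bound, not a real parse.
     */
    public static long countMatches(InputStream gzipped) throws IOException {
        long matches = 0;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains("\"P227\"")) {
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

Pointing this at a local copy of the dump gives you the approximate number of items with a GND mapping in a single pass.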
On 11.02.2016 01:34, Stas Malyshev wrote:
I am trying to extract all mappings from Wikidata to
the GND authority file,
along with the corresponding Wikipedia pages, expecting roughly 500,000 to
1M triples as a result.
As a starting note, I don't think extracting 1M triples is the best
way to use the query service. If you need to do processing that returns such
big result sets - in the millions - maybe processing the dump - e.g. with
Wikidata Toolkit at https://github.com/Wikidata/Wikidata-Toolkit - would
be a better idea?
However, with various calls, I get far fewer
triples (about 2,000 to
10,000). The output seems to be truncated in the middle of a statement, e.g.
It may be some kind of timeout because of the quantity of data being
sent. How long does such a request take?