I am trying to extract all mappings from Wikidata to the GND authority file, along with the corresponding Wikipedia pages, and expect roughly 500,000 to 1 million triples as a result.
However, across various calls I get far fewer triples (about 2,000 to 10,000). The output appears to be truncated in the middle of a statement, e.g.:
...
<http://d-nb.info/gnd/121043053> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://www.wikidata.org/entity/Q39963> .
<http://d-nb.info/gnd/121043053> <http://schema.org/about> <https://de.wikipedia.org/wiki/Park%20Kyung-ni> .
<http://d-nb.info/gnd/121043053> <http://schema.org/about> <https://en.wikipedia.org/wiki/Pa
The query (below) is called like this:
curl -X GET -H "Accept: text/plain" --silent https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=$ENCODED_QUERY -o /tmp/mappings.nt
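The command assumes that $ENCODED_QUERY already holds the percent-encoded query text. How it was produced is not shown here; one possible way is sketched below in Python, using only the standard library. The helper name build_request_url and the placeholder query are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl call in this post.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


def build_request_url(sparql_query, endpoint=ENDPOINT):
    """Return the GET URL for a SPARQL query (hypothetical helper)."""
    # urlencode percent-encodes reserved characters such as '#', '?' and '{';
    # spaces are encoded as '+'.
    return endpoint + "?" + urlencode({"query": sparql_query})


# Placeholder query, not the real one from this post.
query = "construct { ?s ?p ?o } where { ?s ?p ?o } limit 1"
url = build_request_url(query)
print(url)
```

Building the URL this way rules out a malformed query string as the cause, since characters like '#' would otherwise be read as the start of a URL fragment.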
Requesting Turtle or RDF/XML as the format also yields syntactically invalid output, truncated in the middle of a statement. Adding "--no-buffer" to the curl command makes no difference.
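To confirm the truncation mechanically rather than by eye, something like the following could be run over the N-Triples result. It is only a minimal sketch: it assumes one statement per line and that the serializer writes a space before the terminating dot. The function name find_incomplete_statements is hypothetical.

```python
def find_incomplete_statements(nt_text):
    """Return (line_number, line) pairs for lines that do not end a statement."""
    incomplete = []
    for number, line in enumerate(nt_text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and comment lines are legal in N-Triples.
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: every complete statement ends with " ." on its own line.
        if not stripped.endswith(" ."):
            incomplete.append((number, stripped))
    return incomplete


# Sample modelled on the truncated output quoted in this post.
sample = (
    "<http://d-nb.info/gnd/121043053> "
    "<http://www.w3.org/2004/02/skos/core#exactMatch> "
    "<http://www.wikidata.org/entity/Q39963> .\n"
    "<http://d-nb.info/gnd/121043053> "
    "<http://schema.org/about> "
    "<https://en.wikipedia.org/wiki/Pa"
)
print(find_incomplete_statements(sample))
```

For the real file, the text could be read with open("/tmp/mappings.nt", encoding="utf-8").read(). If only the final line is reported, the response was cut off at the end rather than corrupted somewhere in between.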
Am I doing something wrong? Are there built-in limits on the endpoint that could cause arbitrary truncation?
Cheers, Joachim
# Get all GND mappings to persons
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
#
construct { ?gnd skos:exactMatch ?wd ; schema:about ?sitelink . }
#select ?gndId ?wd ?wdLabel ?sitelink ?gnd
where {
  # get all wikidata items and labels linked to GND
  ?wd wdt:P227 ?gndId ;
      rdfs:label ?wdLabel ;
      # restrict to
      wdt:P31 wd:Q5 . # instance of human
  # get site links (only from de/en wikipedia sites)
  ?sitelink schema:about ?wd ;
            schema:inLanguage ?language .
  filter (contains(str(?sitelink), 'wikipedia'))
  filter (lang(?wdLabel) = ?language && ?language in ('en', 'de'))
  bind(uri(concat('http://d-nb.info/gnd/', ?gndId)) as ?gnd)
}