I am trying to extract all mappings from Wikidata to the GND authority file, along with the corresponding Wikipedia pages, and expect roughly 500,000 to 1 million triples as a result.
However, across various calls I get far fewer triples (about 2,000 to 10,000). The output seems to be truncated in the middle of a statement, e.g.
…
<http://d-nb.info/gnd/121043053> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://www.wikidata.org/entity/Q39963> .
<http://d-nb.info/gnd/121043053> <http://schema.org/about> <https://de.wikipedia.org/wiki/Park%20Kyung-ni> .
<http://d-nb.info/gnd/121043053> <http://schema.org/about> <https://en.wikipedia.org/wiki/Pa
The query (below) is called like this:
curl -X GET -H "Accept: text/plain" --silent "https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=$ENCODED_QUERY" -o /tmp/mappings.nt
Using Turtle or RDF/XML as the format also results in syntactically incorrect output, truncated in the middle of a statement. Adding "--no-buffer" to the curl command does not change anything.
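To tell a truncated download from a complete one after each call, the last line of the output can be inspected. This is only a minimal sketch, assuming N-Triples output in which every statement ends with " ." as in the sample above; the function name nt_is_complete is made up for illustration.

```shell
# Succeeds (exit status 0) if the last line of the given file ends with " .",
# i.e. the final N-Triples statement looks complete.
# Assumption: statements are terminated by " ." as in the sample output.
nt_is_complete() {
  # tail -n 1 prints the last line even if the final newline is missing;
  # command substitution strips the trailing newline.
  last_line=$(tail -n 1 "$1")
  case "$last_line" in
    *" .") return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage after the curl call:
#   if nt_is_complete /tmp/mappings.nt; then echo "complete"; else echo "truncated"; fi
```

An empty file, or one that ends in a blank line, is reported as incomplete by this check.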
Am I doing something wrong? Are there built-in limits on the endpoint that could cause arbitrary truncation?
Cheers, Joachim
# Get all GND mappings to persons
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
#
construct {
  ?gnd skos:exactMatch ?wd ;
       schema:about ?sitelink .
}
#select ?gndId ?wd ?wdLabel ?sitelink ?gnd
where {
  # get all wikidata items and labels linked to GND
  ?wd wdt:P227 ?gndId ;
      rdfs:label ?wdLabel ;
      # restrict to
      wdt:P31 wd:Q5 . # instance of human
  # get site links (only from de/en wikipedia sites)
  ?sitelink schema:about ?wd ;
            schema:inLanguage ?language .
  filter (contains(str(?sitelink), 'wikipedia'))
  filter (lang(?wdLabel) = ?language && ?language in ('en', 'de'))
  bind(uri(concat('http://d-nb.info/gnd/', ?gndId)) as ?gnd)
}