I am trying to extract all mappings from Wikidata to the GND authority file, along with the corresponding Wikipedia pages, and expect roughly 500,000 to 1 million triples as a result.

However, across various calls I get far fewer triples (about 2,000 to 10,000). The output seems to be truncated in the middle of a statement, e.g.:

<http://d-nb.info/gnd/121043053> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://www.wikidata.org/entity/Q39963> .
<http://d-nb.info/gnd/121043053> <http://schema.org/about> <https://de.wikipedia.org/wiki/Park%20Kyung-ni> .
<http://d-nb.info/gnd/121043053> <http://schema.org/about> <https://en.wikipedia.org/wiki/Pa
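
A truncation like this can also be detected mechanically by looking at the last non-empty line of the downloaded file. The following is only a rough sketch: the function name nt_is_complete is made up for illustration, and the check assumes the serializer ends every statement with " ." as in the sample above, so it is a heuristic rather than a real N-Triples validation.

```shell
# Hypothetical helper (name is illustrative): succeeds if the last
# non-empty line of an N-Triples file looks like a complete statement.
nt_is_complete() {
  # Drop blank lines, then keep only the final line of the file.
  last_line=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
  # Assumption: a complete statement ends with whitespace followed by "."
  printf '%s\n' "$last_line" | grep -q '[[:space:]]\.[[:space:]]*$'
}
```

It could be used as: nt_is_complete /tmp/mappings.nt && echo complete || echo truncated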

The query (below) is called like this:

curl -X GET -H "Accept: text/plain" --silent https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=$ENCODED_QUERY -o /tmp/mappings.nt

Requesting Turtle or RDF/XML as the format also results in syntactically invalid output, truncated in the middle of a statement. Adding "--no-buffer" to the curl command does not change anything.
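
For comparison, the same request can be written so that curl does the URL encoding itself and reports the HTTP status and the number of bytes received. This is a sketch only: the function name fetch_mappings and its two arguments are hypothetical, and it does not by itself explain or prevent the truncation. It merely makes the status and the size of each attempt visible.

```shell
# Hypothetical wrapper around the call above; the function name and the
# query-file argument are illustrative, not part of the original setup.
fetch_mappings() {
  query_file=$1   # file containing the SPARQL query text
  out_file=$2     # where to store the N-Triples response
  # --get sends the data as a GET query string appended to the URL;
  # --data-urlencode "query@FILE" reads FILE and URL-encodes its contents.
  # --write-out prints HTTP status and downloaded byte count afterwards.
  curl --get --silent --show-error \
    -H "Accept: text/plain" \
    --data-urlencode "query@${query_file}" \
    --write-out '%{http_code} %{size_download}\n' \
    -o "$out_file" \
    "https://query.wikidata.org/bigdata/namespace/wdq/sparql"
}
```

A call might look like: fetch_mappings query.rq /tmp/mappings.nt (where query.rq is an assumed file name holding the query below).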

Am I doing something wrong? Are there built-in limits on the endpoint that could result in arbitrary truncation?

Cheers, Joachim

# Get all GND mappings to persons
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
#
construct {
  ?gnd skos:exactMatch ?wd ;
    schema:about ?sitelink .
}
#select ?gndId ?wd ?wdLabel ?sitelink ?gnd
where {
  # get all wikidata items and labels linked to GND
  ?wd wdt:P227 ?gndId ;
      rdfs:label ?wdLabel ;
      # restrict to
      wdt:P31 wd:Q5 . # instance of human
  # get site links (only from de/en wikipedia sites)
  ?sitelink schema:about ?wd ;
            schema:inLanguage ?language .
  filter (contains(str(?sitelink), 'wikipedia'))
  filter (lang(?wdLabel) = ?language && ?language in ('en', 'de'))
  bind(uri(concat('http://d-nb.info/gnd/', ?gndId)) as ?gnd)
}