If the challenge is downloading large files, you can also get local access
to all of the dumps (Wikidata, Wikipedia, and more) through PAWS
<https://wikitech.wikimedia.org/wiki/PAWS> (Wikimedia-hosted Jupyter
notebooks) and Toolforge
<https://wikitech.wikimedia.org/wiki/Help:Toolforge> (a more general-purpose
Wikimedia hosting environment). From Toolforge, you could run the Wikidata
Toolkit (Java) that Denny mentions. I'm personally more familiar with
Python, so my suggestion is to use Python code to filter the dumps down to
what you need. Below is an example Python notebook that does this on PAWS.
The PAWS environment is not set up for longer-running jobs, though, and the
process will probably die before it completes, so I'd highly recommend
converting the notebook into a script that can run on Toolforge (see
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Dumps).
PAWS example:
https://paws-public.wmflabs.org/paws-public/User:Isaac_(WMF)/Simplified_Wik…
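In case it helps, here is a rough sketch of the kind of filtering the
notebook does, streaming the Wikidata JSON dump and keeping only the
entities you care about. The dump path below is just a guess at the
Toolforge mount point, and the P31/Q5 filter is only an illustrative
example -- adjust both to your actual dump and criteria:

import bz2
import json

# Assumed path to the entity dump as mounted on PAWS/Toolforge;
# substitute whichever dump file you are actually working from.
DUMP_PATH = "/public/dumps/public/wikidatawiki/entities/latest-all.json.bz2"

def iter_entities(path):
    """Stream entities from the Wikidata JSON dump one at a time.

    The dump is one large JSON array with a single entity per line,
    so we skip the surrounding brackets and parse line by line.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line in ("[", "]") or not line:
                continue
            yield json.loads(line.rstrip(","))

def keep(entity):
    """Example filter: keep items that are instances of (P31) human (Q5)."""
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if value.get("id") == "Q5":
            return True
    return False

with open("filtered_entities.ndjson", "w", encoding="utf-8") as out:
    for entity in iter_entities(DUMP_PATH):
        if keep(entity):
            out.write(json.dumps(entity) + "\n")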
Best,
Isaac
On Thu, Apr 30, 2020 at 1:33 AM raffaele messuti <raffaele(a)docuver.se>
wrote:
On 27/04/2020 18:02, Kingsley Idehen wrote:
[1]
https://w.wiki/PBi
Do these CONSTRUCT queries return any of the following document
content-types?
RDF-Turtle, RDF-XML, JSON-LD?
You can use content negotiation on the SPARQL endpoint:
query="CONSTRUCT { ... }"

curl -H "Accept: application/rdf+xml" \
  https://query.wikidata.org/sparql \
  --data-urlencode query="$query"

curl -H "Accept: text/turtle" -G \
  https://query.wikidata.org/sparql \
  --data-urlencode query="$query"
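The same content negotiation works from Python as well -- a minimal sketch
using the requests library (the CONSTRUCT body is a placeholder; swap in
your own query):

import requests

query = "CONSTRUCT { ... } WHERE { ... }"

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query},
    # "application/rdf+xml" also works; JSON-LD availability depends on
    # what serializations the endpoint supports.
    headers={"Accept": "text/turtle"},
)
resp.raise_for_status()
print(resp.text)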
--
raffaele(a)docuver.se
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation