If the challenge is downloading large files, you can also get local access to all of the dumps (Wikidata, Wikipedia, and more) through PAWS (Wikimedia-hosted Jupyter notebooks) and Toolforge (a more general-purpose Wikimedia hosting environment). From Toolforge, you could run the Wikidata Toolkit (Java) that Denny mentions. I'm personally more familiar with Python, so my suggestion is to use Python code to filter down the dumps to what you want. Below is an example Python notebook that does this on PAWS. Note that the PAWS environment is not set up for longer-running jobs and will probably die before the process completes, so I'd highly recommend converting the notebook into a script that can run on Toolforge (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Dumps).

PAWS example: https://paws-public.wmflabs.org/paws-public/User:Isaac_(WMF)/Simplified_Wikidata_Dumps.ipynb
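For anyone who wants the gist without opening the notebook, here is a minimal sketch of that filtering approach. It assumes the usual layout of the Wikidata JSON dump (a JSON array with one entity per line); the dump path shown in the comment is an assumption about where the files are mounted on PAWS/Toolforge, so check the Help:Toolforge/Dumps page above for the actual location:

```python
import bz2
import json

def filter_entities(lines, keep_property="P31"):
    """Yield parsed entities from Wikidata JSON-dump lines that carry keep_property.

    Each line of the dump holds one entity as a JSON object, wrapped in a
    big JSON array, so strip trailing commas and skip the bracket lines
    before parsing.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        entity = json.loads(line)
        if keep_property in entity.get("claims", {}):
            yield entity

# Hypothetical usage against the real dump on PAWS/Toolforge
# (path is an assumption -- verify it on your host):
# with bz2.open("/public/dumps/public/wikidatawiki/entities/latest-all.json.bz2", "rt") as f:
#     for entity in filter_entities(f, "P31"):
#         ...  # write out only the fields you need
```

Streaming the compressed file line by line like this keeps memory use flat, which matters when the full dump is tens of gigabytes.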


On Thu, Apr 30, 2020 at 1:33 AM raffaele messuti <raffaele@docuver.se> wrote:
On 27/04/2020 18:02, Kingsley Idehen wrote:
>> [1] https://w.wiki/PBi <https://w.wiki/PBi>
> Do these CONSTRUCT queries return any of the following document content-types?
> RDF-Turtle, RDF-XML, JSON-LD ?

you can use content negotiation on the sparql endpoint

~ query='CONSTRUCT { ... }'
~ curl -H "Accept: application/rdf+xml" -G https://query.wikidata.org/sparql --data-urlencode "query=$query"
~ curl -H "Accept: text/turtle" -G https://query.wikidata.org/sparql --data-urlencode "query=$query"


Wikidata mailing list

Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation