Dear Mike,
On 03.06.2015 22:11, Mike Cummins wrote:
I apologise in advance for asking this in what seems to be a technical list, but @pigsonthewing tells me I can.
You are welcome.
I want to extract all National Trust names, co-ordinates and the first part of the description from wiki.
I am afraid that the data you are looking for may not be complete. For example, have a look at:
https://www.wikidata.org/wiki/Q4723912
This is a National Trust site, but Wikidata does not contain any statement that says so. So whatever technical means you use, it will not return this item (yet).
Having spent a day working through the API, I still cannot see how to do this aside from scraping. I am sure I am missing something very simple, so if anyone would kindly email me, I would be grateful.
There are many ways. One is to use one of the SPARQL endpoints. Here I am using the WMF's new experimental one:
This query shows all things owned by National Trust, with English label, description, and coordinates (where available):
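The query itself was linked in the original mail. For illustration, a query of this shape might look as follows in the current query.wikidata.org syntax (the 2015 experimental endpoint used slightly different prefixes, so treat this as a sketch rather than the exact query from the mail):

```sparql
# Sketch: items owned by (P127) the National Trust (Q333515),
# with English label, description, and coordinates (P625) where available.
SELECT ?item ?itemLabel ?itemDescription ?coord WHERE {
  ?item wdt:P127 wd:Q333515 .
  OPTIONAL { ?item wdt:P625 ?coord . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```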
Click "execute" to run it. There are ways to get this result in different formats (not embedded in HTML), but I can't find the link right now.
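As one illustration of getting results outside the HTML view: most SPARQL endpoints accept an HTTP GET request with the query in a `query` parameter, plus a parameter asking for JSON. The endpoint URL and the `format=json` parameter below are assumptions based on the current Wikidata Query Service, not something stated in the mail:

```python
from urllib.parse import urlencode

# Assumed endpoint (today's Wikidata Query Service; the 2015
# experimental endpoint lived at a different URL).
ENDPOINT = "https://query.wikidata.org/sparql"

# A tiny example query: a few items owned by (P127) the National Trust (Q333515).
query = "SELECT ?item WHERE { ?item wdt:P127 wd:Q333515 } LIMIT 10"

# format=json asks for machine-readable results instead of the HTML page.
url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(url)
```

Fetching this URL with any HTTP client then returns the result set as JSON rather than an HTML page.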
The above query may not be the one you need (it returns just 200 results). Here is another one, for all things with a number in the National Heritage List for England:
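Again the concrete query was linked rather than quoted; a sketch in current query.wikidata.org syntax, assuming P1216 is the National Heritage List for England number property, might look like:

```sparql
# Sketch: items with a National Heritage List for England number (P1216).
SELECT ?item ?itemLabel ?nhle WHERE {
  ?item wdt:P1216 ?nhle .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 30000
```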
This time it's from another SPARQL endpoint, as you can see. Unlike the previous query, this one has tens of thousands of results. No SPARQL endpoint I tried manages to return all of them before the timeout, but the one I linked above manages significantly more than the other: 30k still worked for me, while the WMF experimental endpoint currently times out even at 10k (the service is running on a virtual machine that is not very powerful right now; this will change soon). The downside is that this endpoint is updated not every minute but only every month or so, which means that you won't see the current data.
Actually, a slightly simpler query does work for me on the WMF endpoint; see http://tinyurl.com/p2lk7c7. However, be warned that the 50k+ results displayed in fancy JavaScript may slow down your browser ;-)
The two query services are based on slightly different versions of the RDF data, hence the slightly different queries. This will all be unified as the RDF work continues. Anyway, I hope you get a first idea from these queries and maybe can play with them a bit to find other things (they look technical at first, but in the end you can just copy patterns and change Qids or Pids as you need).
The key problem in your case might be that the data you need is not really in Wikidata yet. You could look at the autolist2 tool from Magnus as an option to complete the data based on Wikipedia categories etc. (if the information you need is there). It is a very efficient tool for adding large numbers of statements in little time. For something as simple as adding "owned by National Trust" ("P127 Q333515") this might work well. If you need more elaborate data, such as some kind of official ID for each site, then someone who has a bot may be able to help you if you know of a source where this data can be found.
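As a side note beyond the tools named in the mail: if you already have a list of item IDs, Magnus's QuickStatements tool is another common way to batch-add such statements. It takes tab-separated lines of item, property, value; for example, marking the site linked earlier as owned by the National Trust would (hypothetically) be the single line:

```
Q4723912	P127	Q333515
```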
Regards,
Markus
Mike
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata