Thanks, Jaime, for your recommendation.
If I understand the result of [1] correctly, there are around 3.5 million
pages with a GeoNames property specified on Wikidata. I'm sure some of them
are redirects, or not cities, etc. But still, going through millions of
pages via API calls of 1000 at a time is cumbersome and inefficient.
(The example you gave takes 20 seconds to run; at 3.5 million / 1000 = 3,500
calls, that would mean a total of 20 * 3,500 = 70,000 seconds, which is
about 19 hours, assuming no lag or errors.)
However, what you suggested gave me an idea: I can take a look at the code
for the API itself (I guess it is at [2]) and figure out how the query is
written there, then try to write a similar query on my own. If I figure it
out, I will report back here.
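
The rough runtime estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name estimated_hours is made up for illustration, and the inputs (3.5 million items, 1000 items per call, 20 seconds per call) are the approximate figures quoted above, not measurements.

```python
import math


def estimated_hours(total_items, batch_size, seconds_per_call):
    """Estimate the total run time, in hours, of paging through a dataset."""
    # Number of API calls needed, rounding up for a partial last batch.
    calls = math.ceil(total_items / batch_size)
    return calls * seconds_per_call / 3600


# Approximate figures from the message above (assumed, not measured).
print(round(estimated_hours(3_500_000, 1000, 20), 1))
```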
Huji
[1]
https://quarry.wmflabs.org/query/25418
[2]
https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/client/inclu…
On Tue, Mar 13, 2018 at 5:39 AM, Jaime Crespo <jcrespo(a)wikimedia.org> wrote:
I am not 100% sure there is a perfect way to do what you want by querying
the metadata databases (I assume that is what you mean by "query"). I don't
think that data is metadata; it is content itself, which is not in the
metadata databases.
Calling the wikidata query service is probably what you want:
<https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fgeoname%0AWHERE%20%7B%0A%09%3Fitem%20wdt%3AP1566%20%3Fgeoname%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%0ALIMIT%201000%20OFFSET%201000>
Note the LIMIT and OFFSET that will let you iterate over the dataset (a
WHERE clause would be faster).
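
As an illustration of the LIMIT/OFFSET iteration mentioned above, here is a minimal Python sketch that pages through the query service's SPARQL endpoint using only the standard library. The helper names (build_query, build_url, parse_rows, fetch_page) and the User-Agent string are hypothetical, and the label service from the linked query is left out to keep the example small. Note that without an ORDER BY the result order is not guaranteed to be stable between calls, and that this approach returns the entity and the property value but not the page_id.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(limit, offset):
    """Return a SPARQL query for one page of (item, GeoNames ID) pairs."""
    return (
        "SELECT ?item ?geoname WHERE { ?item wdt:P1566 ?geoname . } "
        f"LIMIT {int(limit)} OFFSET {int(offset)}"
    )


def build_url(limit, offset):
    """Encode the query as a GET URL that asks for JSON results."""
    params = {"query": build_query(limit, offset), "format": "json"}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_rows(payload):
    """Turn a SPARQL JSON result into (entity, property_value) tuples."""
    rows = []
    for binding in payload["results"]["bindings"]:
        # Item values are URIs such as http://www.wikidata.org/entity/Q2113430.
        entity = binding["item"]["value"].rsplit("/", 1)[-1]
        rows.append((entity, binding["geoname"]["value"]))
    return rows


def fetch_page(limit, offset):
    """Fetch one page over the network (defined here, not called)."""
    request = urllib.request.Request(
        build_url(limit, offset),
        # Hypothetical User-Agent; replace with one identifying your tool.
        headers={"User-Agent": "geonames-export-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_rows(json.load(response))
```

Calling fetch_page(1000, 0), then fetch_page(1000, 1000), and so on until an empty list comes back would walk the whole dataset one page at a time.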
There is another way to get results, which is iterating over:
<https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P1566&hidetrans=1&hideredirs=1>
That is a standard MediaWiki API query; you will also find this in the
pagelinks table, but you should check every page you get afterwards (by
retrieving its contents), as it could include false positives or be behind
on updates.
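
A rough Python sketch of the "what links here" iteration described above, using the MediaWiki API's list=backlinks module with its standard continuation mechanism. The function names and the User-Agent string are hypothetical, and restricting to namespace 0 (items) with blnamespace is an assumption added here, not part of the link above. The rows contain the page_id and the entity title only; as noted, each page would still need to be checked afterwards to confirm the claim and read its value.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def backlinks_params(continuation=None):
    """Build the query parameters for one batch of backlinks."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": "Property:P1566",
        "blnamespace": "0",  # assumption: only items (main namespace)
        "blfilterredir": "nonredirects",
        "bllimit": "500",
        "format": "json",
    }
    # Merge the "continue" values returned by the previous response, if any.
    params.update(continuation or {})
    return params


def parse_backlinks(payload):
    """Return ((page_id, entity) rows, continuation dict or None)."""
    rows = [(p["pageid"], p["title"]) for p in payload["query"]["backlinks"]]
    return rows, payload.get("continue")


def iter_backlinks():
    """Yield (page_id, entity) pairs over the network (not called here)."""
    continuation = None
    while True:
        url = API + "?" + urllib.parse.urlencode(backlinks_params(continuation))
        request = urllib.request.Request(
            url,
            # Hypothetical User-Agent; replace with one identifying your tool.
            headers={"User-Agent": "geonames-export-sketch/0.1 (example)"},
        )
        with urllib.request.urlopen(request) as response:
            rows, continuation = parse_backlinks(json.load(response))
        yield from rows
        if not continuation:
            break
```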
On Sun, Mar 11, 2018 at 3:44 PM, Huji Lee <huji.huji(a)gmail.com> wrote:
Hello,
I need help writing a query that I would like to run on the Clouds. The
goal of the query is to retrieve the following information from
wikidatawiki_p:
* Find all pages that have a claim for the property P1566; for example, see
https://www.wikidata.org/wiki/Q2113430
* Find out what is the value of their P1566 property (in this case, 18918)
Output format should be like this:
page_id  entity    property_value
2039804  Q2113430  18918
...
Thanks in advance,
Huji
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
--
Jaime Crespo
<http://wikimedia.org>