Lucas,
No, I don't need the page_id. The other two are enough.
The Wikidata Query Service seems very slow (it would take about one day of
continuous querying to get all the data). The Linked Data Fragments server
seems faster, but I wish I knew how to make it return more than 100 results
at a time. Do you?
Thanks,
Huji
On Wed, Mar 14, 2018 at 7:00 AM, Lucas Werkmeister <
lucas.werkmeister(a)wikimedia.de> wrote:
Huji, do you need the page_id in the query results? Otherwise, I would
suggest using either the Wikidata Query Service, as Jaime suggested (though
I’d omit the LIMIT and OFFSET – I think it’s better to let the server send
you all the results at once) or the Linked Data Fragments server:
https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP1566&object=
(this URL will return HTML, RDF/XML, Turtle, JSON-LD, … depending on the
Accept header).
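A minimal sketch of how such a request could be prepared in Python, using only the standard library. The function name build_ldf_request is hypothetical, and "text/turtle" is assumed to be the media type the server accepts for Turtle output. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

LDF_ENDPOINT = "https://query.wikidata.org/bigdata/ldf"
P1566 = "http://www.wikidata.org/prop/direct/P1566"


def build_ldf_request(predicate=P1566, accept="text/turtle"):
    """Build (but do not send) a request for all triples with this predicate."""
    # Empty subject and object act as wildcards, as in the URL above.
    query = urlencode({"subject": "", "predicate": predicate, "object": ""})
    # The Accept header selects the response format (assumed media type).
    return Request(f"{LDF_ENDPOINT}?{query}", headers={"Accept": accept})


req = build_ldf_request()
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the returned object to urllib.request.urlopen would perform the actual fetch.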
Cheers,
Lucas
2018-03-14 1:03 GMT+01:00 Huji Lee <huji.huji(a)gmail.com>:
Thanks, Jaime, for your recommendation.
If I understand the result of [1] correctly, there are around 3.5
million pages with a GeoNames property specified on Wikidata. I'm sure some
of them are redirects, or not cities, etc. But still, going through
millions of pages with API calls of 1000 at a time is cumbersome and
inefficient. (The example you gave takes 20 seconds to run; that would mean
a total of 20 * 3.5 * 1000 seconds, which is about 19 hours, assuming no lag
or error).
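The estimate above can be checked with a few lines of Python. The variable names are illustrative; the figures are the ones quoted in this message.

```python
total_items = 3_500_000   # approximate count from [1]
batch_size = 1000         # results per request
seconds_per_batch = 20    # observed runtime of the example query

batches = total_items // batch_size
total_seconds = batches * seconds_per_batch
hours = round(total_seconds / 3600, 1)
print(batches, total_seconds, hours)
```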
However, what you suggested gave me an idea: I can take a look at the
code for the API itself (I guess it is at [2]) and figure out how the query
is written there, then try to write a similar query on my own. If I figure
it out, I will report back here.
Huji
[1] https://quarry.wmflabs.org/query/25418
[2] https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/client/includes/Api/ApiPropsEntityUsage.php
On Tue, Mar 13, 2018 at 5:39 AM, Jaime Crespo <jcrespo(a)wikimedia.org>
wrote:
I am not 100% sure there is a perfect way to do what you want by
querying the metadata databases (I assume that is what you mean by
"query"). I don't think that data is metadata, but content itself, which is
not in the metadata databases.
Calling the Wikidata Query Service is probably what you want:
<https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fgeoname%0AWHERE%20%7B%0A%09%3Fitem%20wdt%3AP1566%20%3Fgeoname%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%0ALIMIT%201000%20OFFSET%201000>
Note the LIMIT and OFFSET that will let you iterate over the dataset (a
WHERE clause would be faster).
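As a rough sketch, the LIMIT/OFFSET iteration could be driven from Python like this. The helper names page_url and page_offsets are hypothetical, the label service from the linked query is left out to keep the query small, and the code only builds request URLs for the public SPARQL endpoint without sending them. Without an ORDER BY, the paging order is not guaranteed to be stable between requests.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Doubled braces are literal braces in str.format templates.
QUERY_TEMPLATE = """SELECT ?item ?geoname
WHERE {{
  ?item wdt:P1566 ?geoname .
}}
LIMIT {limit} OFFSET {offset}"""


def page_url(offset, limit=1000):
    """Return the request URL for one page of results."""
    query = QUERY_TEMPLATE.format(limit=limit, offset=offset)
    params = urlencode({"query": query, "format": "json"})
    return f"{SPARQL_ENDPOINT}?{params}"


def page_offsets(total, limit=1000):
    """Return the OFFSET values needed to cover `total` rows."""
    return range(0, total, limit)


for offset in page_offsets(3000):
    print(page_url(offset))
```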
There is another way to get results, which is iterating over:
<https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P1566&hidetrans=1&hideredirs=1>
That is a standard MediaWiki API query; you will also find this in the
pagelinks table, but you should check every page you get afterwards (by
retrieving its contents), as it could include false positives or be behind
on updates.
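One possible way to walk the same list through the MediaWiki action API is list=backlinks, sketched below. The function name backlinks_url is hypothetical, and restricting to namespace 0 (items) and skipping redirects are assumptions meant to mirror the link above. The code only builds the URL; the blcontinue value would come from the previous response.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def backlinks_url(continue_token=None, limit=500):
    """Return an API URL listing pages that link to Property:P1566."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": "Property:P1566",
        "blnamespace": 0,                 # assumption: items only
        "blfilterredir": "nonredirects",  # roughly hideredirs=1
        "bllimit": limit,
        "format": "json",
    }
    if continue_token is not None:
        # Continuation value taken from the previous response.
        params["blcontinue"] = continue_token
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(backlinks_url())
```

Each returned page would still need to be checked for an actual P1566 claim, as noted above.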
On Sun, Mar 11, 2018 at 3:44 PM, Huji Lee <huji.huji(a)gmail.com> wrote:
> Hello,
>
> I need help writing a query that I would like to run on the Clouds.
> The goal of the query is to retrieve the following information from
> wikidatawiki_p:
>
> * Find all pages that have a claim for the property P1566, for example
> see https://www.wikidata.org/wiki/Q2113430
> * Find out what is the value of their P1566 property (in this case,
> 18918)
>
> Output format should be like this:
>
> page_id entity property_value
> 2039804 Q2113430 18918
> ...
>
> Thanks in advance,
>
> Huji
>
> _______________________________________________
> Wikimedia Cloud Services mailing list
> Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
>
--
Jaime Crespo
<http://wikimedia.org>
--
Lucas Werkmeister
Software Developer (Intern)
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world, in which every single human being can freely share in
the sum of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.