On 30.05.2016 22:05, Florian Bachmair wrote:
Hi!
Is it possible to query all documents with a certain property, e.g.
"P17" (= country)?
I would like to do that with the Wikidata Toolkit for Java.
Dear Florian,
Wikidata Toolkit is mainly used for processing dumps. You could answer
your query by processing a whole dump and checking, for each entity you
find there, if it meets your requirements. If you are looking for rather
large datasets and complex conditions, then this might be the only
solution to this task.
If your query is still simple enough to run within the timeout, then the
SPARQL query service is the way to go. I just checked, and the result of
your query (things with P17) has over 4.7 M statements (you can also
find stats on P17 in SQID [1]). That is not small, but the query is
simple, and it actually worked for me. Don't try it from the browser,
though (the result is over 500 MB in XML). You can do:
wget "https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=SELECT%20%3Fe…"
You can also request the result in JSON instead of XML. If you are not
familiar with SPARQL, maybe start with the examples at
https://query.wikidata.org/
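
In case it helps, here is a minimal Java sketch of the same download
without wget. It only uses the JDK (Java 11 or later), not Wikidata
Toolkit. The endpoint is the one from the wget call above; the class
name SparqlDownload, the User-Agent string, and the query you pass in
are my own placeholders. A query such as
"SELECT ?item WHERE { ?item wdt:P17 ?country }" should do, but it is not
necessarily the exact one behind the shortened link. JSON is requested
via the standard SPARQL results media type in the Accept header.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper class, not part of Wikidata Toolkit.
class SparqlDownload {
    // Endpoint taken from the wget example in this mail.
    static final String ENDPOINT =
            "https://query.wikidata.org/bigdata/namespace/wdq/sparql";

    // Percent-encodes the query and appends it as the "query" parameter.
    // URLEncoder writes spaces as "+", so they are turned into "%20" to
    // match the style of the wget URL (a literal "+" becomes "%2B" first).
    static String buildUrl(String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return ENDPOINT + "?query=" + encoded;
    }

    // Sends the query and writes the response body straight to a file,
    // so the (large) result is never held in memory as one string.
    static void download(String sparql, Path target) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildUrl(sparql)))
                .header("Accept", "application/sparql-results+json")
                .header("User-Agent", "example-sparql-client/0.1") // placeholder
                .GET()
                .build();
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(target));
    }
}
```

Calling SparqlDownload.download(query, Path.of("p17.json")) would then
play the role of the wget call; writing to a file rather than a string
is deliberate, given the size of the result.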
You can also run SPARQL queries from Java and use the results together
with Wikidata Toolkit; I am doing this in some tools. There is an
example in
https://github.com/Wikidata/Wikidata-Toolkit/blob/sqid-helper/wdtk-client/s…
However, the result parsing code there is not optimized for results with
more than 500K entities. The generic JSON parsing used now consumes a
lot of memory. For 4.7 M results, one would need to use a streaming
parser instead, in which case very little memory should be enough. As
you can see, there is no specific SPARQL query API in Wikidata Toolkit,
but not much code is needed to run simple queries.
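
To illustrate the streaming idea, here is a rough sketch. Note that it
is not a streaming JSON parser: the JDK does not ship one, so this
sketch swaps in a simpler technique and assumes the result was requested
as CSV (Accept: text/csv) for a query with a single variable. Each
result row is then one line holding an entity URI, which can be handled
and discarded immediately. The class and method names are made up for
this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper class, not part of Wikidata Toolkit.
class StreamingResults {
    // Reads single-column SPARQL CSV results line by line and passes the
    // local entity id (e.g. "Q183") of each row to the handler.
    // Only one line is kept in memory at a time. Returns the row count.
    static long forEachEntity(Reader csv, Consumer<String> handler) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(csv)) {
            String line = reader.readLine(); // skip header row (variable name)
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // tolerate blank lines
                }
                // Assumption: the row is a plain entity URI such as
                // http://www.wikidata.org/entity/Q183; keep the last segment.
                handler.accept(line.substring(line.lastIndexOf('/') + 1));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

The handler could, for example, collect ids in batches and fetch the
corresponding entities in whatever way suits the tool. With a streaming
JSON library the loop would look similar, only reading tokens instead of
lines.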
Best regards,
Markus
[1] http://tools.wmflabs.org/sqid/#/view?id=P17
_______________________________________________
Wikidata-tech mailing list
Wikidata-tech(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/