On 30.05.2016 22:05, Florian Bachmair wrote:
Hi!
Is it possible to query all documents with a certain property, e.g. "P17" (= country)? I would like to do that with the Wikidata Toolkit for Java.
Dear Florian,
Wikidata Toolkit is mainly used for processing dumps. You could answer your query by processing a whole dump and checking, for each entity you find there, if it meets your requirements. If you are looking for rather large datasets and complex conditions, then this might be the only solution to this task.
If your query is still simple enough to run within the timeout, then the SPARQL query service is the way to go. I just checked, and the result of your query (things with P17) has over 4.7 M statements (you can also find stats on P17 in SQID [1]). It's not small, but the query is simple, and it actually worked for me. Don't try it from the browser though (the result is more than 500 MB in XML). You can do:
wget https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=SELECT%20%3Fen...
One can also get this in JSON if requested. If you are not familiar with SPARQL, maybe start with the examples at https://query.wikidata.org/
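To illustrate how such a request URL is put together, here is a minimal Java sketch. The query text and the class and method names are assumptions made for illustration; in particular, the query is not necessarily the one behind the truncated wget link above. Only the endpoint address is taken from that link.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SparqlUrlBuilder {
    // Endpoint as it appears in the wget command above.
    static final String ENDPOINT =
            "https://query.wikidata.org/bigdata/namespace/wdq/sparql";

    /** Percent-encodes a SPARQL query and appends it as the "query" parameter. */
    static String buildQueryUrl(String sparql) {
        // URLEncoder (Java 10+ overload) writes spaces as '+'; switch to %20
        // to match the style of the URL above. A literal '+' in the query has
        // already become %2B at this point, so the replacement is safe.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return ENDPOINT + "?query=" + encoded;
    }

    public static void main(String[] args) {
        // Hypothetical query: every entity that has some value for P17.
        String sparql = "SELECT ?entity WHERE { ?entity wdt:P17 ?country }";
        System.out.println(buildQueryUrl(sparql));
    }
}
```

The printed URL can be passed to wget (in quotes, so that the shell leaves the `?` and `%` characters alone) or opened from Java with an HTTP client.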
You can also run SPARQL queries from Java and use the results with Wikidata Toolkit; I am doing this in some tools. There is an example in
https://github.com/Wikidata/Wikidata-Toolkit/blob/sqid-helper/wdtk-client/sr...
However, the result parsing code there is not optimized for results with more than 500K entities: the generic JSON parsing it uses at the moment consumes a lot of memory. For 4.7 M results, one would need to use a streaming parser instead, in which case very little memory should be enough. As you can see, there is no specific SPARQL query API in Wikidata Toolkit, but not much code is needed to run simple queries.
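To show why streaming keeps memory use low, here is a small sketch that handles one result row at a time and never holds the whole result. It swaps in a simpler technique than a streaming JSON parser: it assumes the result was requested in a line-based format (one header line, then one entity IRI per line, as in a one-column CSV result) and reads it with the standard library only. The class name, method name and sample rows are made up for illustration; this is not the code from the linked example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

class StreamingResultReader {
    /**
     * Reads a one-column, line-based result row by row and hands each value
     * to the handler. Only the current line is kept in memory.
     * Returns the number of rows handled.
     */
    static long forEachRow(Reader source, Consumer<String> handler)
            throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // first line is the column header
            if (line == null) {
                return 0; // empty input
            }
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                handler.accept(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative sample; real input would be the HTTP response stream.
        String sample = "entity\n"
                + "http://www.wikidata.org/entity/Q64\n"
                + "http://www.wikidata.org/entity/Q1055\n";
        long rows = forEachRow(new StringReader(sample),
                iri -> System.out.println(iri.substring(iri.lastIndexOf('/') + 1)));
        System.out.println(rows + " rows");
    }
}
```

For a real query, the Reader would wrap the response stream of the HTTP request, so rows are processed while they are still arriving. A streaming JSON parser follows the same pattern, with parser events in place of lines.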
Best regards,
Markus
[1] http://tools.wmflabs.org/sqid/#/view?id=P17
Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech