Dear Stas,
thanks for your reply!
On Sat, Jan 9, 2016 at 7:29 AM, Stas Malyshev <smalyshev@wikimedia.org> wrote:
statements (about 2.5M) and on the question whether SPARQL could list all entries in Wikidata that do not have statements. I played a bit with
Technically, it could, but since there are so many of them, the query might not finish in time. The problem is that there are no indexes on something that doesn't exist, so the database probably goes entity by entity trying to find one that doesn't have a statement, and that is slow. I think there may be a bug in the LIMIT implementation, or maybe it's indeed just taking too long...
Yeah, ideally LIMIT would make it stop searching once it has found that many hits... but it may indeed really be trying that.
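One possible workaround, not something proposed in this thread and only a sketch, is to bound the candidate set in a subquery first, so FILTER NOT EXISTS is only evaluated for a limited sample of entries instead of the whole database. The inner bound of 10000 is an arbitrary illustrative value, and without ORDER BY the engine may return any 10,000 labelled entries, so an empty result only says something about that sample. Whether the engine really evaluates the subquery first depends on its optimizer. The link direction used here (?entry as the subject, the statement node carrying wikibase:rank) is an assumption about the RDF layout.

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?entry ?label WHERE {
  {
    # Inner query: an arbitrary, bounded sample of entries with an English label.
    # 10000 is an illustrative bound, not a tuned value.
    SELECT ?entry ?label WHERE {
      ?entry rdfs:label ?label .
      FILTER (lang(?label) = "en")
    }
    LIMIT 10000
  }
  # Drop sampled entries that link to a ranked node, i.e. (assuming every
  # statement node has wikibase:rank) entries that have at least one statement.
  FILTER NOT EXISTS {
    ?entry ?prop ?statement .
    ?statement wikibase:rank ?rank .
  }
}
LIMIT 5
```

Raising the inner bound trades completeness for a higher risk of hitting the timeout again.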
combinations of OPTIONAL with FILTER(BOUND(...)) and FILTER NOT EXISTS... something like:
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?entry ?label ?statement WHERE {
  ?entry rdfs:label ?label .
  FILTER (lang(?label) = "en")
  FILTER NOT EXISTS {
    ?statement ?prop ?entry ;
               wikibase:rank ?rank .
  }
} LIMIT 5
This query also seems a bit wrong since it looks for ?entry as object, not subject.
There exist predicates between ?entry and the statement in both directions; I played a bit with both.
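Before wrapping the pattern in FILTER NOT EXISTS, a small positive query can show which direction actually returns matches. This sketch puts ?entry in the subject position, as suggested above, and assumes statement nodes are the ones carrying wikibase:rank. On the Wikidata Query Service one would expect ?prop to come back as p: predicates such as p:P31, but that is an expectation, not a verified result.

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>

# Positive check: entries as subjects, linked to ranked (statement) nodes.
SELECT ?entry ?prop ?statement WHERE {
  ?entry ?prop ?statement .
  ?statement wikibase:rank ?rank .
}
LIMIT 5
```

Swapping ?entry and ?statement in the first triple gives the reverse direction for comparison.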
But I noticed something else: statements are not typed. A type would probably let the query use an index, unlike the query above. The documentation actually mentions wikibase:Statement [1], but if I search for anything rdf:type-d as such, the SPARQL endpoint finds nothing.
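A query of the kind described might look like the following; this is a reconstruction, since the message does not include the query itself. It simply asks for nodes typed as wikibase:Statement and, as described, returns nothing on the query service.

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>

# Look for any node explicitly typed as a statement.
SELECT ?statement WHERE {
  ?statement a wikibase:Statement .
}
LIMIT 5
```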
Right, please check out: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#WDQS_data_d...
wikibase:Statement is omitted from the database for performance reasons.
Ah, I was guessing something like that; thanks for the confirmation.
You could still match statements by URL, by converting them with str() and then using the substr() function, but that probably wouldn't help much: there are so many statements that the filter would not be very selective.
Indeed. Well, maybe I'll give this a try. I'll let you know if I get it working...
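A sketch of matching statements by URL, assuming statement nodes live under the namespace http://www.wikidata.org/entity/statement/ (the wds: prefix on Wikidata; treat that as an assumption to check). It uses the SPARQL 1.1 STRSTARTS function in place of substr(); the comparison is the same. As noted, the filter has to look at every ?entry ?prop ?statement triple, so it is not selective.

```sparql
# Keep only objects whose URL starts with the assumed statement namespace.
SELECT ?entry ?prop ?statement WHERE {
  ?entry ?prop ?statement .
  FILTER (STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/"))
}
LIMIT 5
```

The literal substr() form would be FILTER (SUBSTR(STR(?statement), 1, 41) = "http://www.wikidata.org/entity/statement/"), since SPARQL string positions start at 1 and that prefix is 41 characters long.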
Thanks!
Egon