Dear Stas,
thanks for your reply!
On Sat, Jan 9, 2016 at 7:29 AM, Stas Malyshev <smalyshev(a)wikimedia.org> wrote:
statements (about 2.5M) and on the question if SPARQL could list all
entries in Wikidata that do not have statements. I played a bit with
Technically, it could, but since there are so many of them, the query
might not finish in time. The problem is that since there are no indexes
on something not existing, what probably happens is that the database
would go entity by entity trying to find one that doesn't have a
statement, and that is slow. I think there may be a bug with the LIMIT
implementation, or maybe it's just indeed taking too long...
Yeah, ideally LIMIT would make it stop searching when it found that
many hits... but it indeed may really be trying that.
combinations of OPTIONAL and FILTER-BOUND and FILTER NOT EXISTS...
something like:
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT DISTINCT ?entry ?label ?statement WHERE {
  ?entry rdfs:label ?label . FILTER (lang(?label) = "en")
  FILTER NOT EXISTS {
    ?statement ?prop ?entry ;
               wikibase:rank ?rank .
  }
} LIMIT 5
This query also seems a bit wrong since it looks for ?entry as object,
not subject.
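For illustration, a variant with ?entry in subject position might look like the sketch below. It assumes the WDQS data model in which an entity links to its statement nodes and each statement node carries a wikibase:rank. The rdfs prefix is declared explicitly here, ?statement is dropped from the SELECT because it is never bound outside the filter, and nothing suggests this version avoids the timeout discussed above.

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Entries with an English label and no ranked statement attached.
SELECT ?entry ?label WHERE {
  ?entry rdfs:label ?label .
  FILTER (lang(?label) = "en")
  FILTER NOT EXISTS {
    # ?entry is the subject; the statement node is the object.
    ?entry ?prop ?statement .
    ?statement wikibase:rank ?rank .
  }
} LIMIT 5
```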
There exist predicates between the ?entry and the statement in both
directions. I played a bit with both.
But there was something else I noted... statements are not typed...
that would probably kick in some index, rather than the above query,
and the documentation actually speaks about wikibase:Statement [1] but
if I search for anything rdf:type-d as such, then it finds nothing in
the SPARQL endpoint:
Right, please check out:
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#WDQS_data_…
wikibase:Statement is omitted from the database for performance
reasons.
Ah, I was guessing something like that; thanks for the confirmation.
You could still match statements by URL by converting them to strings
with str() and then using the substr() function, but that probably
wouldn't help much since there are a lot of statements, so the filtering
would not be very selective.
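A rough sketch of that str()-based matching, assuming statement nodes live under the http://www.wikidata.org/entity/statement/ namespace (an assumption about the WDQS URI scheme). STRSTARTS is used here in place of substr() so the prefix length does not have to be hard-coded. As noted, the filter is unlikely to be selective enough to speed things up.

```sparql
# Match statement nodes by the string form of their URI.
SELECT ?entry ?statement WHERE {
  ?entry ?prop ?statement .
  FILTER (isIRI(?statement) &&
          STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/"))
} LIMIT 5
```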
Indeed. Well, maybe I'll give this a try. I'll let you know if I get it working...
Thanks!
Egon
--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen