Hi, I am getting frequent timeouts trying to use the SPARQL endpoint GUI at https://query.wikidata.org/ . I'll admit, I have some complex queries, but I really feel like this is something that the system should be able to handle, or at least let me request a longer timeout. For example, this query:
SELECT ?item ?item2 WHERE {
?item wdt:P625 ?location . ?item <http://www.w3.org/2002/07/owl#sameAs> ?item2 .
} LIMIT 10
or this query:
SELECT DISTINCT ?item ?itemname ?location WHERE {
  ?item wdt:P625 ?location ;
        wdt:P31 ?type ;
        rdfs:label ?itemname .
  ?type wdt:P279 ?supertype .
  FILTER( LANG(?itemname) = "en" && ?supertype NOT IN (wd:Q5, wd:Q4991371, wd:Q7283, wd:Q36180, wd:Q7094076, wd:Q905511, wd:Q1063801, wd:Q1062856, wd:Q35127, wd:Q68, wd:Q42848, wd:Q2858615, wd:Q241317, wd:Q1662611, wd:Q7397, wd:Q151885, wd:Q1301371, wd:Q1068715, wd:Q7366, wd:Q18602249, wd:Q16521, wd:Q746549, wd:Q13485782, wd:Q36963) )
} LIMIT 200000
When I use the Python SPARQLWrapper library things improve somewhat, but some of my queries still time out. I tried the first query above on an old Wikidata dump we have from 2021, loaded into Jena TDB, and it managed to complete (0 results, but I had to run it to find that out...). It seems strange to get such poor performance.
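In case it is useful, here is roughly how I am calling the endpoint from Python (simplified; the user-agent string and the 60-second client timeout below are just placeholders for what I actually use):

from SPARQLWrapper import SPARQLWrapper, JSON

# Public WDQS endpoint; a descriptive user agent is expected by the
# Wikimedia user-agent policy (this one is just a placeholder).
sparql = SPARQLWrapper(
    "https://query.wikidata.org/sparql",
    agent="my-research-script/0.1 (ts.tomersagi@gmail.com)",
)
sparql.setReturnFormat(JSON)
sparql.setTimeout(60)  # client-side socket timeout in seconds; the
                       # public server enforces its own limit regardless

sparql.setQuery("""
SELECT ?item ?item2 WHERE {
  ?item wdt:P625 ?location .
  ?item <http://www.w3.org/2002/07/owl#sameAs> ?item2 .
} LIMIT 10
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["item"]["value"], binding["item2"]["value"])

Cheers,
Tomer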
Hi Tomer,
Unfortunately, your queries do operate on a rather large portion of the data (P625 has ~10 million items) and I could not find an obvious way to optimize them. Have you considered trying other services such as https://qlever.cs.uni-freiburg.de/wikidata or https://wikidata.demo.openlinksw.com/sparql to compare how they perform? It is very unlikely that we will allow longer timeouts in the near future, so if you plan to work on a large subset, using the dumps (RDF or JSON) might be a better option for you at the moment; WDQS is not fit for extracting large subsets of Wikidata. Another option might be to ask for advice at https://www.wikidata.org/wiki/Wikidata:Request_a_query; there may be different, more performant ways to do what you want.
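For example, a rough, untested sketch of how you could time the same query against all three endpoints with SPARQLWrapper (the QLever API URL is my best guess from its UI, so please double-check it; the explicit PREFIX keeps the query portable to endpoints that do not predefine the Wikidata prefixes the way WDQS does):

import time
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?item2 WHERE {
  ?item wdt:P625 ?location .
  ?item <http://www.w3.org/2002/07/owl#sameAs> ?item2 .
} LIMIT 10
"""

# WDQS and OpenLink URLs are the endpoints mentioned above;
# the QLever one is a guess based on its UI and may need adjusting.
ENDPOINTS = {
    "WDQS": "https://query.wikidata.org/sparql",
    "QLever": "https://qlever.cs.uni-freiburg.de/api/wikidata",
    "OpenLink": "https://wikidata.demo.openlinksw.com/sparql",
}

for name, url in ENDPOINTS.items():
    sparql = SPARQLWrapper(url, agent="endpoint-comparison/0.1")
    sparql.setReturnFormat(JSON)
    sparql.setTimeout(120)
    sparql.setQuery(QUERY)
    start = time.monotonic()
    try:
        rows = sparql.query().convert()["results"]["bindings"]
        print(f"{name}: {len(rows)} rows in {time.monotonic() - start:.1f}s")
    except Exception as exc:  # timeouts, HTTP errors, ...
        print(f"{name}: failed after {time.monotonic() - start:.1f}s ({exc})")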
Hope this helps a bit,
David.