[Wikidata] Re: Help make this Property Query faster

5 Nov 2021

Hi Thad,
I looked at this query and I have nothing to add to what was suggested
already to make it run faster.
I think the main issue is the size of the intermediate results that must
have the language filter applied. Sadly, almost every time a FILTER is
applied to a string literal, Blazegraph may have to fetch its
representation from its lexicon, which incurs a huge slowdown.
Regarding indices and ordering, I believe the right indices are being used;
otherwise the query would certainly time out. I doubt it can filter all
English labels before joining them to the property labels.
The criterion ?prop wdt:P31/wdt:P279* wd:Q18616576 does indeed seem useless
to me and is pulling a couple of false positives[1] into the join (totally
harmless for query performance, but they should perhaps be cleaned up in
Wikidata?).
So filtering and fetching the textual data is indeed what makes this query
slow. I tried various combinations but could not come up with reasonable and
stable sub-second response times. Fetching the textual data (possibly
lazily) from another service might help, but this would certainly mean a
substantial rewrite of the client relying on this query.
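To make the "fetch the text from another service" idea concrete, here is a rough sketch that assumes the other service is the MediaWiki wbgetentities API and that the client already holds a list of property ids (for example from a label-free SPARQL query). The function names and the choice of API are my assumptions, not something the tool does today; the sketch only builds the batched request URLs and leaves the actual HTTP calls to the client.

```python
from urllib.parse import urlencode

# Assumption: labels are fetched from the Wikidata MediaWiki API, not from WDQS.
API_URL = "https://www.wikidata.org/w/api.php"
# wbgetentities accepts at most 50 ids per request for regular clients.
BATCH_SIZE = 50


def chunked(ids, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def label_request_urls(property_ids, language="en"):
    """Build one wbgetentities URL per batch of property ids (hypothetical helper)."""
    urls = []
    for batch in chunked(property_ids):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "props": "labels|descriptions",
            "languages": language,
            "format": "json",
        }
        urls.append(API_URL + "?" + urlencode(params))
    return urls
```

A client could request only the batches for the properties currently on screen, which is the "lazy" part.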
Caching is definitely going to help, especially if this data is not subject
to rapid or frequent changes. The WDQS infrastructure has a caching layer, but
retention might not be long enough to be useful for this particular tool.
The JSON output does indeed seem quite big (almost 5 MB); while not
enormous, it is still sizeable, and if this data is relatively stable there
might be value in refreshing it on a schedule (daily, as you suggest) and
making it available on static storage.
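As an illustration of the daily refresh, here is a minimal sketch of a file-backed cache with a 24-hour lifetime. It assumes the tool can keep the query result as a JSON file on local or static storage; `load_cached` and the `fetch` callable (whatever function actually runs the query) are hypothetical names, not part of any existing tool.

```python
import json
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # refresh at most once a day


def load_cached(path, fetch, ttl=CACHE_TTL_SECONDS, now=time.time):
    """Return the cached JSON if it is younger than `ttl`, else refetch and store it."""
    path = Path(path)
    if path.exists() and now() - path.stat().st_mtime < ttl:
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch()  # hypothetical callable that runs the slow query
    # Write to a temporary file first so readers never see a half-written cache.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(data), encoding="utf-8")
    tmp.replace(path)
    return data
```

With this shape, only the first request after the cache expires pays for the slow query; every other request is served from the file.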
Another note about response times: you may see varying response times from
the query service, and the reason might be one of the following:
- the result is cached in the query service caching layer (generally sub-100ms
response time)
- the server the query hits is heavily loaded
- the server the query hits is of an older generation (we have 2 different
kinds of hardware setup in the cluster at the moment, which might explain
some of the variance you see).
Hope it helps a bit,
Regards,
David.
[1] https://w.wiki/4Lae
On Wed, Nov 3, 2021 at 11:39 PM Thad Guidry thadguidry@gmail.com wrote:
...
Thanks Kingsley, Thomas, Jeff,
From what I see the live query is never sub-second, and that's likely
because of 2 things:
  1. indexing not prioritizing this kind of query and aligning it (which
     David Causse might know if that could be changed); essentially it's
     metadata about Wikidata (its available properties).
  2. it's 2.2 MB of data
I think that Yi Liu's Wikidata Property Explorer service then might want
to instead cache the results for 24 hours for the best of both worlds.
To be fair, the raw amount of data requested seems to be approximately 2.2
MB and so probably should be locally cached by his tool for some determined
time (like 24 hours).
Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/

Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-leave@lists.wikimedia.org
