[Wikidata] Re: Inconsistencies on WDQS data - data reload on WDQS

23 Feb 2023


On 23/02/2023 20:08, Kingsley Idehen via Wikidata wrote:
> ...
> On 2/23/23 12:19 PM, James Heald wrote:
> > ...
> > I have to say I am a bit concerned by this talk, since some of
> > Blazegraph's "features and quirks" can be exceedingly useful.
> That isn't justification for tightly-coupling a Query Tool to a Query
> Service Endpoint, especially when an open standard (in the form of
> SPARQL) exists.
Of course it's a good thing to be able to swap out the back-end and to 
be able to run essentially the same queries against other realisations 
of the database.
It's also a good thing to be able to clone the user interface and use
essentially the same UI with a different back-end.  (As I understand it,
this should be quite feasible.)
But: there are features, listed in the desiderata for WDQS from the
very start, that go beyond what the out-of-the-box SPARQL 1.1 standard
offers.
Most notable among these is the ability to retrieve items with
coordinates close to a particular point on the Earth's surface.
(As the Blazegraph developers discovered, this can be implemented
fairly easily by adding a "Z-order curve" index on coordinate values:
https://en.wikipedia.org/wiki/Z-order_curve .)
Not all users will have an interest in geographical objects.  Those who 
don't will lose little if they hook up a back-end that doesn't provide 
this, because presumably they won't be running queries which require it. 
But those who do need this functionality need this indexing.
Given that this was something the Blazegraph developers (all three of
them) found they could add relatively easily, and given that, as I see
it, any database back-end would gain considerable cachet by being able
to run Wikidata queries, it seems to me not unreasonable to approach
potential alternative back-ends and see how easily they too could add a
Z-order curve index for coordinate values, plus basic functionality to
make use of it (and wikibase:box and wikibase:around are about as basic
as it gets).
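To illustrate how box- and around-style lookups could sit on top of such an index, here is a hypothetical in-memory sketch. The class name, method signatures and the spherical Earth of radius 6371 km are assumptions for illustration, not the API of WDQS or of any back-end. Since the Morton key is monotone in each coordinate, every point inside a box has a key between the keys of the box's south-west and north-east corners, so a range scan over sorted keys returns a superset that is then filtered exactly. The around lookup converts a centre and radius into an enclosing box, then applies a great-circle (haversine) distance check.

```python
import bisect
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def z_order_key(lat: float, lon: float, bits: int = 16) -> int:
    """Morton key: quantize lon/lat onto 2**bits cells each, then interleave the bits."""
    cells = 1 << bits
    x = max(0, min(int((lon + 180.0) / 360.0 * cells), cells - 1))
    y = max(0, min(int((lat + 90.0) / 180.0 * cells), cells - 1))
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(a, 1.0)))


class CoordinateIndex:
    """Hypothetical in-memory coordinate index: items kept sorted by Z-order key."""

    def __init__(self, items):
        # items: iterable of (item_id, lat, lon) tuples
        self._rows = sorted((z_order_key(lat, lon), item, lat, lon) for item, lat, lon in items)
        self._keys = [row[0] for row in self._rows]

    def _box_rows(self, south, west, north, east):
        lo = bisect.bisect_left(self._keys, z_order_key(south, west))
        hi = bisect.bisect_right(self._keys, z_order_key(north, east))
        # The key range is a superset of the box, so filter out the false positives.
        return [(item, lat, lon) for _, item, lat, lon in self._rows[lo:hi]
                if south <= lat <= north and west <= lon <= east]

    def box(self, south, west, north, east):
        """Item ids inside the box (assumes west <= east, i.e. no antimeridian crossing)."""
        return [item for item, _, _ in self._box_rows(south, west, north, east)]

    def around(self, lat, lon, radius_km):
        """Item ids within radius_km of (lat, lon): enclosing-box prefilter, then exact check."""
        d = radius_km / EARTH_RADIUS_KM  # angular radius in radians
        south = max(lat - math.degrees(d), -90.0)
        north = min(lat + math.degrees(d), 90.0)
        ratio = math.sin(d) / max(math.cos(math.radians(lat)), 1e-12)
        if south == -90.0 or north == 90.0 or ratio >= 1.0:
            west, east = -180.0, 180.0  # circle reaches a pole: scan every longitude
        else:
            dlon = math.degrees(math.asin(ratio))  # widest longitude extent of the circle
            west, east = lon - dlon, lon + dlon
            if west < -180.0 or east > 180.0:
                west, east = -180.0, 180.0  # crosses the antimeridian: scan every longitude
        return [item for item, plat, plon in self._box_rows(south, west, north, east)
                if haversine_km(lat, lon, plat, plon) <= radius_km]
```

For example, CoordinateIndex([("London", 51.5074, -0.1278), ("Paris", 48.8566, 2.3522)]).around(51.5074, -0.1278, 400) returns both ids, since Paris is roughly 344 km from London. A real back-end would split one wide key range into several tighter sub-ranges, because a single range can cover far more of the curve than the box itself, and would handle antimeridian-crossing boxes properly rather than falling back to a full longitude scan.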
Andrea suggested a more GeoSPARQL-orientated solution (
https://wikitech.wikimedia.org/wiki/User:AndreaWest/Blazegraph_Features_and_...
), but that seems to me a much, much bigger ask. I suspect that, for
almost all contending projects, the simple wikibase:box and
wikibase:around services would be far easier to implement, freeing us
from our tight coupling to Blazegraph while still providing this
functionality, which I do believe is a genuine requirement.
As for named subqueries: as well as making queries much more readable,
IMO they may be particularly valuable as a way to specify optimisations
(i.e. the sequencing of query execution, which may be absolutely
*crucial* if a query is to run at all) in a readable and **portable**
way, certainly when compared with optimiser "hint" syntaxes, which may
be tied *very* specifically to a particular back-end.
Why do I think named subqueries are so portable, if they are not part of
the SPARQL 1.1 standard and most providers don't support them?
Because, if necessary, a fairly simple pre-processor script could turn
them into inline sub-queries, which *are* supported by the standard.
Named sub-queries do, though, have the advantage of making the query a
lot more readable; and they can be useful to indicate to the back-end
that the sub-query need only be evaluated once, rather than each time it
is referenced (which may be helpful for some back-ends).
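A minimal sketch of such a pre-processor, assuming Blazegraph's `WITH { ... } AS %name` and `INCLUDE %name` named-subquery syntax. It removes each named block and pastes its body, wrapped as a standard `{ SELECT ... }` sub-query, wherever that name is INCLUDEd; a named block may itself INCLUDE an earlier one. The function names are hypothetical, and for brevity the brace matcher does not skip braces inside string literals or comments, which a production version would have to handle.

```python
import re

WITH_RE = re.compile(r"\bWITH\s*\{", re.IGNORECASE)
AS_RE = re.compile(r"\s*AS\s+%(\w+)", re.IGNORECASE)
INCLUDE_RE = re.compile(r"\bINCLUDE\s+%(\w+)", re.IGNORECASE)


def _matching_brace(text: str, open_pos: int) -> int:
    """Index of the '}' closing the '{' at open_pos (does NOT skip braces in strings/comments)."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


def inline_named_subqueries(query: str) -> str:
    """Rewrite WITH { ... } AS %name / INCLUDE %name into inline SPARQL 1.1 sub-queries."""
    named = {}

    def expand(text: str) -> str:
        def sub(m):
            name = m.group(1)
            if name not in named:
                raise KeyError(f"INCLUDE of undefined subquery %{name}")
            return "{ " + named[name] + " }"
        return INCLUDE_RE.sub(sub, text)

    out, pos = [], 0
    while (m := WITH_RE.search(query, pos)):
        open_pos = m.end() - 1  # position of the '{' after WITH
        close_pos = _matching_brace(query, open_pos)
        as_m = AS_RE.match(query, close_pos + 1)
        if not as_m:
            raise ValueError("WITH block without 'AS %name'")
        # Expand INCLUDEs of earlier named blocks inside this body before storing it.
        named[as_m.group(1)] = expand(query[open_pos + 1:close_pos].strip())
        out.append(query[pos:m.start()])
        pos = as_m.end()
    out.append(query[pos:])
    return expand("".join(out))
```

The rewritten query uses only standard sub-query syntax, but it gives up the "compute once" signal: a back-end that receives it may evaluate an inlined sub-query once per reference.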
So: I don't disagree that it would be useful if WDQS were less tightly
dependent on Blazegraph.
But: rather than going straight to removing good features, I think there
is a lot of scope for seeing whether the dev teams for other back-ends
could be persuaded to add matching features without too much difficulty.
That would be a better path to at least investigate, in preference to
breaking swathes of queries that are in active use.
--  James.
