On 23/02/2023 20:08, Kingsley Idehen via Wikidata wrote:
> On 2/23/23 12:19 PM, James Heald wrote:
>> I have to say I am a bit concerned by this talk, since some of Blazegraph's "features and quirks" can be exceedingly useful.
> That isn't justification for tightly-coupling a Query Tool to a Query Service Endpoint, especially when an open standard (in the form of SPARQL) exists.
Of course it's a good thing to be able to swap out the back-end and to be able to run essentially the same queries against other realisations of the database.
It's also a good thing to be able to clone the user interface and use essentially the same UI with a different back-end. (As I understand it, this should be quite feasible.)
But: there are features that have been listed in the desiderata for WDQS from the very start, and that go beyond what the out-of-the-box SPARQL 1.1 standard offers.
Most notable among these is the ability to retrieve items with coordinates close to a particular point on the Earth's surface. This is something which, as the Blazegraph developers discovered, can be implemented fairly easily by adding a "Z-order curve" index on coordinate values ( https://en.wikipedia.org/wiki/Z-order_curve ).
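To illustrate why a Z-order index makes this cheap, here is a minimal Python sketch. It is not Blazegraph's actual implementation: the 16-bit-per-axis grid, the sample places and the helper names (z_key, build_index, box_query) are all my own assumptions for illustration, and boxes crossing the antimeridian are not handled. The idea is that interleaving the bits of the quantised longitude and latitude gives one sortable integer per point, so an ordinary 1-D sorted index can answer a 2-D bounding-box query with a single range scan plus an exact filter.

```python
import bisect

BITS = 16  # hypothetical per-axis grid resolution (2**16 cells per axis)

# Illustrative sample data: label -> (latitude, longitude) in degrees.
PLACES = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
    "New York": (40.7128, -74.0060),
}


def quantize(value, lo, hi, bits=BITS):
    """Map a value in [lo, hi] onto the integer grid 0 .. 2**bits - 1 (order-preserving)."""
    return round((value - lo) / (hi - lo) * ((1 << bits) - 1))


def interleave(x, y, bits=BITS):
    """Morton code: bit i of x goes to position 2i, bit i of y to position 2i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def z_key(lat, lon):
    """One sortable integer per point, so a plain 1-D index can hold 2-D data."""
    return interleave(quantize(lon, -180.0, 180.0), quantize(lat, -90.0, 90.0))


def build_index(places):
    """Hypothetical 'index': (sorted keys, entries sorted by the same keys)."""
    entries = sorted((z_key(lat, lon), label, lat, lon)
                     for label, (lat, lon) in places.items())
    return [e[0] for e in entries], entries


def box_query(index, south_west, north_east):
    """Labels inside a lat/lon box given as (lat, lon) corners.

    The Morton code is monotone in each axis, so every point inside the box has a
    key between the keys of its two corners: the 2-D box becomes one 1-D range scan.
    That range can also contain points outside the box, hence the exact filter.
    """
    keys, entries = index
    start = bisect.bisect_left(keys, z_key(*south_west))
    end = bisect.bisect_right(keys, z_key(*north_east))
    (s, w), (n, e) = south_west, north_east
    return [label for _, label, lat, lon in entries[start:end]
            if s <= lat <= n and w <= lon <= e]
```

For example, box_query(build_index(PLACES), (48.0, -5.0), (53.0, 5.0)) should return London and Paris. A real index would also skip runs of keys that fall outside the box instead of scanning and filtering the whole range; that refinement is left out here for brevity.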
Not all users will have an interest in geographical objects. Those who don't will lose little if they hook up a back-end that doesn't provide this, because presumably they won't be running queries which require it. But those who do need this functionality need this indexing.
The Blazegraph developers (all 3 of them) found they could add this relatively easily, and it seems to me that any database back-end would gain considerable cachet by being able to run Wikidata queries. So I think it is not unreasonable to approach potential alternative back-ends and see how easily they too might add a Z-order curve index for coordinate values, plus basic functionality to make use of it (and wikibase:box and wikibase:around are about as basic as it gets).
Andrea suggested a more GeoSPARQL-orientated solution ( https://wikitech.wikimedia.org/wiki/User:AndreaWest/Blazegraph_Features_and_... ), but that seems to me a much, much bigger ask. I suspect that, for almost all contending projects, the simple wikibase:box and wikibase:around services would be far easier to implement; they would free us from our tight coupling to Blazegraph while still providing this functionality, which I do believe is a genuine requirement.
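To give a sense of how small the "around" part is, here is a brute-force Python sketch of the semantics a back-end would need to match. My understanding is that wikibase:around takes a centre point (wikibase:center) and a radius in kilometres (wikibase:radius), and can bind each result's distance (wikibase:distance); please check those details against the WDQS documentation. The function names, the sample data and the mean Earth radius of 6371 km are assumptions for illustration, not taken from Blazegraph.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, back-ends may use another model

# Illustrative sample data: label -> (latitude, longitude) in degrees.
PLACES = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
    "New York": (40.7128, -74.0060),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # min() guards against rounding pushing a slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def around(places, center, radius_km):
    """Hypothetical reference semantics for an 'around' service.

    Returns (label, distance_km) for every place within radius_km of center,
    nearest first. This scans every place; a real back-end would first prune
    candidates with a spatial index and only then compute exact distances.
    """
    clat, clon = center
    hits = []
    for label, (lat, lon) in places.items():
        d = haversine_km(clat, clon, lat, lon)
        if d <= radius_km:
            hits.append((label, d))
    return sorted(hits, key=lambda hit: hit[1])
```

With a centre on London and a 400 km radius this should return London and Paris (roughly 344 km apart), but not Berlin. The exact-distance step is trivial; the part that needs back-end support is the index that keeps the candidate set small.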
As for named subqueries: as well as making queries much more readable, IMO they may be particularly valuable as a way to specify certain optimisations (i.e. the sequencing of query execution, which may be absolutely *crucial* if a query is to run at all) in a readable and **portable** way, certainly when compared to optimiser "hint" syntaxes, which may be tied *very* specifically to a particular back-end.
Why do I think named subqueries are so portable, when they are not part of the SPARQL 1.1 standard, and most providers don't support them?
Because, if necessary, a fairly simple pre-processor script could turn them into inline sub-queries, which *are* supported by the standard.
Named sub-queries have the advantage, though, of making the query a lot more readable, and they can be useful to indicate to the back-end that the sub-query need only be evaluated once, rather than repeatedly each time it is referenced (which may be helpful for some back-ends).
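As a rough illustration of how simple such a pre-processor could be, here is a minimal Python sketch. It assumes Blazegraph's named-subquery form (WITH { ... } AS %name placed before the main WHERE clause, and referenced as INCLUDE %name); the function name inline_named_subqueries is hypothetical. It rewrites each INCLUDE into a standard SPARQL 1.1 sub-select wrapped in braces, and also handles named subqueries that INCLUDE earlier ones. It does not cope with braces inside string literals or comments.

```python
import re

_WITH = re.compile(r"\bWITH\s*\{", re.IGNORECASE)
_AS_NAME = re.compile(r"\s*AS\s+(%[A-Za-z_]\w*)", re.IGNORECASE)


def _matching_brace(text, open_idx):
    """Index of the '}' matching the '{' at open_idx (naive: no string/comment handling)."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


def _substitute(text, defs):
    """Replace every 'INCLUDE %name' with '{ <sub-select> }' for the known names."""
    for name, body in defs.items():
        pattern = re.compile(r"\bINCLUDE\s+" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        # A function replacement avoids backslashes in the body being treated as escapes.
        text = pattern.sub(lambda _m: "{ " + body + " }", text)
    return text


def inline_named_subqueries(query):
    """Turn Blazegraph-style named subqueries into plain SPARQL 1.1 sub-selects."""
    defs = {}
    while True:
        m = _WITH.search(query)
        if m is None:
            break
        open_idx = m.end() - 1
        close_idx = _matching_brace(query, open_idx)
        as_match = _AS_NAME.match(query, close_idx + 1)
        if as_match is None:
            raise ValueError("WITH block without 'AS %name'")
        body = query[open_idx + 1:close_idx].strip()
        # Earlier named subqueries may be INCLUDEd inside later ones.
        defs[as_match.group(1)] = _substitute(body, defs)
        # Drop the WITH ... AS %name block from the main query.
        query = query[:m.start()] + query[as_match.end():]
    return _substitute(query, defs)
```

Inlining like this should generally give the same results, but it throws away the hint that the sub-query need only be evaluated once: a back-end without named subqueries may evaluate each inlined copy separately.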
So: I don't disagree that it would be useful if WDQS were less tightly dependent on Blazegraph.
But: rather than going straight to removing good features, I think there is a lot of scope for seeing whether the dev teams of other back-ends could be persuaded to match these features without too much difficulty. That path would at least be worth investigating before breaking swathes of queries that are in active use.
-- James.