On 2/23/23 4:17 PM, James Heald wrote:
On 23/02/2023 20:08, Kingsley Idehen via Wikidata wrote:
On 2/23/23 12:19 PM, James Heald wrote:
I have to say I am a bit concerned by this talk, since some of
Blazegraph's "features and quirks" can be exceedingly useful.
That isn't justification for tightly coupling a Query Tool to a Query
Service Endpoint, especially when an open standard (in the form of
SPARQL) exists.
Of course it's a good thing to be able to swap out the back-end and to
be able to run essentially the same queries against other realisations
of the database.
It's also a good thing to be able to clone the user interface and use
essentially the same UI with a different back-end. (As I understand
it, this should be very possible.)
Good to hear, since that's my fundamental point re loosely-coupled
architecture enabled by open standards.
But. There are features which have been listed in the desiderata for
WDQS from the very start, that go beyond what the out-of-the-box
SPARQL 1.1 standard offers.
Therein lies the problem. A standards-based client can include
extensions for a specific back-end in configurable form, based on
loose-coupling principles. Doing otherwise produces what's generally
known as a leaky abstraction, which ultimately racks up technical debt.
An example of technical debt that's manifesting right now is an
inability to diffuse the costs of the Wikidata Knowledge Graph across a
federation of SPARQL query service providers. This doesn't have to be
the case at all, bearing in mind the nature of SPARQL and structured
data represented using RDF.
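The "configurable form" idea can be made concrete with a short Python sketch. It assumes a client that keeps one logical request ("items within a radius of a point") and looks up the back-end-specific SPARQL text in a per-endpoint table. The back-end keys, the AROUND_TEMPLATES table and the around_query() helper are all hypothetical. The "blazegraph" template follows the WDQS wikibase:around service. The "geosparql" template assumes a store that implements GeoSPARQL's geof:distance. Both assume the endpoint predeclares the usual prefixes (wdt:, wikibase:, bd:, geo:, geof:, uom:).

```python
from string import Template

# Hypothetical per-back-end table: how each endpoint spells the same logical
# request, "items with a coordinate (wdt:P625) within a radius of a point".
# Prefixes are assumed to be predeclared by the endpoint.
AROUND_TEMPLATES = {
    # Follows the WDQS/Blazegraph wikibase:around service (radius in km).
    "blazegraph": Template(
        "SELECT ?place ?location WHERE {\n"
        "  SERVICE wikibase:around {\n"
        "    ?place wdt:P625 ?location .\n"
        '    bd:serviceParam wikibase:center "Point($lon $lat)"^^geo:wktLiteral .\n'
        '    bd:serviceParam wikibase:radius "$radius_km" .\n'
        "  }\n"
        "}"
    ),
    # Assumed fallback for a store implementing GeoSPARQL's geof:distance.
    "geosparql": Template(
        "SELECT ?place ?location WHERE {\n"
        "  ?place wdt:P625 ?location .\n"
        '  FILTER(geof:distance(?location, "Point($lon $lat)"^^geo:wktLiteral,\n'
        "                       uom:metre) <= $radius_m)\n"
        "}"
    ),
}


def around_query(backend: str, lat: float, lon: float, radius_km: float) -> str:
    """Render the configured 'around' query for a back-end, or fail loudly."""
    template = AROUND_TEMPLATES.get(backend)
    if template is None:
        raise NotImplementedError(f"no 'around' mapping configured for {backend!r}")
    # WKT points are written longitude first, then latitude.
    return template.substitute(lat=lat, lon=lon, radius_km=radius_km,
                               radius_m=radius_km * 1000)
```

With this design, supporting a new back-end means adding a table entry rather than changing the client. An unconfigured back-end fails loudly instead of being sent a query it cannot run.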
Most notable among these is the ability to retrieve items with
coordinates close to a particular point on the earth's surface.
(Something which, as the Blazegraph developers discovered, can be
implemented fairly easily if you add a "Z-order curve" index on
coordinate values
https://en.wikipedia.org/wiki/Z-order_curve ).
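For readers unfamiliar with the technique: a Z-order (Morton) key is built by quantising longitude and latitude to integers and interleaving their bits. Sorting by that key tends to keep points that are close on the map close together in the index. Here is a minimal Python sketch. It assumes 16 bits per axis, and the function names are illustrative, not Blazegraph's actual implementation.

```python
def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map value in [lo, hi] onto an integer grid with 2**bits steps."""
    return round((value - lo) / (hi - lo) * ((1 << bits) - 1))


def interleave(x: int, y: int, bits: int = 16) -> int:
    """Morton key: bit i of x goes to bit 2i, bit i of y goes to bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_key(lat: float, lon: float, bits: int = 16) -> int:
    """Z-order key for a WGS84 point; longitude supplies the even bits."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

Sorting rows by z_key(lat, lon) is what lets a plain range scan over an ordinary ordered index act as a first, coarse spatial filter.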
None of that would be lost in a WDQS instance configured to discover the
SPARQL query endpoint and associated capabilities.
Not all users will have an interest in geographical objects. Those who
don't will lose little if they hook up a back-end that doesn't provide
this, because presumably they won't be running queries which require
it. But those who do need this functionality need this indexing.
See my comment above.
Given that this was something the Blazegraph developers (all 3 of
them) found they could add relatively easily, and given that any
database back-end would gain considerable cachet by being able to run
Wikidata queries, it seems to me not unreasonable to approach potential
alternative back-ends and see how easily they too might be able to add
a Z-order curve index for coordinate values, plus basic functionality
to make use of it (where wikibase:box and wikibase:around are about as
basic as it gets).
Andrea suggested a more GeoSPARQL-orientated solution (
https://wikitech.wikimedia.org/wiki/User:AndreaWest/Blazegraph_Features_and…
), but that seems to me a much, much bigger ask. I suspect that, for
almost all contending projects, the simple wikibase:box and
wikibase:around services would be far easier to implement. That would
free us from our tight coupling to Blazegraph while still providing this
functionality, which I do believe is a genuine requirement.
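Here is a rough Python sketch of how wikibase:box and wikibase:around could sit on top of such an index. It rests on three assumptions: 16 bits per axis, points held in a sorted in-memory list rather than a database B-tree, and a spherical Earth.

The Morton key is monotone in each coordinate, so every point inside a box has a key between the keys of the box's south-west and north-east corners. box() therefore scans that key range and drops the false positives. around() builds a bounding box from the radius, widening to all longitudes near a pole or across the antimeridian, and then applies an exact great-circle distance check. The class and method names are hypothetical.

```python
import bisect
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical-Earth assumption
BITS = 16                    # assumed quantisation: 16 bits per axis


def z_key(lat: float, lon: float) -> int:
    """Morton key: quantised longitude in the even bits, latitude in the odd bits."""
    cells = (1 << BITS) - 1
    x = round((lon + 180.0) / 360.0 * cells)
    y = round((lat + 90.0) / 180.0 * cells)
    z = 0
    for i in range(BITS):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere of radius EARTH_RADIUS_KM."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


class ZOrderIndex:
    def __init__(self, points):
        """points: iterable of (item, lat, lon); rows are kept sorted by Z-order key."""
        self._rows = sorted((z_key(lat, lon), item, lat, lon) for item, lat, lon in points)
        self._keys = [row[0] for row in self._rows]

    def _scan(self, south, west, north, east):
        # Every point inside the box has a key between the corner keys,
        # so one contiguous range scan finds all candidates...
        lo = bisect.bisect_left(self._keys, z_key(south, west))
        hi = bisect.bisect_right(self._keys, z_key(north, east))
        # ...but that range also holds points outside the box: filter exactly.
        return [(item, lat, lon) for _, item, lat, lon in self._rows[lo:hi]
                if south <= lat <= north and west <= lon <= east]

    def box(self, south, west, north, east):
        """Items inside a lat/lon rectangle (no antimeridian wrap-around)."""
        return [item for item, _, _ in self._scan(south, west, north, east)]

    def around(self, lat, lon, radius_km):
        """Items within radius_km of (lat, lon): bounding box first, then exact distance."""
        delta = radius_km / EARTH_RADIUS_KM          # angular radius, radians
        phi = math.radians(lat)
        south = max(math.degrees(phi - delta), -90.0)
        north = min(math.degrees(phi + delta), 90.0)
        if south <= -90.0 or north >= 90.0 or math.sin(delta) >= math.cos(phi):
            west, east = -180.0, 180.0               # circle contains a pole
        else:
            dlon = math.degrees(math.asin(math.sin(delta) / math.cos(phi)))
            west, east = lon - dlon, lon + dlon
            if west < -180.0 or east > 180.0:        # crosses the antimeridian:
                west, east = -180.0, 180.0           # fall back to all longitudes
        return [item for item, plat, plon in self._scan(south, west, north, east)
                if haversine_km(lat, lon, plat, plon) <= radius_km]
```

A production index would normally split the key range to skip cells outside the box, rather than filtering one contiguous scan. This sketch keeps only the simplest correct version.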
As for named subqueries, as well as making queries much more readable,
IMO they may be particularly valuable as a way to specify particular
optimisations (i.e. sequencing of query execution, which may be
absolutely *crucial* if a query is to run) in a particularly readable
and **portable** way -- certainly when compared to optimiser "hint"
syntaxes, which may be tied *very* specifically to a particular back-end.
Why do I think named subqueries are so portable, if they are not part
of the SPARQL 1.1 standard, and most providers don't support them?
The answer is because if necessary it would require only a fairly
simple pre-processor script to turn them into inline sub-queries,
which *are* supported by the standard.
Named sub-queries do have the advantage, though, of making the query a
lot more readable; and they can be useful to indicate to the back-end
that the sub-query need only be evaluated once, rather than repeatedly
each time it is referenced (which may be helpful for some back-ends).
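To show how small such a pre-processor could be, here is a Python sketch. It rewrites Blazegraph-style named subqueries (declared as WITH { ... } AS %name and referenced with INCLUDE %name) into inline SPARQL 1.1 sub-queries.

It is deliberately naive. It matches braces without understanding string literals or comments, so a "{" or "}" inside a literal would confuse it. It also expands names strictly in the order they are defined. The function names are hypothetical.

```python
import re

# Blazegraph-style named subquery syntax handled by this sketch:
#   WITH { <sub-select> } AS %name   -- declared before the main WHERE
#   INCLUDE %name                     -- used inside a group pattern
WITH_OPEN = re.compile(r"\bWITH\s*\{", re.IGNORECASE)
AS_NAME = re.compile(r"\s*AS\s+%(\w+)", re.IGNORECASE)
INCLUDE = re.compile(r"\bINCLUDE\s+%(\w+)", re.IGNORECASE)


def _matching_brace(text: str, open_idx: int) -> int:
    """Index of the '}' closing the '{' at open_idx (naive: ignores literals/comments)."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces in query")


def _expand_includes(text: str, named: dict[str, str]) -> str:
    """Replace each INCLUDE %name with the named sub-query wrapped in { }."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in named:
            raise KeyError(f"INCLUDE of undefined named subquery %{name}")
        return "{ " + named[name] + " }"
    return INCLUDE.sub(replace, text)


def inline_named_subqueries(query: str) -> str:
    named: dict[str, str] = {}
    kept: list[str] = []  # query text outside the WITH ... AS %name blocks
    pos = 0
    while (m := WITH_OPEN.search(query, pos)) is not None:
        open_idx = m.end() - 1
        close_idx = _matching_brace(query, open_idx)
        as_match = AS_NAME.match(query, close_idx + 1)
        if as_match is None:
            raise ValueError("WITH { ... } block is not followed by AS %name")
        body = query[open_idx + 1:close_idx].strip()
        # Earlier named subqueries may be INCLUDEd by later ones.
        named[as_match.group(1)] = _expand_includes(body, named)
        kept.append(query[pos:m.start()])
        pos = as_match.end()
    kept.append(query[pos:])
    return _expand_includes("".join(kept), named)
```

Inlining should leave the results unchanged for ordinary joins, since both forms are evaluated bottom-up. It does drop the evaluate-once hint, so a back-end may then recompute a sub-query that is referenced more than once.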
These implementation details aren't really relevant to the fundamental
point I am trying to make about the virtues of loosely-coupled
architecture facilitated by existing open standards (e.g., SPARQL).
So: I don't disagree that it would be useful if WDQS was less tightly
dependent on Blazegraph.
But: rather than going straight to removing good features, I think
there is a lot of scope for seeing whether the dev teams for other
back-ends could be persuaded to match the features on those back-ends
without too much difficulty; and that this would be a better path to
at least investigate, in preference to breaking swathes of queries
that are in active use.
Nothing I've said amounts to feature removal. Everything I've said is
simply about loosely-coupled architecture as a guiding principle for
making WDQS usable against other SPARQL endpoints :)
Kingsley
-- James.
_______________________________________________
Wikidata mailing list -- wikidata(a)lists.wikimedia.org
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/me…
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page:
http://www.openlinksw.com
Community Support:
https://community.openlinksw.com