[Wikidata] Re: Inconsistencies on WDQS data - data reload on WDQS

24 Feb 2023

On 2/23/23 4:17 PM, James Heald wrote:
...
  On 23/02/2023 20:08, Kingsley Idehen via Wikidata wrote:

 On 2/23/23 12:19 PM, James Heald wrote:

 I have to say I am a bit concerned by this talk, since some of 
 Blazegraph's "features and quirks" can be exceedingly useful.

 That isn't justification for tightly-coupling a Query Tool to a Query 
 Service Endpoint, especially when an open standard (in the form of 
 SPARQL) exists.

 Of course it's a good thing to be able to swap out the back-end and to 
 be able to run essentially the same queries against other realisations 
 of the database.

 It's also a good thing to be able to clone the user interface and use 
 essentially the same UI with a different back-end.  (As I understand 
 it, this should be very possible).

Good to hear, since that's my fundamental point re loosely-coupled 
architecture enabled by open standards.

...

 But. There are features which have been listed in the desiderata for 
 WDQS from the very start, that go beyond what the out-of-the-box 
 SPARQL 1.1 standard offers.

Therein lies the problem. A standards-based client can include 
extensions for a specific back-end in configurable form, based on 
loose-coupling principles. Doing otherwise produces what is generally 
known as a leaky abstraction, which ultimately racks up technical debt.
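
To make the "configurable form" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `BackendProfile`, `build_pattern`, the `nearby` feature name, and the `https://example.org/sparql` endpoint are hypothetical and not part of WDQS. The only borrowed piece is the `wikibase:around` service syntax used on the Wikidata endpoint. The point is only that a back-end-specific extension lives in configuration, while the client code stays generic.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass(frozen=True)
class BackendProfile:
    """Hypothetical description of one SPARQL endpoint and its extensions."""
    name: str
    endpoint_url: str
    # Feature name -> graph-pattern template in that back-end's own syntax.
    extensions: dict = field(default_factory=dict)


# Plain SPARQL 1.1 fallback: fetch coordinates only, with no spatial
# filtering (the client would have to filter the results itself).
GENERIC_NEARBY = "?place wdt:P625 ?location ."

# Profile for a Blazegraph-backed endpoint, using the wikibase:around service.
BLAZEGRAPH = BackendProfile(
    name="blazegraph",
    endpoint_url="https://query.wikidata.org/sparql",
    extensions={"nearby": """SERVICE wikibase:around {
  ?place wdt:P625 ?location .
  bd:serviceParam wikibase:center "Point($lon $lat)"^^geo:wktLiteral .
  bd:serviceParam wikibase:radius "$radius_km" .
}"""},
)

# Profile for a hypothetical endpoint with no extensions configured.
GENERIC = BackendProfile(name="generic", endpoint_url="https://example.org/sparql")


def build_pattern(profile, feature, fallback, **params):
    """Use the back-end's extension template if configured, else the fallback."""
    template = profile.extensions.get(feature, fallback)
    return Template(template).substitute(**params)
```

Calling `build_pattern(BLAZEGRAPH, "nearby", GENERIC_NEARBY, lon=2.35, lat=48.85, radius_km=5)` yields the extension pattern, while the same call with `GENERIC` falls back to the plain triple pattern. Swapping the back-end then means swapping a profile, not editing the client.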

An example of technical debt that's manifesting right now is an 
inability to diffuse the costs of the Wikidata Knowledge Graph across a 
federation of SPARQL query service providers. This doesn't have to be 
the case at all, bearing in mind the nature of SPARQL and structured 
data represented using RDF.

...

 Most notable among these is the ability to retrieve items with 
 coordinates close to a particular point on the earth's surface. 
 (Something which, as the Blazegraph developers discovered, can be 
 implemented fairly easily if you add a "Z-order curve" index on 
 coordinate values  https://en.wikipedia.org/wiki/Z-order_curve ).

None of that would be lost in a WDQS instance configured to discover the 
SPARQL query endpoint and associated capabilities.

...

 Not all users will have an interest in geographical objects. Those who 
 don't will lose little if they hook up a back-end that doesn't provide 
 this, because presumably they won't be running queries which require 
 it. But those who do need this functionality need this indexing.

See my comment above.

...

 Given that this was something the Blazegraph developers (all 3 of 
 them) found they could add relatively easily; and given that it seems 
 to me that any database back-end would gain considerable cachet by 
 being able to run wikidata queries, it seems to me not unreasonable to 
 approach potential alternative back-ends and see how easily they too 
 might be able to add a Z-order curve index for coordinate values, plus 
 basic functionality to make use of it. (Where wikibase:box and 
 wikibase:around are about as basic as it gets).
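
For readers unfamiliar with the Z-order curve, the idea can be sketched in a few lines of Python. This is not Blazegraph's implementation; the function names, the 16-bit quantisation, and the choice of putting longitude in the even bit positions are assumptions made for illustration. Each coordinate is scaled to an integer, and the bits of the two integers are interleaved into a single sortable key.

```python
def quantize(value, lo, hi, bits=16):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    scale = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * scale)


def interleave(x, y, bits=16):
    """Interleave the bits of x and y: x in even positions, y in odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_order_key(lat, lon, bits=16):
    """Return a single integer key for a (lat, lon) pair in degrees."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

Points that are close on the map usually, though not always, share the high-order bits of their keys, so an ordinary sorted index over the keys can narrow a box or radius search to a few key ranges before an exact distance check is applied.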

 Andrea suggested a more GeoSPARQL-orientated solution 
 ( https://wikitech.wikimedia.org/wiki/User:AndreaWest/Blazegraph_Features_and… ), 
 but that seems to me a much, much bigger ask; I do suspect that (for 
 almost all contending projects) the simple wikibase:box and 
 wikibase:around services would be a lot more easily implemented, to 
 free us from our tight-coupling to Blazegraph, yet still provide this 
 functionality, which I do believe is a needed requirement.

 As for named subqueries, as well as making queries much more readable, 
 IMO they may be particularly valuable as a way to specify particular 
 optimisations (ie sequencing of query execution, that may be 
 absolutely *crucial* if a query is to run) in a particularly readable 
 and **portable** way -- certainly when compared to optimiser "hint" 
 syntaxes, that may be tied *very* specifically to a particular back-end.

 Why do I think named subqueries are so portable, if they are not part 
 of the SPARQL 1.1 standard, and most providers don't support them?

 The answer is because if necessary it would require only a fairly 
 simple pre-processor script to turn them into inline sub-queries, 
 which *are* supported by the standard.

 Named sub-queries have the advantage, though, of making the query a 
 lot more readable; they can also be useful to indicate to the back-end 
 that the sub-query need only be retrieved once, rather than repeatedly 
 each time it is referenced (which may be helpful for some back-ends).
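
A rough sketch of such a pre-processor, in Python, might look like the following. It assumes the Blazegraph-style `WITH { ... } AS %name` / `INCLUDE %name` syntax, and the function names are made up for illustration. It matches braces naively, so it would be confused by braces inside string literals or comments, and it makes no attempt to preserve the evaluate-once behaviour described above.

```python
import re

WITH_RE = re.compile(r"\bWITH\s*\{", re.IGNORECASE)
AS_RE = re.compile(r"\s*AS\s+%(\w+)", re.IGNORECASE)


def _matching_brace(text, open_pos):
    """Return the index of the '}' that closes the '{' at open_pos."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces in query")


def inline_named_subqueries(query):
    """Rewrite WITH { ... } AS %name and INCLUDE %name as inline sub-queries."""
    definitions = {}
    # Pass 1: cut every WITH { ... } AS %name block out of the query.
    while True:
        m = WITH_RE.search(query)
        if m is None:
            break
        open_pos = m.end() - 1
        close_pos = _matching_brace(query, open_pos)
        as_match = AS_RE.match(query, close_pos + 1)
        if as_match is None:
            raise ValueError("WITH block is not followed by AS %name")
        definitions[as_match.group(1)] = query[open_pos + 1:close_pos].strip()
        query = query[:m.start()] + query[as_match.end():]
    # Pass 2: replace each INCLUDE %name with the sub-query in braces.
    # Repeating the loop lets a named sub-query include another one.
    for _ in range(len(definitions) + 1):
        changed = False
        for name, body in definitions.items():
            pattern = re.compile(
                r"\bINCLUDE\s+%" + re.escape(name) + r"\b\s*\.?", re.IGNORECASE
            )
            query, n = pattern.subn(lambda _m, b=body: "{ " + b + " }", query)
            changed = changed or n > 0
        if not changed:
            break
    return query
```

Given a query that defines `%people` in a `WITH` block and references it with `INCLUDE %people .`, the function returns a query in which the `WITH` block is gone and the `INCLUDE` line is replaced by `{ SELECT ... }`, which is ordinary SPARQL 1.1.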

These implementation details aren't really relevant to the fundamental 
point I am trying to make about the virtues of loosely-coupled 
architecture facilitated by existing open standards (e.g., SPARQL).

...

 So: I don't disagree that it would be useful if WDQS was less tightly 
 dependent on Blazegraph.

 But: rather than going straight to removing good features, I think 
 there is a lot of scope for seeing whether the dev teams for other 
 back-ends could be persuaded to match the features on those back-ends 
 without too much difficulty; and that this would be a better path to 
 at least investigate, in preference to breaking swathes of queries 
 that are in active use.

Nothing I've said amounts to feature removal. Everything I've said is 
simply about loosely-coupled architecture as a guiding principle for 
making WDQS usable against other SPARQL endpoints :)

Kingsley

...

    --  James.


-- 
Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Home Page: http://www.openlinksw.com
