Re: [Wikidata-tech] Thoughts on (not) exposing a SPARQL endpoint

10 Mar 2015

On 10.03.2015 at 16:47, Markus Krötzsch wrote:
...
  Hi Daniel,

 I can understand your thoughts to some extent, but they seem to apply to any
 potential solution. Committing to a primary query interface will always be,
 well, a commitment. Because of this, I think the big advantage of SPARQL is
 exactly that it is a technology standard that is not depending on a specific
 tool. If you want to minimize lock-in and be maximally future-safe, this seems
 to be a good thing. 
Committing to the broadest possible interface, even if it's a standard, is the
problem I see, because it makes swapping out the backend close to impossible. I
propose committing to an interface that is as narrow as it can be for our use
case. That's general best practice in system design, I believe.

Note that we are not only committing to a (standardized, but very complex) query
language, but also to our data mapping. WDQ would abstract from that, and give
us wiggle room to adjust the mapping later.

...
  I would certainly not support the use of a
 tool-specific query language that is not specified anywhere but in running code. 
Of course the language would need to be well specified, and modified in places.
We'd want a production grammar, and a decent parser (recursive descent, probably).
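
As a rough illustration of what "a production grammar and a recursive descent parser" could amount to, here is a minimal sketch for a tiny, hypothetical subset of WDQ: claims of the form CLAIM[property:item], combined with AND, OR and parentheses. The grammar, the AST shape, and the names tokenize, Parser and parse are assumptions made for this example, not a specification of WDQ.

```python
import re

# Hypothetical grammar for a small WDQ subset (an assumption, not a spec):
#   query  := term ( "OR" term )*
#   term   := factor ( "AND" factor )*
#   factor := "CLAIM" "[" INT ":" INT "]" | "(" query ")"

TOKEN = re.compile(r"\s*(CLAIM|AND|OR|\d+|[\[\]():])")


def tokenize(text):
    """Split the query string into tokens, rejecting anything unknown."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent parser: one method per grammar production."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def number(self):
        token = self.peek()
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        self.pos += 1
        return int(token)

    def query(self):
        node = self.term()
        while self.peek() == "OR":
            self.pos += 1
            node = ("or", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "AND":
            self.pos += 1
            node = ("and", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.pos += 1
            node = self.query()
            self.expect(")")
            return node
        self.expect("CLAIM")
        self.expect("[")
        prop = self.number()
        self.expect(":")
        item = self.number()
        self.expect("]")
        return ("claim", prop, item)


def parse(text):
    """Parse a query string into a nested-tuple AST."""
    parser = Parser(tokenize(text))
    ast = parser.query()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return ast
```

Because anything outside the grammar is rejected at parse time, the set of supported features is exactly what the grammar says, which is the point of keeping the interface narrow.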

...
  WDQ is great but it is a custom API of a single tool rather than a query
 language.
It would be our Domain Specific Language. There's a lot to be said for DSLs, if
they are well documented.

...
  * "WDQ would go away": That's not a worry I have at all. It will be easy to
 write an adaptor for WDQ to SPARQL and to keep up the service as it is for a
 long time.
That is exactly what I'm proposing. I'd just say that the WDQ version would be
the canonical one, while the SPARQL one would be considered raw/unstable, like
the SQL databases on labs.

...
  * "SPARQL would be too expressive, or could have non-standard extensions that
 are hard to support in the future": This can be addressed in two ways. The soft
 way is to document clearly which features are supported, and to maintain
 backwards compatibility only wrt these.
This documentation is unlikely to be complete, and people will use whatever
"works now", and complain when it breaks. They *will* use vendor-specific
features and optimizations, even if you tell them they shouldn't. And there will
be trouble when they break.

...
  The hard (as in firm, not as in
 difficult) way is to restrict queries to use only such a limited set of "safe"
 features. This is easy to do, since SPARQL query parsing and reformulation is
 already part of any DBMS that supports such queries, and it would be easy to
 hook into this process to restrict queries without any notable performance
 overhead. This would minimize vendor lock-in, since one would only commit to (a
 subset of) the fully standardized features. 
That is the plan for sandboxing SPARQL. It's doable, but not easy. Implementing
"safe" WDQ on top of SPARQL is going to be simpler and quicker, I think. It will
give us a public query interface *faster*.

...
  With both of these in place, your concerns should be addressed without us
 having to build our own query language from scratch (including parsers,
 preprocessors, optimizers, user documentation, ...).
With WDQ on top of SPARQL, we need a parser and a SPARQL emitter, that's it.
Documentation is already there (well, to a degree), and optimization is provided
by the SPARQL endpoint.
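
To make the "SPARQL emitter" half concrete, here is a minimal sketch that turns an already-parsed query into a SPARQL string. Everything in it is an assumption for illustration: the nested-tuple AST with "claim", "and" and "or" nodes, the function names emit_pattern and emit_sparql, and the IRI templates in DEFAULT_MAPPING, which stand in for whatever data mapping is eventually chosen.

```python
# Illustrative IRI templates only; the real data mapping is an open question.
DEFAULT_MAPPING = {
    "property": "http://www.wikidata.org/prop/direct/P{}",
    "item": "http://www.wikidata.org/entity/Q{}",
}


def emit_pattern(node, mapping):
    """Translate one AST node into a SPARQL graph pattern."""
    kind = node[0]
    if kind == "claim":
        _, prop, item = node
        # Only integers are ever interpolated, so user input cannot
        # smuggle arbitrary SPARQL into the generated query.
        if not (isinstance(prop, int) and isinstance(item, int)):
            raise TypeError("claim ids must be integers")
        prop_iri = mapping["property"].format(prop)
        item_iri = mapping["item"].format(item)
        return f"?item <{prop_iri}> <{item_iri}> ."
    if kind == "and":
        # Conjunction: both patterns must match the same ?item.
        left = emit_pattern(node[1], mapping)
        right = emit_pattern(node[2], mapping)
        return f"{left} {right}"
    if kind == "or":
        # Disjunction maps to a SPARQL UNION of two group patterns.
        left = emit_pattern(node[1], mapping)
        right = emit_pattern(node[2], mapping)
        return f"{{ {left} }} UNION {{ {right} }}"
    raise ValueError(f"unknown node type {kind!r}")


def emit_sparql(node, mapping=DEFAULT_MAPPING):
    """Wrap the translated pattern in a SELECT query for matching items."""
    return f"SELECT DISTINCT ?item WHERE {{ {emit_pattern(node, mapping)} }}"
```

For example, emit_sparql(("claim", 31, 5)) yields a query for all items with a P31 claim pointing at Q5. Passing a different mapping changes the generated IRIs without touching any query a user has written, which is the wiggle room mentioned above.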

...
  Moreover, both of these can be added at
 any stage of the project, so we are not blocked now by having to decide all of
 these details. Right now the main priority should be to get something running
 rather than to go back to the drawing board. 
Yes, absolutely, but what we make available publicly
1) has to be safe - I believe this is easier and faster to do with WDQ.
2) should be future-proof - again, easier with WDQ, because it's more
restrictive and domain specific. It allows us to change the underlying mapping
or technology. SPARQL does not make that easy.

In any case, I'm not saying we shouldn't make a SPARQL endpoint available at
all. I'm saying it should not be the canonical query interface, but rather a
"raw" query interface. That would give us a lot more headroom to change things
later, without breaking a lot of 3rd party code.

-- daniel

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
