TL;DR: No concrete issues with SPARQL have been raised so far; OTOH many
*simple* SPARQL queries are not possible in WDQ; there is still time to
restrict ourselves later -- let's give SPARQL a chance before we fall back
to something more limited.

Hi Daniel,

This discussion is way too abstract. I am missing hard facts about the
claimed problems with SPARQL. Nik and Stas have made a careful analysis
of the options, while your concerns are mostly high-level worries.

You say that SPARQL would lock us into one tool. As I recall, however,
there were at least four different open source SPARQL processors in the
list considered by Nik and Stas (BlazeGraph, Virtuoso, 4Store, Jena).
Moreover, these are partially based on free libraries that handle parts
of the task (such as query parsing). Of all the options considered, SPARQL
is clearly the most widely supported one, with the most tools available.
It is far from perfect, but so is every other option we considered.

You also say that SPARQL is too complex. It is true that SPARQL is a
full-featured query language, and that such languages tend to be
complex. Nevertheless, this is hardly an effective argument, given that
we are all using many extremely complex technologies in our day-to-day
work. We need to bring this to a technical level here. If you think a
particular feature or group of features is too complex to support,
please let us know and we can discuss this.

In order to support WDQ, we would already need many features of SPARQL.
If we start with a simplified one-triple-per-statement graph, then I
don't actually see how SPARQL is more complex than WDQ. It is more
verbose (you would have to write something like "P31" or even "wd:P31"
instead of just "31"), but this is actually quite useful if you ever want
to be able to query over multiple Wikidata instances (such as a future
Commons instance and Wikidata in one store).

WDQ also has complex features, for good reasons. For example, to find
all things that are instances of subclasses of bridge, you could use the
following query patterns:
WDQ: claim[31:(tree[12280][][279])]
SPARQL: ?X (P31/P279*) Q12280
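
For readers less used to property paths: "P31/P279*" means one P31 (instance of) step followed by zero or more P279 (subclass of) steps. Below is a minimal Python sketch of that evaluation over a toy graph. Only Q12280, P31 and P279 are real Wikidata IDs; all other IDs are hypothetical placeholders, and the sketch illustrates the semantics, not how any particular SPARQL engine evaluates paths.

```python
from collections import deque

# Toy (subject, predicate, object) triples. Q_* IDs other than Q12280 are HYPOTHETICAL.
TRIPLES = {
    ("Q_suspension_bridge_class", "P279", "Q12280"),  # a subclass of bridge
    ("Q_bridge_a", "P31", "Q_suspension_bridge_class"),
    ("Q_bridge_b", "P31", "Q12280"),
    ("Q_tower_a", "P31", "Q_tower_class"),
}


def objects(subject, predicate, triples=TRIPLES):
    """All objects o with (subject, predicate, o) in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def reachable_star(start, predicate, triples=TRIPLES):
    """Nodes reachable from start via zero or more predicate edges (like P279*)."""
    seen = {start}  # zero steps: start itself is included
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in objects(node, predicate, triples):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def instances_of_subclasses(cls, triples=TRIPLES):
    """Evaluate ?X (P31/P279*) cls: one P31 step, then zero or more P279 steps."""
    subjects = {s for s, _, _ in triples}
    return {
        x
        for x in subjects
        if any(cls in reachable_star(c, "P279", triples) for c in objects(x, "P31", triples))
    }


print(sorted(instances_of_subclasses("Q12280")))  # ['Q_bridge_a', 'Q_bridge_b']
```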

WDQ can be more concise than SPARQL where it uses predefined query
patterns, such as "between" to specify an interval with a single
construct, but this difference seems to be merely syntactic. The really
big difference between SPARQL and WDQ is that the former has variables
in queries while the latter does not. Both approaches have their merits,
but as far as query languages go, the version with variables is by far
the most common.
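
To make the role of variables concrete, here is a minimal Python sketch of conjunctive (AND) pattern matching, the core of SPARQL basic graph patterns: terms starting with "?" are variables, and a variable shared between patterns forces the matched values to agree. P19 (place of birth) and P20 (place of death) are real Wikidata properties; the item IDs and the match_pattern()/query() helpers are hypothetical, for illustration only.

```python
from typing import Optional

Triple = tuple[str, str, str]

# HYPOTHETICAL toy data: person_a was born and died in city_x; person_b was not.
DATA: set[Triple] = {
    ("Q_person_a", "P19", "Q_city_x"),
    ("Q_person_a", "P20", "Q_city_x"),
    ("Q_person_b", "P19", "Q_city_x"),
    ("Q_person_b", "P20", "Q_city_y"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: dict[str, str]) -> Optional[dict[str, str]]:
    """Extend binding so that pattern matches triple, or return None on a conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def query(patterns: list[Triple], data: set[Triple]) -> list[dict[str, str]]:
    """Return all variable bindings that satisfy every pattern (a conjunctive join)."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in data
            if (extended := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


# "People who died in the same city that they were born in":
# the shared variable ?city makes both statements agree.
results = query([("?person", "P19", "?city"), ("?person", "P20", "?city")], DATA)
print(results)  # [{'?person': 'Q_person_a', '?city': 'Q_city_x'}]
```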

This difference leads to real restrictions. Here are some examples of
things that you cannot find in WDQ but that are easy to find in SPARQL:
* People who died in the same city that they were born in.
* "Legitimate children" (children of parents who are married to each other).
* People who are their own father (likely an error).
* Cycles in subclass hierarchies.
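
Two of the examples above are equally short once variables are available. The sketch below gives the corresponding SPARQL patterns as comments and evaluates them with plain Python set comprehensions over hypothetical toy data. P22 (father), P25 (mother) and P26 (spouse) are real Wikidata properties; the item IDs and the pairs() helper are made up for illustration.

```python
# HYPOTHETICAL toy data as (subject, property, object) triples.
TRIPLES = {
    ("Q_child_1", "P22", "Q_dad"),
    ("Q_child_1", "P25", "Q_mum"),
    ("Q_child_2", "P22", "Q_dad"),
    ("Q_child_2", "P25", "Q_other"),
    ("Q_dad", "P26", "Q_mum"),
    ("Q_odd", "P22", "Q_odd"),  # a likely data error
}


def pairs(prop):
    """All (subject, object) pairs for one property."""
    return {(s, o) for s, p, o in TRIPLES if p == prop}


father, mother, spouse = pairs("P22"), pairs("P25"), pairs("P26")

# ?x P22 ?x  -> items recorded as their own father
own_father = {x for x, f in father if x == f}

# ?c P22 ?f . ?c P25 ?m . ?f P26 ?m  -> children whose parents are married
# (checks the spouse statement only in the father -> mother direction, as the pattern does)
legitimate = {c for c, f in father for c2, m in mother if c == c2 and (f, m) in spouse}

print(sorted(own_father))  # ['Q_odd']
print(sorted(legitimate))  # ['Q_child_1']
```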

I find these queries alone worth going to SPARQL for. Most of these
examples need no feature other than (AND) pattern matching (the cycle
query needs a property path expression: "?X (P279+) ?X"; with "*"
instead of "+", every class would trivially match via the zero-length
path). Besides this, I think we need UNION (OR), some of the common
FILTERs (range comparisons for dates and numbers), and a geo extension
(maybe the most critical part). Is that really so complex?
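
A minimal Python sketch of the cycle check, counting only paths of one or more P279 steps from a class back to itself. The subclass edges and the classes_on_cycles() helper are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# HYPOTHETICAL P279 (subclass of) edges as (subclass, superclass) pairs.
# Q_a -> Q_b -> Q_c -> Q_a is a cycle; Q_d and Q_e are not on any cycle.
SUBCLASS_OF = [
    ("Q_a", "Q_b"),
    ("Q_b", "Q_c"),
    ("Q_c", "Q_a"),
    ("Q_c", "Q_d"),
    ("Q_e", "Q_d"),
]


def classes_on_cycles(edges):
    """Return every X with a path of one or more edges from X back to X (?X P279+ ?X)."""
    graph = defaultdict(set)
    for sub, sup in edges:
        graph[sub].add(sup)
    result = set()
    for start in list(graph):
        # Begin at the direct superclasses, so the empty path never counts.
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                result.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
    return result


print(sorted(classes_on_cycles(SUBCLASS_OF)))  # ['Q_a', 'Q_b', 'Q_c']
```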

Now you could reply: "People can always go to labs to run these queries."
I am not convinced. If we see good use for a technology and have the
means to support it, then we do not serve our users well by restricting
it to an experimental service on labs.

Best wishes,
Markus

On 10.03.2015 18:32, Daniel Kinzler wrote:
> On 10.03.2015 at 18:22, Thomas Tanon wrote:
>> I support Magnus's point of view. WDQ is a very good proof of concept
>> but is, I think, too limited to be the primary language of the
>> Wikidata query system.
>
> It can be extended. What I want is a limited domain-specific language
> tailored to our primary use cases. Having it largely compatible with
> WDQ would be great.
>
> I did not mean to imply that we have to accept the current limitations
> of WDQ. I'm arguing that we should impose sensible limitations on
> queries, instead of committing to support everything that is possible
> with SPARQL.
>
>> A possible solution may be to support two query languages "as primary":
>> (1) WDQ at first, in order to have something working quickly;
>> (2) a safe subset of SPARQL (if that is possible), implemented later
>> using the experience gained from deploying the first version of the
>> query system. Or, if that is not possible, an improved version of WDQ
>> that would break its current limitations.
>
> Absolutely. I'd like to avoid any commitment to keeping the SPARQL
> interface stable, though. That's why I'd limit it to labs-based usage.
>
> -- daniel