To be fair, the discussion is not about "what will we do till the end of time", but rather "what do we start with".

Since we know neither SPARQL nor the data storage engine terribly well, it would not be helpful if the service could be DoSed by innocent-looking queries, intentional or not. Exposing only a subset of SPARQL (in this case, via a WDQ wrapper) initially would be a way to test the waters. A proper SPARQL API can be exposed at any time later, once we're confident it will hold up.

This seems more like a technical decision in terms of "operational security", rather than a philosophical one about the merits of query languages (where SPARQL is undoubtedly more powerful than WDQ).

On Tue, Mar 10, 2015 at 10:17 PM Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
TL;DR: No concrete issues with SPARQL were mentioned so far; OTOH many
*simple* SPARQL queries are not possible in WDQ; there is still time to
restrict ourselves -- let's give SPARQL a chance before going back.


Hi Daniel,

This discussion is way too abstract. I am missing hard facts about the
claimed problems with SPARQL. Nik and Stas have made a careful analysis
of the options, while your concerns are mostly high-level worries.

You say that SPARQL would lock us into one tool. As I recall, however,
there were at least four different open source SPARQL processors in the
list considered by Nik and Stas (BlazeGraph, Virtuoso, 4Store, Jena).
Moreover, these are partially based on free libraries that do part of
the task (such as query parsing). Of all options considered, this is
clearly the most widely supported one with most tools available. It's
far from perfect, but this is the same with any option we considered.

You also say that SPARQL is too complex. It is true that SPARQL is a
full-featured query language, and that such languages tend to be
complex. Nevertheless, this is hardly an effective argument, given that
we are all using many extremely complex technologies in our day-to-day
work. We need to bring this to a technical level here. If you think a
particular feature or group of features is too complex to support,
please let us know and we can discuss this.

In order to support WDQ, we would already need many features of SPARQL.
If we start with a simplified one-triple-per-statement graph, then I
don't actually see how SPARQL is more complex than WDQ. It is more
verbose (you would have to write something like "P31" or even "wd:P31"
instead of just "31") but this is actually quite useful if you ever want
to be able to query over multiple Wikidata instances (such as future
Commons and Wikidata in one store).

WDQ also has complex features, for good reasons. For example, to find
all things that are instances of subclasses of bridge, you could use the
following query patterns:

WDQ: claim[31:(tree[12280][][279])]
SPARQL: ?X (P31/P279*) Q12280
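
To make the semantics of these two patterns concrete, here is a minimal
Python sketch of what both compute: follow P279 (subclass of) zero or
more steps down from Q12280, then collect everything linked to one of
those classes by P31 (instance of). The item IDs other than Q12280 and
the function names are made up for illustration; this is not how WDQ or
any SPARQL engine is actually implemented.

```python
from collections import deque

# Hypothetical toy data as (subject, property, object) triples.
# Only Q12280 (bridge), P31 and P279 come from the discussion;
# the other identifiers are invented for this example.
TRIPLES = [
    ("Q_suspension_bridge", "P279", "Q12280"),
    ("Q_footbridge", "P279", "Q12280"),
    ("Q_rope_footbridge", "P279", "Q_footbridge"),
    ("Q_item_a", "P31", "Q_suspension_bridge"),
    ("Q_item_b", "P31", "Q12280"),
    ("Q_item_c", "P31", "Q_rope_footbridge"),
    ("Q_item_d", "P31", "Q_tower"),
]


def subclasses_of(root, triples):
    """Return root plus every class that reaches it via P279 (zero or more steps)."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for subject, prop, obj in triples:
            if prop == "P279" and obj == current and subject not in seen:
                seen.add(subject)
                queue.append(subject)
    return seen


def instances_of_tree(root, triples):
    """Return all items that are P31-instances of root or of any of its subclasses."""
    classes = subclasses_of(root, triples)
    return {s for s, prop, o in triples if prop == "P31" and o in classes}


print(sorted(instances_of_tree("Q12280", TRIPLES)))
# prints ['Q_item_a', 'Q_item_b', 'Q_item_c']
```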

WDQ can be more concise than SPARQL where it uses pre-defined query
patterns, such as "between" to specify an interval with a single
construct, but this seems to be rather syntactic. The real big
difference between SPARQL and WDQ is that the former has variables in
queries while the latter has not. Both have their merits, but as far as
query languages go, the version with variables is by far the more common.

This difference leads to real restrictions. Here are some examples of
things that you cannot find in WDQ but that are easy to find in SPARQL:

* People who died in the same city that they were born in.
* "Legitimate children" (children of parents who are married to each other).
* People who are their own father (likely an error).
* Cycles in subclass hierarchies.
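
To illustrate why variables matter for queries like the ones listed
above, here is a small Python sketch of conjunctive pattern matching: a
term starting with "?" is a variable, and using the same variable twice
forces both positions to hold the same value. The matcher, the data and
the person/city names are hypothetical. I am assuming the usual Wikidata
property IDs P19 (place of birth), P20 (place of death) and P22
(father). A real SPARQL engine does far more than this.

```python
# Hypothetical toy data as (subject, property, object) triples.
TRIPLES = [
    ("alice", "P19", "city1"),
    ("alice", "P20", "city1"),
    ("bob", "P19", "city1"),
    ("bob", "P20", "city2"),
    ("carl", "P22", "carl"),   # data error: carl is his own father
    ("dora", "P22", "carl"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns (logical AND)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# People who died in the same city that they were born in.
same_city = {b["?p"] for b in match([("?p", "P19", "?c"), ("?p", "P20", "?c")], TRIPLES)}
print(same_city)   # prints {'alice'}

# People who are their own father.
own_father = {b["?p"] for b in match([("?p", "P22", "?p")], TRIPLES)}
print(own_father)  # prints {'carl'}
```

The shared variable ?c in the first query is what ties the two patterns
together.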

I find those queries alone worth going to SPARQL for. Most of these
examples do not need any feature other than (AND) pattern matching (the
cycle query needs property path expressions: "?X (P279+) ?X"). Besides
this, I think we need UNION (or), some of the common FILTERs (range
comparisons for dates and numbers), and a geo extension (maybe the most
critical part). Is that really so complex?

Now you can reply: "People can always go to labs to run these queries."
I am not convinced. If we see good use of a technology and have the
means to support it, then we don't serve our users well by restricting
it to an experimental service on labs.

Best wishes,

Markus


On 10.03.2015 18:32, Daniel Kinzler wrote:
> Am 10.03.2015 um 18:22 schrieb Thomas Tanon:
>> I support Magnus's point of view. WDQ is a very good proof of concept but is,
>> I think, too limited to be the primary language of the Wikidata query
>> system.
>
> It can be extended. What I want is a limited domain specific language tailored
> to our primary use cases. Having it largely compatible with WDQ would be great.
>
> I did not mean to imply that we have to accept the current limitations of WDQ.
> I'm arguing that we should impose sensible limitations on queries, instead of
> committing to support everything that is possible with SPARQL.
>
>> A possible solution is maybe to support two query languages "as primary": 1
>> WDQ, at first, in order to have something working quickly 2 A safe subset
>> of SPARQL (if it is possible) that would be implemented later using the
>> experience gained from the deployment of the first version of the query
>> system. Or, if it is not possible, an improved version of WDQ that would
>> break its current limitations.
>
> Absolutely. I'd like to avoid any commitment to keeping the SPARQL interface
> stable, though. That's why I'd limit it to labs-based usage.
>
> -- daniel
>


_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech