Re: [Wikidata-tech] Thoughts on (not) exposing a SPARQL endpoint

11 Mar 2015

My basic worries with exposing powerful query languages like SPARQL
publicly are that a) there is a large attack surface in the query processing
backend, and b) a client can request very expensive operations on the
server without performing much work itself. Timeouts can limit the damage,
but if they are set reasonably low (<1 min) they will also eliminate some
of the supposed power of SPARQL, especially if the data set grows at the
rate we all hope for. When reaching the timeout, the client needs to switch
to iterative processing and paging. How well does Blazegraph support paging
of complex SPARQL queries without re-calculating the entire result set?
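
For illustration, client-side paging with plain LIMIT/OFFSET might look
roughly like the sketch below. The run_query callable is a hypothetical
stand-in for whatever sends a query string to the endpoint and returns a
list of rows; it is not a real client API. The base query is assumed to
carry an ORDER BY, since OFFSET paging is only stable over an ordered
result.

```python
def fetch_all(run_query, base_query, page_size=100, max_pages=1000):
    """Collect results page by page using LIMIT/OFFSET.

    run_query is a hypothetical callable: it takes a query string and
    returns a list of rows. base_query should already contain ORDER BY.
    """
    rows = []
    for page_no in range(max_pages):
        offset = page_no * page_size
        query = f"{base_query} LIMIT {page_size} OFFSET {offset}"
        batch = run_query(query)
        rows.extend(batch)
        # A short page means the result set is exhausted.
        if len(batch) < page_size:
            break
    return rows
```

Each page here is an independent query, so unless the backend caches or
resumes the underlying evaluation, every request may redo the full ordered
computation just to skip to its offset. That is the cost the question above
is about.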

One of the things I like about the MQL design is that its authors were
careful to identify a couple of main hierarchies (typeOf, geographical
containment, taxonomies, ...) that they can efficiently flatten into
denormalized plain index lookups. These are very fast and easy to page.
From what I have seen so far, they also seem to directly cover most of the
use cases that people have come up with. While perhaps too limiting in the
longer term, I think such a limited 80/20 design would be a better
starting point for a high-volume public API with strong availability and
response time guarantees. The efficient subset of the API could then be
enriched with more expensive endpoints over time, but those would
explicitly not have the same performance guarantees as the core API. Those
expensive queries could be executed on a separate cluster / set of machines
to avoid interference with the core API.
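
As a rough sketch of what flattening a hierarchy into a denormalized index
could mean (an illustration only, not MQL's actual implementation): the
transitive closure of a containment relation is precomputed once, so that
"everything inside X" becomes a lookup in a sorted list, and paging becomes
a cursor into that list. The PARENT data, the function names, and the
cursor scheme are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical containment edges (child -> parent); not real data.
PARENT = {
    "Berlin": "Germany",
    "Munich": "Germany",
    "Germany": "Europe",
    "Paris": "France",
    "France": "Europe",
}

def build_flat_index(parent):
    """Precompute ancestor -> sorted list of all transitive descendants."""
    index = defaultdict(list)
    for node in parent:
        seen = {node}
        ancestor = parent.get(node)
        # Walk up the chain; the seen set guards against cycles.
        while ancestor is not None and ancestor not in seen:
            index[ancestor].append(node)
            seen.add(ancestor)
            ancestor = parent.get(ancestor)
    return {key: sorted(values) for key, values in index.items()}

def page(index, ancestor, after=None, limit=2):
    """Return one page of descendants and a cursor for the next page."""
    items = index.get(ancestor, [])
    # Resume right after the last item the client has already seen.
    start = 0 if after is None else bisect_right(items, after)
    chunk = items[start:start + limit]
    next_cursor = chunk[-1] if start + limit < len(items) else None
    return chunk, next_cursor

INDEX = build_flat_index(PARENT)
```

Calling page(INDEX, "Europe") and then passing the returned cursor back as
after walks the result in fixed-size steps. Each call costs a binary search
plus a slice, independent of how deep the hierarchy is, which is what makes
this kind of lookup cheap to offer with response time guarantees.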

Another aspect that I think warrants serious attention for an API is the
complexity and reliability of constructing queries programmatically. As
witnessed by the many issues around seemingly simple languages like SQL,
building up query strings from user-supplied values is easy to get wrong.
It is always possible to build friendly query languages on top of a JSON
API, but it would IMHO be a waste of developer time to repeatedly have to
deal with encoding issues and bugs in each client. This doesn't rule out
SPARQL (it has a JSON encoding), but I think it's a significant
disadvantage of using a custom string syntax like WDQ in the API.
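
To illustrate the difference, here is a minimal sketch contrasting a query
built by string interpolation with the same request expressed as structured
JSON. The JSON request shape is invented for the example and does not
correspond to any existing API; the SPARQL text is only there to show how a
user-supplied value can change the structure of the query.

```python
import json

def naive_query(label):
    # Unsafe: the user-supplied value is pasted into the query text.
    return 'SELECT ?item WHERE { ?item rdfs:label "%s" }' % label

def structured_query(label):
    # Hypothetical JSON request shape. The value travels as data, and the
    # JSON encoder takes care of all escaping.
    return json.dumps({"select": "item", "where": {"label": label}})

# A value that closes the string literal, adds an unconstrained triple
# pattern, and comments out the rest of the original query.
hostile = 'x" . ?s ?p ?o } #'

injected = naive_query(hostile)
safe = structured_query(hostile)
```

In the interpolated version the value ends up as query syntax, and the extra
pattern turns a cheap lookup into a join against every triple in the store.
In the JSON version the same value round-trips unchanged as a plain string,
with no escaping logic needed in the client.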

Gabriel
