Hello!
There is some discussion of starting to use WDQS in conjunction with maps and graphs. Here are a few thoughts, just to put them out there and to start getting some feedback. This is an attempt to put some order in my thoughts; they are not complete yet...
WDQS exposes a SPARQL endpoint to users. This can be compared to giving our users the ability to write arbitrary SQL queries, and is fairly close to the concept of the labs replica databases. Giving direct access to a SPARQL endpoint is at the same time a wonderful idea (it allows users to use WDQS in ways we would never have imagined) and a very scary idea (users can write complex queries that consume all resources on our servers - which does happen from time to time).
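To make the comparison concrete, here is a minimal sketch of what "arbitrary queries" looks like from the client side. It uses the public endpoint at https://query.wikidata.org/sparql and a deliberately cheap query; nothing stops a user from sending a far more expensive one. The Python below is just an illustration, not part of any existing client:

    # Any client can send an arbitrary SPARQL query to the public WDQS endpoint.
    import requests

    WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

    query = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .   # instances of "house cat"
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10
    """

    response = requests.get(
        WDQS_ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()
    for row in response.json()["results"]["bindings"]:
        print(row["item"]["value"], row["itemLabel"]["value"])

The cost of that request is entirely in the hands of whoever writes the query, which is the whole problem.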
At the moment, WDQS is used by researchers, bots and power users. Those users understand this constraint well, and the fluctuating performance of WDQS is not a major issue for them.
Making WDQS robust enough while letting users run arbitrary queries is most probably extremely hard. I think we should instead investigate how to use an unstable service from a stable one.
Ideas...
1) We can accept service degradation of specific functionalities. We accept that WDQS is down, or slow, at times. In this case we degrade the user experience: graphs will not work, maps will not display data layers. In terms of implementation, we need to ensure that data flows involving WDQS do not go through any critical systems, and that all direct clients of WDQS are well protected by circuit breakers (rough sketch below, after the list).
2) We want to preserve the user experience. We go fully async: graphs and maps are pre-generated and updated regularly, outside of user interaction. We probably still need synchronous access for editors, to allow them to test their edits. Refresh can be relatively low frequency (1/day or maybe less), and we can probably optimize this based on how often a specific graph / map is viewed (second sketch below). I'm not sure how easy it would be to scale such an approach...
3) Something else?
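To illustrate what I mean by circuit breakers in 1), here is a rough sketch. The class name, thresholds and error handling are invented for illustration; any existing circuit breaker library would do the same job:

    import time

    class WdqsCircuitBreaker:
        # Rough sketch of a circuit breaker in front of WDQS (idea 1).
        def __init__(self, failure_threshold=5, reset_timeout=60):
            self.failure_threshold = failure_threshold  # consecutive failures before opening
            self.reset_timeout = reset_timeout          # seconds before letting a trial request through
            self.failures = 0
            self.opened_at = None

        def call(self, run_query):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_timeout:
                    # Open: fail fast instead of piling more load on WDQS.
                    raise RuntimeError("WDQS circuit open, degrade gracefully")
                # Half-open: let one trial request through.
            try:
                result = run_query()
            except Exception:
                self.failures += 1
                if self.opened_at is not None or self.failures >= self.failure_threshold:
                    # Trial request failed, or too many consecutive failures: (re)open.
                    self.opened_at = time.monotonic()
                raise
            # Success: close the circuit again.
            self.failures = 0
            self.opened_at = None
            return result

A graph or map renderer would wrap its WDQS call in breaker.call(...) and catch the error to render the page without the data layer, rather than blocking on a struggling WDQS.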
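And a rough sketch of the refresh logic for 2). The data structure, thresholds and query runner are all made up; the only point is that refresh frequency can follow view frequency:

    import time

    ONE_HOUR = 3600
    ONE_DAY = 24 * ONE_HOUR
    ONE_WEEK = 7 * ONE_DAY

    def refresh_interval(views_per_day):
        # Invented thresholds: popular content gets refreshed more often.
        if views_per_day > 1000:
            return ONE_HOUR
        if views_per_day > 10:
            return ONE_DAY
        return ONE_WEEK

    def refresh_due(graphs, run_query, now=None):
        # graphs: list of dicts with 'query', 'views_per_day', 'refreshed_at', 'data'.
        # run_query: any callable taking a SPARQL string and returning a result,
        # e.g. the request from the first snippet wrapped in the circuit breaker.
        now = time.time() if now is None else now
        for graph in graphs:
            if now - graph["refreshed_at"] >= refresh_interval(graph["views_per_day"]):
                graph["data"] = run_query(graph["query"])
                graph["refreshed_at"] = now

Something like this would run from a periodic job, completely outside of user page views, so a slow or unavailable WDQS only delays a refresh instead of breaking a page.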
Time to get some sleep...
MrG