As the Wikitech WDQS Hardware section [1] explains, “due to how we route
traffic with GeoDNS, the primary cluster (usually eqiad) sees most of
the traffic.” So while the clusters all have the same hardware, one
cluster sees most of the query load and therefore has a harder time
keeping up with updates (since the update load is roughly the same
everywhere).
Cheers,
Lucas
[1]:
Besides waiting for the new updater, it may be useful to tell us what
we as users can do too. It is unclear to me what the problem is. For
instance, at one point I was worried that the many parallel requests
we make to the SPARQL endpoint from Scholia were a problem; as far as
I understand, they are not a problem at all. Another issue could be
the way we use Magnus Manske's QuickStatements and approve bots for
high-frequency editing. Perhaps a better overview of, and constraints
on, large-scale editing could be discussed?
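On the question of parallel requests: one thing clients can always do is throttle themselves. Below is a minimal sketch of a client-side throttle that caps concurrent requests and enforces a minimum interval between request starts; `PoliteThrottle` and its parameters are hypothetical names for illustration, not part of any WDQS or Scholia API.

```python
import threading
import time


class PoliteThrottle:
    """Client-side throttle for a shared endpoint: caps the number of
    concurrent requests and enforces a minimum interval between request
    starts. Hypothetical helper, shown as a context manager."""

    def __init__(self, max_concurrent=2, min_interval=0.5):
        self._sem = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self._min_interval = min_interval
        self._last_start = 0.0

    def __enter__(self):
        # Block if too many requests are already in flight.
        self._sem.acquire()
        with self._lock:
            # Space out request starts by at least min_interval seconds.
            wait = self._min_interval - (time.monotonic() - self._last_start)
            if wait > 0:
                time.sleep(wait)
            self._last_start = time.monotonic()
        return self

    def __exit__(self, *exc):
        self._sem.release()
        return False


throttle = PoliteThrottle(max_concurrent=2, min_interval=0.5)
# Usage (assuming some fetch function):
#     with throttle:
#         run_sparql_query(...)
```

Whether this actually helps the WDQS servers is a question for the operators; the point is that a small wrapper like this makes a client's load predictable and easy to tune down if asked.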
Yet another thought is the large discrepancy between the Virginia and
Texas data centers, which I can see on Grafana [1]. As far as I
understand, the hardware (and software) are the same, so why is there
this large difference? Rather than editing or Blazegraph, could the
issue be some form of network problem?
[1]
https://grafana.wikimedia.org/d/000000489/wikidata-query-service?panelId=8&…
/Finn
On 14/11/2019 10:50, Guillaume Lederrey wrote:
Hello all!
As you've probably noticed, the update lag on the public WDQS
endpoint [1] is not doing well [2], with lag climbing to > 12h for
some servers. We are tracking this on phabricator [3], subscribe to
that task if you want to stay informed.
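For anyone who wants to check the lag themselves rather than watch the dashboard: WDQS exposes the timestamp of the last processed change as `schema:dateModified` on `<http://www.wikidata.org>`. A rough sketch, assuming the standard SPARQL JSON results format (the function names here are my own, not a WDQS API):

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Query for the last-update timestamp the server knows about.
LAG_QUERY = """SELECT ?dateModified WHERE {
  <http://www.wikidata.org> schema:dateModified ?dateModified .
}"""


def parse_lag_seconds(result_json, now=None):
    """Extract the last-update timestamp from a SPARQL JSON result and
    return the lag in seconds relative to `now` (default: current UTC)."""
    ts = result_json["results"]["bindings"][0]["dateModified"]["value"]
    last = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - last).total_seconds()


def wdqs_lag_seconds(endpoint="https://query.wikidata.org/sparql"):
    """Ask the endpoint for its last-update timestamp and compute the lag.
    Note: GeoDNS routes you to one cluster, so this only measures the
    server you happen to reach."""
    url = endpoint + "?" + urllib.parse.urlencode(
        {"query": LAG_QUERY, "format": "json"})
    req = urllib.request.Request(
        url, headers={"User-Agent": "wdqs-lag-check/0.1 (example)"})
    with urllib.request.urlopen(req) as resp:
        return parse_lag_seconds(json.load(resp))
```

Because of the GeoDNS routing mentioned elsewhere in this thread, repeated calls may hit different clusters and report different lags.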
To be perfectly honest, we don't have a good short term solution. The
graph database that we are using at the moment (Blazegraph [4]) does
not easily support sharding, so even throwing hardware at the problem
isn't really an option.
We are working on a few medium term improvements:
* A dedicated updater service in Blazegraph, which should help
increase the update throughput [5]. Fingers crossed, this should be
ready for initial deployment and testing by next week (no promises,
we're doing the best we can).
* Some improvements in the parallelism of the updater [6]. This has
just been identified. While it will probably also improve throughput,
we haven't actually started working on it and we don't have any
numbers at this point.
Longer term:
We are hiring a new team member to work on WDQS. It will take some
time to get this person up to speed, but we should have more capacity
to address the deeper issues of WDQS by January.
The 2 main points we want to address are:
* Finding a triple store that scales better than our current solution.
* Better understand the use cases of WDQS and see whether we can
provide a technical solution that is better suited. Our intuition is
that some of the use cases that require synchronous (or
quasi-synchronous) updates would be better implemented outside of a
triple store. Honestly, we have no idea yet whether this makes sense
or what those alternate solutions might be.
Thanks a lot for your patience during this tough time!
Guillaume
[1]
https://query.wikidata.org/
[2]
https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&am…
[3]
https://phabricator.wikimedia.org/T238229
[4]
https://blazegraph.com/
[5]
https://phabricator.wikimedia.org/T212826
[6]
https://phabricator.wikimedia.org/T238045
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata