Re: [Wikitech-l] [Wikidata] Scaling Wikidata Query Service

6 Jun 2019

Thanks, Guillaume - this is very helpful, and it would be great to
have similar information posted or collected on other kinds of limits
and potential approaches to addressing them.

Some weeks ago, we started a project to keep track of such limits,
and I have added pointers to your information there:
https://www.wikidata.org/wiki/Wikidata:WikiProject_Limits_of_Wikidata .

If anyone is aware of similar discussions for any of the other limits,
please edit that page to include pointers to those discussions.

Thanks!

Daniel

On Thu, Jun 6, 2019 at 9:33 PM Guillaume Lederrey
<glederrey(a)wikimedia.org> wrote:
...

 Hello all!

 There have been a number of concerns raised about the performance and
 scaling of Wikidata Query Service. We share those concerns and we are
 doing our best to address them. Here is some info about what is going
 on:

 In an ideal world, WDQS should:

 * scale in terms of data size
 * scale in terms of number of edits
 * have low update latency
 * expose a SPARQL endpoint for queries
 * allow anyone to run any queries on the public WDQS endpoint
 * provide great query performance
 * provide a high level of availability

 Scaling graph databases is a "known hard problem", and we are reaching
 a scale where there are no obvious easy solutions to address all the
 above constraints. At this point, just "throwing hardware at the
 problem" is not an option anymore. We need to go deeper into the
 details and potentially make major changes to the current architecture.
 Some scaling considerations are discussed in [1]. This is going to take
 time.

 Realistically, addressing all of the above constraints is unlikely to
 ever happen. Some of the constraints are non-negotiable: if we can't
 keep up with Wikidata in terms of data size or number of edits, it does
 not make sense to address query performance. On some constraints, we
 will probably need to compromise.

 For example, the update process is asynchronous. It is by nature
 expected to lag. In the best case, this lag is measured in minutes,
 but can climb to hours occasionally. This is a case of prioritizing
 stability and correctness (ingesting all edits) over update latency.
 And while we can work to reduce the maximum latency, this will still
 be an asynchronous process and needs to be considered as such.

 We currently have one Blazegraph expert working with us to address a
 number of performance and stability issues. We are planning to hire
 an additional engineer to help us support the service in the long
 term. You can follow our current work on Phabricator [2].

 If anyone has experience with scaling large graph databases, please
 reach out to us; we're always happy to share ideas!

 Thanks all for your patience!

    Guillaume

 [1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
 [2] https://phabricator.wikimedia.org/project/view/1239/

 --
 Guillaume Lederrey
 Engineering Manager, Search Platform
 Wikimedia Foundation
 UTC+2 / CEST

 _______________________________________________
 Wikidata mailing list
 Wikidata(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata 
