[Wikidata] Re: Wikidata Query Service scaling update Aug 2021

18 Aug 2021


> If you have new concerns, comments and questions, you can best reach us at
> this talk page
Oops, sorry, this is the link that should actually work:
https://www.wikidata.org/w/index.php?title=Wikidata_talk:Query_Service_scali...
--
*Mike Pham* (he/him)
Sr Product Manager, Search
Wikimedia Foundation https://wikimediafoundation.org/
On 18 August 2021 at 16:07:01, Mike Pham (mpham@wikimedia.org) wrote:
Wikidata community members,
Thank you for all of your work helping Wikidata grow and improve over the
years. In the spirit of better communication, we would like to take this
opportunity to share some of the current challenges Wikidata Query Service
(WDQS) is facing, and some strategies we have for dealing with them.
WDQS currently risks failing to provide acceptable service quality due to
the following reasons:
1. Blazegraph scaling
   1. Graph size. WDQS uses Blazegraph as our graph backend. While
      Blazegraph can theoretically support 50 billion edges
      https://blazegraph.com/, in reality Wikidata is the largest graph
      we know of running on Blazegraph (~13 billion triples
      https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m),
      and there is a risk that we will reach a size limit
      https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29
      of what it can realistically support
      https://phabricator.wikimedia.org/T213210. Once Blazegraph is maxed
      out, WDQS can no longer be updated. This will also break Wikidata tools
      that rely on WDQS.
   2. Software support. Blazegraph is end-of-life software that is no
      longer actively maintained, making it an unsustainable backend to
      continue with long term.
Blazegraph maxing out in size poses the greatest risk of catastrophic
failure, as it would effectively prevent WDQS from being updated further,
and WDQS would inevitably fall out of date. Our long-term strategy to
address this is to move to a new graph backend that best meets our WDQS
needs and is actively maintained, and to begin the migration off of
Blazegraph as soon as a viable alternative is identified
https://phabricator.wikimedia.org/T206560.
In the interim period, we are exploring disaster mitigation options for
reducing Wikidata’s graph size in case we hit this upper graph size
limit: (i) identify and delete lower-priority data (e.g. labels,
descriptions, aliases, non-normalized values, etc.); (ii) separate out
certain subgraphs (such as Lexemes and/or scholarly articles). This would
be a last-resort scenario to keep Wikidata and WDQS running with reduced
functionality until we are able to deploy a longer-term solution.
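To make option (ii) a little more concrete, here is a minimal sketch of how entities might be routed to separate subgraphs. It is purely illustrative and not a description of any planned implementation: the function names, the subgraph labels and the routing rules are all assumptions. It relies only on Lexeme IDs starting with "L" and on Q13442814 being the Wikidata item for "scholarly article".

```python
from collections import Counter
from typing import Iterable, Tuple

# Wikidata item "scholarly article"; entities with this "instance of" (P31)
# value would go to a separate subgraph in this hypothetical split.
SCHOLARLY_ARTICLE = "Q13442814"


def assign_subgraph(entity_id: str, instance_of: Iterable[str]) -> str:
    """Return a hypothetical subgraph label for one entity."""
    if entity_id.startswith("L"):  # Lexeme IDs look like "L123"
        return "lexemes"
    if SCHOLARLY_ARTICLE in set(instance_of):
        return "scholarly"
    return "main"


def count_by_subgraph(entities: Iterable[Tuple[str, Iterable[str]]]) -> Counter:
    """Count how many entities would land in each subgraph."""
    return Counter(assign_subgraph(eid, p31) for eid, p31 in entities)


if __name__ == "__main__":
    # Made-up sample entities, for illustration only.
    sample = [
        ("Q42", ["Q5"]),
        ("Q1000001", [SCHOLARLY_ARTICLE]),
        ("L99", []),
    ]
    print(count_by_subgraph(sample))
```

A real split would also have to decide what to do with statements that link entities across subgraphs, which this sketch ignores.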
2. Update and access scaling
   1. Throughput. WDQS is currently trying to provide fast updates and
      fast, unlimited queries for all users. As the number of SPARQL queries
      grows over time
      https://www.mediawiki.org/wiki/User:MPopov_(WMF)/Wikimania_2021_Hackathon
      alongside graph updates, WDQS is struggling to keep up
      https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&from=now-6M&to=now&refresh=1d
      in each dimension of service quality without compromising anywhere. For
      users, this often leads to timed-out queries.
   2. Equitable service. We are currently unable to adjust system behavior
      per user/agent. As such, it is not possible to provide equitable
      service to users: for example, a heavy user could swamp WDQS enough to
      hinder usability by community users.
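As an illustration of what adjusting behavior per user/agent could look like, here is a minimal sketch of a token bucket kept separately for each agent. This is a generic rate-limiting technique, not something WDQS is stated to use; the class names, agent names and the rate and capacity values are all hypothetical.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerAgentLimiter:
    """Keeps one bucket per agent so one heavy agent cannot starve the others."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, agent: str, now: float = None) -> bool:
        if now is None:
            now = time.monotonic()
        bucket = self.buckets.get(agent)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[agent] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = PerAgentLimiter(rate=1.0, capacity=2.0)  # made-up limits
    for i in range(3):
        print("heavy-bot", i, limiter.allow("heavy-bot", now=0.0))
    print("community-user", limiter.allow("community-user", now=0.0))
```

In this sketch the heavy agent exhausts only its own bucket, so requests from other agents are still accepted.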
In addition to being a querying service for Wikidata, WDQS is also part of
the edit pipeline of Wikidata (every edit on Wikidata is pushed to WDQS to
update the data there). While deploying the new Flink-based Streaming
Updater https://phabricator.wikimedia.org/T244590 will help with
increasing throughput of Wikidata updates, there is a substantial risk that
WDQS will be unable to keep up with the combination of increased querying
and updating, resulting in more tradeoffs between update lag and querying
latency/timeouts.
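The relationship between update throughput and lag can be sketched with a very simple backlog model. This is a toy illustration under stated assumptions, not a model of the actual updater: the function name is hypothetical, the rates are made-up numbers, and lag is approximated as the time needed to drain the current backlog at full update capacity.

```python
def simulate_update_lag(edits_per_sec, update_capacity_per_sec, seconds):
    """Return the approximate update lag (in seconds) after each simulated second.

    Lag is estimated as backlog / capacity, i.e. how long it would take to
    apply all pending edits if no new edits arrived.
    """
    if update_capacity_per_sec <= 0:
        raise ValueError("update capacity must be positive")
    backlog = 0.0
    lags = []
    for _ in range(seconds):
        # Edits arrive, the updater applies as many as it can, and the
        # backlog never drops below zero.
        backlog = max(0.0, backlog + edits_per_sec - update_capacity_per_sec)
        lags.append(backlog / update_capacity_per_sec)
    return lags


if __name__ == "__main__":
    # Capacity above the edit rate: no lag builds up.
    print(simulate_update_lag(10, 12, 5))
    # Capacity below the edit rate (for example, because heavy querying
    # takes resources away from updating): lag grows every second.
    print(simulate_update_lag(10, 8, 5))
```

The point of the sketch is only that once effective update capacity falls below the edit rate, lag grows without bound until capacity is restored or load is reduced.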
In the near term, we would like to work more closely with you to determine
what acceptable trade-offs would be for preserving WDQS functionality while
we scale up Wikidata querying. In the long term, we will be conducting more
user research to better understand your needs so we can (i) optimize
querying via SPARQL and/or other methods, (ii) explore better user
management that will allow us to prevent heavy use of WDQS that does not
align with the goals of our movement and projects, and (iii) make it easier
for users to set up and run their own query services.
Though this information about the current state of WDQS may not be a total
surprise to many of you, we want to be as transparent with you as possible
to ensure that there are as few surprises as possible in the case of any
potential service disruptions/catastrophic failures, and that we can
accommodate your work as best as we can in the future evolution of WDQS. We
plan on doing a session on WDQS scaling challenges during WikidataCon this
year at the end of October.
Thanks for your understanding with these scaling challenges, and for any
feedback you have already been providing. If you have new concerns,
comments and questions, you can best reach us at this talk page
https://www.wikidata.org/wiki/Wikidata_talk:Query_Service_scaling_update_Aug_2021.
Additionally, if you have not had a chance to fill out our survey
https://docs.google.com/forms/d/e/1FAIpQLSe1H_OXQFDCiGlp0QRwP6-Z2CGCgm96MWBBmiqsMLu0a6bhLg/viewform?usp=sf_link
yet, please tell us how you use the Wikidata Query Service (see privacy
statement
https://foundation.wikimedia.org/wiki/WDQS_User_Survey_2021_Privacy_Statement)!
Whether you are an occasional user or create tools, your feedback is needed
to decide our future development.
Best,
WMF Search + WMDE
