Thanks, Guillaume - this is very helpful, and it would be great to
have similar information posted/collected on other kinds of limits
and potential approaches to addressing them.
Some weeks ago, we started a project to keep track of such limits,
and I have added pointers to your information there:
https://www.wikidata.org/wiki/Wikidata:WikiProject_Limits_of_Wikidata .
If anyone is aware of similar discussions for any of the other limits,
please edit that page to include pointers to those discussions.
Thanks!
Daniel
On Thu, Jun 6, 2019 at 9:33 PM Guillaume Lederrey
<glederrey(a)wikimedia.org> wrote:
>
> Hello all!
>
> There have been a number of concerns raised about the performance and
> scaling of the Wikidata Query Service. We share those concerns and we are
> doing our best to address them. Here is some info about what is going
> on:
>
> In an ideal world, WDQS should:
>
> * scale in terms of data size
> * scale in terms of number of edits
> * have low update latency
> * expose a SPARQL endpoint for queries
> * allow anyone to run any queries on the public WDQS endpoint
> * provide great query performance
> * provide a high level of availability
>
> Scaling graph databases is a "known hard problem", and we are reaching
> a scale where there are no obvious easy solutions to address all the
> above constraints. At this point, just "throwing hardware at the
> problem" is not an option anymore. We need to go deeper into the
> details and potentially make major changes to the current architecture.
> Some scaling considerations are discussed in [1]. This is going to take
> time.
>
> Realistically, addressing all of the above constraints is unlikely to
> ever happen. Some of the constraints are non-negotiable: if we can't
> keep up with Wikidata in terms of data size or number of edits, it does
> not make sense to address query performance. On some constraints, we
> will probably need to compromise.
>
> For example, the update process is asynchronous. It is by nature
> expected to lag. In the best case, this lag is measured in minutes,
> but can climb to hours occasionally. This is a case of prioritizing
> stability and correctness (ingesting all edits) over update latency.
> And while we can work to reduce the maximum latency, this will still
> be an asynchronous process and needs to be considered as such.
>
> We currently have one Blazegraph expert working with us to address a
> number of performance and stability issues. We are planning to hire an
> additional engineer to help us support the service in the long term.
> You can follow our current work in Phabricator [2].
>
> If anyone has experience with scaling large graph databases, please
> reach out to us, we're always happy to share ideas!
>
> Thanks all for your patience!
>
> Guillaume
>
> [1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
> [2] https://phabricator.wikimedia.org/project/view/1239/
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
Forwarding in case this is of interest.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: Guillaume Lederrey <glederrey(a)wikimedia.org>
Date: Thu, Jun 6, 2019 at 7:33 PM
Subject: [Wikidata] Scaling Wikidata Query Service
To: Discussion list for the Wikidata project. <wikidata(a)lists.wikimedia.org>
Hello all!
There have been a number of concerns raised about the performance and
scaling of the Wikidata Query Service. We share those concerns and we are
doing our best to address them. Here is some info about what is going
on:
In an ideal world, WDQS should:
* scale in terms of data size
* scale in terms of number of edits
* have low update latency
* expose a SPARQL endpoint for queries
* allow anyone to run any queries on the public WDQS endpoint
* provide great query performance
* provide a high level of availability
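
For readers unfamiliar with the endpoint mentioned above, a WDQS query is just an HTTP GET against the public SPARQL endpoint. The following is a minimal illustrative sketch, not an official client; it assumes the public endpoint at https://query.wikidata.org/sparql accepts a `query` parameter, and it only builds the request URL rather than sending it:

```python
# Minimal sketch of addressing the public WDQS SPARQL endpoint.
# Assumption: https://query.wikidata.org/sparql accepts GET requests
# with a `query` parameter; this is illustrative, not an official
# client. The URL is only built, not fetched, so the example is
# self-contained.
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_wdqs_url(sparql: str) -> str:
    """Return a GET URL that would run `sparql` against the endpoint."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})

# Ten items that are an instance of (P31) house cat (Q146).
query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . } LIMIT 10"
url = build_wdqs_url(query)
```

To actually run the query, fetch `url` with any HTTP client; note that the service asks clients to send a descriptive User-Agent header.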
Scaling graph databases is a "known hard problem", and we are reaching
a scale where there are no obvious easy solutions to address all the
above constraints. At this point, just "throwing hardware at the
problem" is not an option anymore. We need to go deeper into the
details and potentially make major changes to the current architecture.
Some scaling considerations are discussed in [1]. This is going to take
time.
Realistically, addressing all of the above constraints is unlikely to
ever happen. Some of the constraints are non-negotiable: if we can't
keep up with Wikidata in terms of data size or number of edits, it does
not make sense to address query performance. On some constraints, we
will probably need to compromise.
For example, the update process is asynchronous. It is by nature
expected to lag. In the best case, this lag is measured in minutes,
but can climb to hours occasionally. This is a case of prioritizing
stability and correctness (ingesting all edits) over update latency.
And while we can work to reduce the maximum latency, this will still
be an asynchronous process and needs to be considered as such.
We currently have one Blazegraph expert working with us to address a
number of performance and stability issues. We are planning to hire an
additional engineer to help us support the service in the long term.
You can follow our current work in Phabricator [2].
If anyone has experience with scaling large graph databases, please
reach out to us, we're always happy to share ideas!
Thanks all for your patience!
Guillaume
[1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
[2] https://phabricator.wikimedia.org/project/view/1239/
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
Hello!
Just to make this explicit and not hidden in the depths of the
1.27.6/1.30.2/1.31.2/1.32.2 release announcement: MediaWiki 1.27 and 1.30
are now End-of-Life (EOL) as of today, and are therefore no longer supported.
MediaWiki 1.27 was slated to become EOL in June 2019 [1], and the final
release of the MediaWiki 1.27 branch, in the form of 1.27.6, solidifies this.
MediaWiki 1.30 was supposed to be EOL in December 2018 [1], but due to the
lack of a release since 1.30.1 in September 2018, this hadn't formally
happened. MediaWiki 1.30.2 is therefore the final release for the MediaWiki
1.30 branch.
If you require an LTS version of MediaWiki, please upgrade to MediaWiki 1.31,
which is supported until June 2021 [1]. If you don't require LTS support,
you can upgrade to 1.32, which will be supported until January 2020 [1].
And as somewhat of a heads-up, MediaWiki 1.33 is due to be released later
this month [1].
Thanks!
Sam
[1] https://www.mediawiki.org/wiki/Version_lifecycle
Hi all,
Tomorrow we will be issuing a security and maintenance release to all
supported branches of MediaWiki.
The new releases will be:
1.32.2
1.31.2
1.30.2
1.27.6
This will resolve 12 issues in MediaWiki core, and also includes some
minor security and hardening patches previously committed to git.
Fixes will be available in the respective release branches, and also in
master. Tarballs will be available for the above-mentioned point releases
as well.
1.30 was previously announced as due to become end of life [1], and as
such 1.30.2 will be the final security and maintenance release,
barring any unforeseen issues.
1.27.6 will also be the final release for 1.27 (barring any unforeseen
issues), which is scheduled to become end of life in June 2019 [1].
This security release includes fixes for MediaWiki core.
[1] https://www.mediawiki.org/wiki/Version_lifecycle
---
Sam Reed