Nice demo, Stas! :)
--
deb tankersley
irc: debt
Product Manager, Discovery
Wikimedia Foundation
---------- Forwarded message ----------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Wed, Aug 2, 2017 at 12:10 PM
Subject: [Wikitech-l] Today's CREDIT demo - Wikidata Query Service update,
including on federation
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Hi all - just one demo today, but as always it's a treat to see Stas's
updates on Wikidata Query Service (WDQS).
Enjoy!
https://www.youtube.com/watch?v=zfjY9JU0NR0
https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual
-Adam
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Thanks to Erica Litrenta for sharing this with me. I thought I'd share it
forward.
"It was because of the letter K that I found my younger sister, but for 14
years, it was also the letter K that kept us apart."
https://www.wired.com/story/search-algorithms-kept-me-from-my-sister-for-14-years
Yours,
Chris Koerner
Community Liaison
Wikimedia Foundation
Hello, This is the Discovery update for last week. Apologies for the
delay in getting it out.
== Discussions ==
=== Search ===
* Created a method for the Kafka consumer to take 'learning to rank'
queries from a queue and run them against Elasticsearch to generate
relevance labels (see the sketch after this list) [0]
* Added the ability to use Kafka in our LTRank feature generation
queries, pushing them into Elasticsearch for analysis [1]
* Added the ability to extract TF- and IDF-based features in the
Elasticsearch 'learning to rank' plugin [2]
* The A/B test for 'explore similar' links is still in progress, but
we're running into a few bugs that will be sorted out next week [3]
* Fixed a bug where searching with phrase queries did not highlight
page content [4]
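For anyone curious what the Kafka-to-Elasticsearch piece above looks
like in practice, here is a minimal sketch in Python. It is not the
actual code; the topic name, index name and host are made up for
illustration, and it assumes the kafka-python and elasticsearch-py
(5.x style) packages.

    import json

    from kafka import KafkaConsumer          # pip install kafka-python
    from elasticsearch import Elasticsearch  # pip install elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # hypothetical host
    consumer = KafkaConsumer(
        "ltr_queries",                        # hypothetical topic name
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )

    # Pull query strings off the queue, run them against the search
    # index, and keep the returned page IDs so they can later be joined
    # with click data to produce relevance labels.
    for message in consumer:
        query = message.value["query"]
        result = es.search(
            index="enwiki_content",           # hypothetical index name
            body={"query": {"match": {"text": query}}, "size": 20},
        )
        print(query, [hit["_id"] for hit in result["hits"]["hits"]])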
=== Analysis ===
* Fixed a bug with the sister project snippets and eventlogging [5]
* Finished up analysis for determining what is a reasonable per-IP
ratelimit for maps [6]
* Fixed a minor dashboard bug (splines) [7]
[0] https://phabricator.wikimedia.org/T162059
[1] https://phabricator.wikimedia.org/T162072
[2] https://phabricator.wikimedia.org/T167437
[3] https://phabricator.wikimedia.org/T164856
[4] https://phabricator.wikimedia.org/T167798
[5] https://phabricator.wikimedia.org/T168916
[6] https://phabricator.wikimedia.org/T169175
[7] https://phabricator.wikimedia.org/T169125
Yours,
Chris Koerner
Community Liaison
Wikimedia Foundation
Very interesting!
10. jul. 2017 16:34 skrev "Chris Koerner" <ckoerner(a)wikimedia.org>:
Thanks to Erica Litrenta for sharing this with me. I thought I'd share it
forward.
"It was because of the letter K that I found my younger sister, but for 14
years, it was also the letter K that kept us apart."
https://www.wired.com/story/search-algorithms-kept-me-from-my-sister-for-14-years
Yours,
Chris Koerner
Community Liaison
Wikimedia Foundation
_______________________________________________
discovery mailing list
discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery
Hello!
We've had a significant slowdown of elasticsearch today (see Grafana
for exact timing [1]). The impact was low enough that it probably does
not require a full incident report (the number of errors did not rise
significantly [2]), but understanding what happened and sharing that
understanding is important. This is going to be a long and technical
email; if you get bored, feel free to close it and delete it right
now.
TL;DR: elastic1019 was overloaded because it hosted too many heavy
shards; banning all shards from elastic1019 so the cluster could
reshuffle them allowed it to recover.
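(For the curious, here is a minimal sketch of what "banning" a node
looks like using the stock cluster-level allocation filtering API, via
the Python client. This is the generic Elasticsearch mechanism, not
necessarily the exact tooling we used.)

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # hypothetical host

    # Exclude elastic1019 from shard allocation: the master moves all
    # of its shards onto the remaining nodes. Clearing the setting
    # later lets shards be allocated back to it.
    es.cluster.put_settings(body={
        "transient": {
            "cluster.routing.allocation.exclude._name": "elastic1019"
        }
    })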
In more detail:
elastic1019 was hosting shards for commonswiki, enwiki and frwiki,
which are all high-load shards. elastic1019 is one of our older
servers, which are less powerful, and might also suffer from CPU
overheating [3].
The obvious question: "why do we even allow multiple heavy shards to
be allocated on the same node?". The answer is obvious as well: "it's
complicated...".
One of the very interesting features of elasticsearch is its ability
to automatically balance shards. This allows the cluster to rebalance
automatically when nodes are lost, and to spread resource usage across
all nodes in the cluster [4]. Constraints can be added to account for
available disk space [5], rack awareness [6], or even to apply
specific filtering to specific indices [7]. It does not, however,
directly allow constraining allocation based on the load of a specific
shard.
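To make those constraints concrete, here is a rough sketch through the
Python client. The setting names are the stock Elasticsearch ones
linked above; the values, the "row" awareness attribute, and the node
and index names are made up for illustration.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # hypothetical host

    # Disk-based thresholds [5] and rack/row awareness [6] are
    # cluster-wide settings.
    es.cluster.put_settings(body={
        "persistent": {
            "cluster.routing.allocation.disk.watermark.low": "85%",
            "cluster.routing.allocation.disk.watermark.high": "90%",
            "cluster.routing.allocation.awareness.attributes": "row",
        }
    })

    # Index-level allocation filtering [7]: pin one index to a subset
    # of nodes.
    es.indices.put_settings(index="some_index", body={
        "index.routing.allocation.include._name": "elastic1020,elastic1021"
    })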
We do have a few mechanisms to ensure that load is as uniform as
possible across the cluster:
An index is split into multiple shards, and each shard is replicated
multiple times to provide redundancy and to spread load. Both are
configured per index.
We know which indices are heavy (commons, enwiki, frwiki, ...), both
in terms of size and in terms of traffic. Those indices are split into
a number of shards + replicas close to the number of nodes in the
cluster, so that the shards are spread evenly across the cluster, with
only a few shards of the same index on the same node, while still
allowing us to lose a few nodes and keep all shards allocated. For
example, enwiki_content has 8 shards with 2 replicas each, so 24
shards in total, with a maximum of 2 shards on the same node. This
approach works well most of the time.
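Per index, that boils down to settings along these lines (a sketch
only; the example index name is made up and the real values are
managed by our configuration, but the setting names are the stock
Elasticsearch ones):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # hypothetical host

    # 8 primaries, each with 2 replicas: 24 shard copies in total, and
    # at most 2 copies of this index allowed on any single node.
    es.indices.create(index="enwiki_content_example", body={
        "settings": {
            "index.number_of_shards": 8,
            "index.number_of_replicas": 2,
            "index.routing.allocation.total_shards_per_node": 2,
        }
    })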
The limitation is that a shard is the unit of scalability: you can't
move around anything smaller than a shard. In the case of enwiki, a
single shard is ~40GB and serves a fairly large number of requests per
second. If a node holds just one more of those shards, that's already
a significant amount of additional load.
The solution could be to split large indices into many more shards;
the scalability unit would be much smaller, and it would be much
easier to achieve uniform load. Of course, there are also limitations.
The total number of shards in the cluster has a significant cost:
increasing it adds load to cluster operations (which are already quite
expensive with the total number of shards we have at this point).
There are also functional issues: ranking (BM25) uses statistics
calculated per shard, and with smaller shards the stats might at some
point no longer be representative of the whole corpus.
There are probably a lot more details we could get into; feel free to
ask more questions and we can continue the conversation. And I'm sure
David and Erik have a lot to add!
Thanks for reading to the end!
Guillaume
[1] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=…
[2] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=…
[3] https://phabricator.wikimedia.org/T168816
[4] https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allo…
[5] https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-alloca…
[6] https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-…
[7] https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-alloc…
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
UTC+2 / CEST