Hello again. It's been a while.
This is the weekly update from the Search Platform team, covering
2018-09-17 through 2018-10-01.
As always, feedback and questions welcome.
== Discussions ==
=== Search ===
* Implemented indexing statement values as part of main data in
Wikidata, so that statement values are now searchable without special
syntax [0]
* Reindexed Wikidata, which also enables qualifier indexing [1]
* Mathew worked on resolving an Elasticsearch shard size alert by
doing an in-place reindex [2]
* A lot of work went into investigating a brief outage of
CirrusSearch (mw exception spike for api.php) [3]; it is resolved
well enough for now.
* Gehel and others worked on refactoring puppet to support multiple
Elasticsearch instances on the same node [4]
* Erik worked on an issue where the text content of a wiki page in
the search index can have words merged together, making them
unfindable [5]
* Stas updated the search engine of Wikidata to enable searching by
author name string [6]
* David and Erik worked together on evaluating adding an image quality
score to media search result ranking [7]
* Stas added X-Search-Id to WikidataCompletionSearchClicks events [8]
* David added a way to configure timeouts of autocomplete queries [9]
* Erik upgraded saneitizer to constantly re-index documents [10]
* David investigated why interwiki cache hit/miss was no longer
reported (since 2017) and decided to drop the support for caching
interwiki queries [11]
* Mathew and Gehel worked on raising the alert level on disk space for
old Elasticsearch servers [12]
* Erik worked to correct issues where the Cirrus MLT cache had a 0%
hit rate on switchover [13]
=== WDQS ===
* Added new NTriples RDF dump (which makes it easier to do per-line
processing) [14]
* Switched the internal cluster to Kafka events as the change source;
the public cluster is next [15]
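As a rough illustration of why a one-triple-per-line format is convenient
(see the NTriples item above), here is a minimal sketch of streaming over
such a dump. It is not taken from the WDQS tooling: the function names and
the sample lines are made up for this example, and the parsing is a
simplification that assumes well-formed lines with no trailing comments.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Illustrative sample lines in N-Triples form (not copied from a real dump).
SAMPLE = [
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.wikidata.org/prop/direct/P31> '
    '<http://www.wikidata.org/entity/Q5> .',
]


def iter_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) strings, one triple per input line."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no whitespace; the object (for
        # example a literal) may, so split only twice.
        subject, predicate, rest = line.split(None, 2)
        # Drop the "." that terminates the statement.
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()
        yield subject, predicate, obj


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count how often each predicate occurs, without loading the whole file."""
    return Counter(predicate for _, predicate, _ in iter_triples(lines))


def count_predicates_in_dump(path: str) -> Counter:
    """Same as count_predicates, reading a gzip-compressed dump (path is hypothetical)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_predicates(handle)
```

Because every line stands alone, the same loop works unchanged on a pipe,
on a split chunk of the file, or on a handful of lines as in
count_predicates(SAMPLE).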
== Did you know? ==
* Different languages can have a different number of sounds they use;
the set of sounds used in a particular language is called its
“phonemic inventory”. [16] The number of sounds can range from 11 to
over 140! Having more sounds than letters, or different sounds than
the usual sound associated with a letter, can be the source of unusual
orthographies and/or transliteration schemes, including "q" formerly
being used as a vowel in Natqgu (now Natügu), a language of the
Solomon Islands.
[0]
https://phabricator.wikimedia.org/T163642
[1]
https://phabricator.wikimedia.org/T193407
[2]
https://phabricator.wikimedia.org/T204362
[3]
https://phabricator.wikimedia.org/T204776
[4]
https://phabricator.wikimedia.org/T198351
[5]
https://phabricator.wikimedia.org/T195389
[6]
https://phabricator.wikimedia.org/T179815
[7]
https://phabricator.wikimedia.org/T202339
[8]
https://phabricator.wikimedia.org/T205597
[9]
https://phabricator.wikimedia.org/T204959
[10]
https://phabricator.wikimedia.org/T203622
[11]
https://phabricator.wikimedia.org/T191961
[12]
https://phabricator.wikimedia.org/T204361
[13]
https://phabricator.wikimedia.org/T204148
[14]
https://phabricator.wikimedia.org/T144103
[15]
https://phabricator.wikimedia.org/T189458
[16]
https://en.wikipedia.org/wiki/Phonemic_inventory
----
Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on
MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" [17] or
"Volunteer needed" [18] in Phabricator.
[17]
https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[18]
https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner
Community Relations Specialist
Wikimedia Foundation