Hello again. It's been a while.
This is the weekly update from the Search Platform team, covering 2018-09-17 through 2018-10-01.
As always, feedback and questions welcome.
== Discussions ==
=== Search ===
* Implemented indexing statement values as part of main data in Wikidata, so that statement values are now searchable without special syntax [0]
* Reindexed Wikidata, which also enables qualifier indexing [1]
* Mathew worked on resolving an elasticsearch shard size alert by doing an in-place reindex [2]
* There was a lot of work done to investigate a brief outage of CirrusSearch (mw exception spike for api.php) [3], but it's resolved enough for now.
* Gehel and others worked on refactoring puppet to support multiple elasticsearch instances on the same node [4]
* Erik worked on an issue where the text content of a wiki page in the search index can merge words, making them unfindable [5]
* Stas updated the search engine of Wikidata to enable searching by author name string [6]
* David and Erik worked together on evaluating adding an image quality score to media search result ranking [7]
* Stas added X-Search-Id to WikidataCompletionSearchClicks events [8]
* David added a way to configure timeouts of autocomplete queries [9]
* Erik upgraded saneitizer to constantly re-index documents [10]
* David investigated why interwiki cache hit/miss was no longer reported (since 2017) and decided to drop the support for caching interwiki queries [11]
* Mathew and Gehel worked on raising the alert level on disk space for old elasticsearch servers [12]
* Erik worked to correct issues where the Cirrus MLT cache had a 0% hit rate on switchover [13]
=== WDQS ===
* Added new NTriples RDF dump (which makes it easier to do per-line processing) [14]
* Internal cluster switched to Kafka events as change source, public cluster next [15]
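As a rough illustration of why a one-triple-per-line format such as NTriples [14] lends itself to per-line processing, the sketch below counts predicates while reading a few sample lines. The sample triples and the function name count_predicates are made up for illustration and are not taken from the actual dump; a real dump would be streamed from a file rather than an in-memory string.

```python
import io
from collections import Counter


def count_predicates(lines):
    """Count predicate occurrences, treating each line as one triple.

    Hypothetical helper for illustration only; not part of WDQS.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no whitespace in N-Triples,
        # so the first two whitespace-separated tokens are safe to split off.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # not a complete triple; ignore in this sketch
        predicate = parts[1]
        counts[predicate] += 1
    return counts


# Made-up sample data in N-Triples syntax (assumption, not real dump output).
SAMPLE = """\
<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .
<http://www.wikidata.org/entity/Q42> <http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .
<http://www.wikidata.org/entity/Q64> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q515> .
"""

counts = count_predicates(io.StringIO(SAMPLE))
for predicate, n in counts.most_common():
    print(n, predicate)
```

Because each line stands alone, the same loop works on any file-like object and never needs to hold more than one triple in memory.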
== Did you know? ==
* Different languages can have a different number of sounds they use; the set of sounds used in a particular language is called its “phonemic inventory”. [16] The number of sounds can range from 11 to over 140! Having more sounds than letters, or different sounds than the usual sound associated with a letter, can be the source of unusual orthographies and/or transliteration schemes, including "q" formerly being used as a vowel in Natqgu (now Natügu), a language of the Solomon Islands.
[0] https://phabricator.wikimedia.org/T163642
[1] https://phabricator.wikimedia.org/T193407
[2] https://phabricator.wikimedia.org/T204362
[3] https://phabricator.wikimedia.org/T204776
[4] https://phabricator.wikimedia.org/T198351
[5] https://phabricator.wikimedia.org/T195389
[6] https://phabricator.wikimedia.org/T179815
[7] https://phabricator.wikimedia.org/T202339
[8] https://phabricator.wikimedia.org/T205597
[9] https://phabricator.wikimedia.org/T204959
[10] https://phabricator.wikimedia.org/T203622
[11] https://phabricator.wikimedia.org/T191961
[12] https://phabricator.wikimedia.org/T204361
[13] https://phabricator.wikimedia.org/T204148
[14] https://phabricator.wikimedia.org/T144103
[15] https://phabricator.wikimedia.org/T189458
[16] https://en.wikipedia.org/wiki/Phonemic_inventory
----
Subscribe to receive on-wiki (or opt-in email) notifications of the Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or "Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner
Community Relations Specialist
Wikimedia Foundation