Greetings,
This is the weekly update from the Search Platform team for the week
starting 2019-05-20.
As always, feedback and questions are welcome.
== Highlights ==
* Most of the team attended a three-day offsite in Prague last week,
and Deb, Erik, Stas, and Trey also attended the Wikimedia Hackathon.
[0]
== Discussions ==
=== Search ===
* At the Hackathon, we hosted a session on "Advanced search syntax for
newbies" [1]. We also had a few in-depth discussions with volunteers
about search, our APIs, etc., and talked in more depth about Arabic
and Slovak.
** As a result of our discussion, Trey opened a ticket to investigate
the effects of searching without diacritics in Slovak. [2]
* Trey completed a change to the Arabic-language completion suggester
(upper left search box) to make Eastern Arabic Numerals and Western
Arabic Numerals equivalent. [3] It will still take a little while for
the change to be seen on-wiki.
* Stas made a set of preliminary patches to convert the CirrusSearch
extension to extension.json registration (merged); the final conversion
patch is still in review [4]
* David worked on several tasks: creating a fallback method based on a
generic index [5]; making fallback methods configurable [6]; and
allowing FallbackMethod implementations to create their own
SearchQuery [7]
* We noticed that multiple Elasticsearch nodes were getting overloaded
in eqiad in April. Erik patched the problem and found a few things
that might have caused it [8]
* When enabling cross-cluster search to support multi-instance, we had
to run custom scripts to update cluster settings, and discovered
that the puppet repo was not aware of this; it's fixed now [9]
* Erik did a smorgasbord of fixes: he fixed "missing replica" error
messages in production logs by uniquely identifying connections in
the connection pool [10]; made changes to create archive indices,
delete archive docs from general indices, and ignore ancient logging
rows with log_page = null [11]; fixed a condition where we received a
cirrusSearchElasticaWrite job for an unwritable cluster, cloudelastic
[12]; and documented the CirrusSearch schema [13].
* During the Hackathon, Erik also exposed CloudElastic to the WMF Cloud [14]
=== Wikidata Query Service ===
* At the Hackathon, with the help of Krinkle, the bug that made the URL
shortener widget hard to use was fixed [15]
* A WDQS bug that caused label service clauses nested in subqueries to
be processed incorrectly was fixed [16]
* Stas fixed breakage in the LDF server's JSON-LD format [17]
== Did you know? ==
'''Naming Things is Hard, Volume 187:''' The Phab ticket mentioned
above to equate different numeral systems for Arabic-language wikis
uses the names Eastern Arabic Numerals (١٢٣...) and Western Arabic
Numerals (123...). In English, the numerals we usually use (123...) are
often called “Arabic numerals” [18] because they came to Europe from
Arabic sources. In Arabic, the Eastern Arabic Numerals are called
“Indian numerals” [19] because they came from Indian sources. In
English, “Indian numerals” refers to the numerals used in India
(१२३...), but they are just called “Devanagari numerals” in Hindi, for
example. [20] Some have tried to make the subtle distinction in
English that “arabic numerals” are the numerals that came from Arabic
sources (123...), while “Arabic numerals” are the ones that are used
by Arabic speakers (١٢٣...).
It’s also interesting to look at a table of the various related
numeral systems [21] and see the similarities and “false friends”
(note that your fonts may vary): Devanagari 7 looks like a 6 (“७”),
Arabic 6 looks like a 7 (“٦”), Gujarati 5 looks like a 4 (“૫”),
Bengali 4 looks like an 8 (“৪”), Gurmukhi 1 looks like a 9 (“੧”), etc.
But any of those systems is MMMDCCXXIV times better than Roman
numerals! [22]
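For the curious, here is a minimal Python sketch of what "making
numerals equivalent" can look like. It is only an illustration and not
the CirrusSearch or Elasticsearch change from the ticket above; the
function name fold_digits is made up for this example. It relies on the
Unicode database, which records a decimal value for every digit glyph
in these numeral systems.

```python
import unicodedata


def fold_digits(text: str) -> str:
    """Replace every Unicode decimal digit with its ASCII (Western Arabic) digit.

    Hypothetical helper for illustration only; non-digit characters pass through.
    """
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )


# Eastern Arabic and Devanagari digits fold to the same ASCII string,
# so a search for either form could match the other.
print(fold_digits("١٢٣") == fold_digits("123"))
print(fold_digits("१२३") == fold_digits("123"))
```

Folding both the indexed text and the query this way is one simple way
to make the numeral systems match each other; the look-alike "false
friends" still fold to their true values, whatever they resemble.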
[0] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2019
[1] https://phabricator.wikimedia.org/T216740
[2] https://phabricator.wikimedia.org/T223787
[3] https://phabricator.wikimedia.org/T117217
[4] https://phabricator.wikimedia.org/T87892
[5] https://phabricator.wikimedia.org/T222652
[6] https://phabricator.wikimedia.org/T222152
[7] https://phabricator.wikimedia.org/T221621
[8] https://phabricator.wikimedia.org/T220901
[9] https://phabricator.wikimedia.org/T218932
[10] https://phabricator.wikimedia.org/T222819
[11] https://phabricator.wikimedia.org/T222641
[12] https://phabricator.wikimedia.org/T222307
[13] https://phabricator.wikimedia.org/T220547
[14] https://phabricator.wikimedia.org/T223519
[15] https://phabricator.wikimedia.org/T221127
[16] https://phabricator.wikimedia.org/T153353
[17] https://phabricator.wikimedia.org/T222471
[18] https://en.wikipedia.org/wiki/Arabic_numerals
[19] https://ar.wikipedia.org/wiki/أرقام_هندية
[20] https://hi.wikipedia.org/wiki/देवनागरी_अंक
[21] https://en.wikipedia.org/wiki/Hindu–Arabic_numeral_system#Glyph_comparison
[22] https://en.wikipedia.org/wiki/Roman_numerals
----
Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or
"Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation