Hello,
This is the weekly update from the Search Platform team for the week
starting 2019-06-10.
As always, feedback and questions are welcome.
== Discussions ==
=== Search ===
* Trey has finished his first analysis of a proposal to strip
diacritics in Slovak for searching. Slovak searchers don't always use
them, but removing them causes new problems for stemming. Read more on
MediaWiki or Phabricator, and join the discussion if you speak Slovak!
[0] [1]
* Mathew worked on creating a Cookbook to restart WDQS after applying
updates or doing a deployment [2]
* The team worked with Analytics to port usage of
mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request [3]
* When a new item is created, we have functionality to instant-index
it synchronously, even with partial information, so it appears
quickly in the index - Stas worked on getting the un-redirected items
to be instant-indexed [4]
* The CirrusSearch\SearcherTest::testSearchText PHPUnit tests take a long
time to run and are slowing down change processing in CI - David helped
to make them faster [5]
* It is currently not possible to evaluate all items without labels,
so we investigated adding haslabel:all and determined that matching an
_all field and a descriptions_all field worked well [6]
* We removed old settings ($wgLexemeUseCirrus and
$wgLexemeDisableCirrus) as they were both set to true for production
(no longer applicable / out of date) [7]
* David fixed an issue where suggested articles were no longer shown:
API usage for the morelike search endpoint had dropped, by accident,
since the train deploy of 1.34.0-wmf.7. [8]
* Geodata not being returned for some files - David merged in a few
patches to be sure that we don't lose CoordinatesOutput when multiple
slots are available [9]
* An old request to make Special:Search suggestions show labels and
descriptions instead of just the item ID was worked on by Stas, and
the fix was enabled [10]
* A PHP notice in MW-Vagrant when searching with CirrusSearch was
happening in the action API and web search interface. Thanks for
fixing it, Ottomata!
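To illustrate the diacritic stripping discussed in the first item above, here is a minimal Python sketch. It is an illustration only and not the analysis chain that CirrusSearch or Elasticsearch uses: the function name fold_diacritics is hypothetical, and it simply decomposes each character and drops the combining marks.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks (hypothetical helper, not CirrusSearch code)."""
    # NFD splits a letter such as "č" into "c" plus a combining caron.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into its canonical form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ["ľúbiť", "čaj", "mäso"]:
        print(word, "->", fold_diacritics(word))
```

A query typed without diacritics would then match the folded form of an indexed word. The stemming problems mentioned above are not modelled in this sketch.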
[0] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Folding_Diacritics_i…
[1] https://phabricator.wikimedia.org/T223787#5260102
[2] https://phabricator.wikimedia.org/T221832
[3] https://phabricator.wikimedia.org/T222268
[4] https://phabricator.wikimedia.org/T206256
[5] https://phabricator.wikimedia.org/T225184
[6] https://phabricator.wikimedia.org/T224611
[7] https://phabricator.wikimedia.org/T225183
[8] https://phabricator.wikimedia.org/T224879
[9] https://phabricator.wikimedia.org/T223767
[10] https://phabricator.wikimedia.org/T55652
----
Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or
"Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation
Hello all,
Two things:
1) On Thursday June 13th at 18:00 UTC (11am Pacific) there will be
open office hours for those of you who would like to share your thoughts
on the event: topics you'd like to see discussed there, decisions you'd
like made, etc.
It will occur using Google Meet, at this URL:
https://meet.google.com/exz-zxfy-nuj
If you can't make it to these office hours, don't fret! You can always
(continue to) share your thoughts on the Phabricator task:
https://phabricator.wikimedia.org/T220212
2) REMINDER: The deadline for participant/attendee nominations is Monday
June 17th, this coming Monday. Remember, you can nominate others or
yourself. And you can fill out the form as many times as you have
nominations.
Form: https://forms.gle/CLeGFSMiEasJgEU27
FAQ: https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/FAQ
This survey is conducted via a third-party service, which may make it
subject to additional terms. For more information on privacy and
data-handling, see this survey privacy statement:
https://foundation.wikimedia.org/wiki/Wikimedia_Technical_Conference_Survey…
On behalf of the Technical Conference Program Committee,
Greg
On Wed, May 29, 2019 at 04:39:37PM -0700, Greg Grossmeier wrote:
> Hello all,
>
> As you may have seen, the next Wikimedia Technical Conference[0] is
> coming up in November 2019.
>
> It will take place November 12-15th in Atlanta, GA (USA). As announced
> at the Hackathon and documented on-wiki[1] this year's event will
> focus on the topic of "Developer Productivity".
>
> Like last year, we are looking for diverse stakeholders, perspectives,
> and experiences that will help us to make informed decisions. We need
> people who can create and architect solutions, as well as those who
> will make funding and prioritization decisions for the projects.
>
> See the FAQ for (hopefully) any questions you have:
> <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/FAQ>
>
> Please fill out the survey using this link to nominate yourself or someone
> else to attend: <https://forms.gle/CLeGFSMiEasJgEU27>
>
> This survey is conducted via a third-party service, which may make it
> subject to additional terms. For more information on privacy and
> data-handling, see this survey privacy statement:
> <https://foundation.wikimedia.org/wiki/Wikimedia_Technical_Conference_Survey…>
>
> This nomination form will remain open between May 29 and June 17, 2019.
>
> If you have any questions, please post them on the event's talk page
> <https://www.mediawiki.org/wiki/Talk:Wikimedia_Technical_Conference/2019>.
>
> Thanks!
>
> Greg and the Technical Conference 2019 Program Committee
>
> [0] <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019>
> [1] <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019#Vision_S…>
>
> --
> Greg Grossmeier
> Release Team Manager
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| Release Team Manager A18D 1138 8E47 FAC8 1C7D |
Greetings,
This is the weekly update from the Search Platform team for the week
starting 2019-05-20.
As always, feedback and questions are welcome.
== Highlights ==
* Most of the team attended a three-day offsite in Prague last week,
and Deb, Erik, Stas, and Trey also attended the Wikimedia Hackathon.
[0]
== Discussions ==
=== Search ===
* At the Hackathon, we hosted a session on "Advanced search syntax for
newbies" [1], and we had a few in-depth discussions with volunteers
about search, our APIs, etc., and talked in more depth about Arabic
and Slovak.
** As a result of our discussion, Trey opened a ticket to investigate
the effects of searching without diacritics in Slovak. [2]
* Trey completed a change to the Arabic-language completion suggester
(upper left search box) to make Eastern Arabic Numerals and Western
Arabic Numerals equivalent. [3] It will still take a little while for
the change to be seen on-wiki.
* Stas made a set of preliminary patches to convert the CirrusSearch
extension to extension.json registration (merged); the final conversion
patch is still in review [4]
* David worked on several tasks to create a fallback method based on a
generic index [5]; making fallback methods configurable [6]; and
allowing each FallbackMethod to create its own SearchQuery [7]
* We noticed that multiple Elasticsearch nodes were getting overloaded
in eqiad in April - Erik patched it and found a few things that might
have caused the issues [8]
* When enabling cross cluster search to support multi-instance we had
to run custom scripts to update cluster settings -- and discovered
that the puppet repo was not aware of this; it's fixed now [9]
* Erik did a smorgasbord of fixes: he fixed "missing replica" error
messages in production logs by uniquely identifying connections in the
connection pool [10]; made changes to create archive indices, delete
archive docs from general indices, and ignore ancient logging rows with
log_page = null [11]; fixed a condition where we received a
cirrusSearchElasticaWrite job for an unwritable cluster, cloudelastic
[12]; and documented the CirrusSearch schema [13].
* During the Hackathon, Erik also exposed CloudElastic to the WMF Cloud [14]
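As a rough illustration of what treating Eastern Arabic and Western Arabic numerals as equivalent can mean, here is a minimal Python sketch. It is not the completion suggester change itself, which lives in the search analysis configuration. The helper name fold_arabic_digits is hypothetical, and the sketch only covers the ten digits at U+0660 to U+0669.

```python
# Eastern Arabic digits occupy U+0660..U+0669 in Unicode.
EASTERN_TO_WESTERN = {0x0660 + value: str(value) for value in range(10)}


def fold_arabic_digits(text: str) -> str:
    """Map Eastern Arabic digits to 0-9 (hypothetical helper)."""
    # str.translate accepts a dict keyed by code point.
    return text.translate(EASTERN_TO_WESTERN)


if __name__ == "__main__":
    # The escapes below are the Eastern Arabic digits one, two, three.
    print(fold_arabic_digits("\u0661\u0662\u0663"))
```

Applying the same folding to both the indexed titles and the typed query is one way to make the two numeral systems match each other.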
=== Wikidata Query Service ===
* At the Hackathon, with the help of Krinkle, the bug that made the URL
shortener widget hard to use was fixed [15]
* WDQS bug with label service clauses nested in subqueries being
processed incorrectly was fixed [16]
* Stas fixed breakage in LDF server JSON-LD format [17]
== Did you know? ==
'''Naming Things is Hard, Volume 187:''' The Phab ticket mentioned
above to equate different numeral systems for Arabic-language wikis
uses the names Eastern Arabic Numerals (١٢٣...) and Western Arabic
Numerals (123…). In English, the numerals we usually use (123...) are
often called “Arabic numerals” [18] because they came to Europe from
Arabic sources. In Arabic, the Eastern Arabic Numerals are called
“Indian numerals” [19] because they came from Indian sources. In
English, “Indian numerals” refer to the numerals used in India
(१२३...) but they are just called “Devanagari numerals” in Hindi, for
example. [20] Some have tried to make the subtle distinction in
English that “arabic numerals” are the numerals that came from Arabic
sources (123...), while “Arabic numerals” are the ones that are used
by Arabic speakers (١٢٣...).
It’s also interesting to look at a table of the various related
numeral systems [21] and see the similarities and “false friends”; note
that your fonts may vary: Devanagari 7 looks like a 6 (“७”), Arabic 6
looks like a 7 (“٦”), Gujarati 5 looks like a 4 (“૫”), Bengali 4 looks
like an 8 (“৪”), Gurmukhi 1 looks like a 9 (“੧”), etc. But any of
those systems are MMMDCCXXIV times better than Roman numerals! [22]
[0] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2019
[1] https://phabricator.wikimedia.org/T216740
[2] https://phabricator.wikimedia.org/T223787
[3] https://phabricator.wikimedia.org/T117217
[4] https://phabricator.wikimedia.org/T87892
[5] https://phabricator.wikimedia.org/T222652
[6] https://phabricator.wikimedia.org/T222152
[7] https://phabricator.wikimedia.org/T221621
[8] https://phabricator.wikimedia.org/T220901
[9] https://phabricator.wikimedia.org/T218932
[10] https://phabricator.wikimedia.org/T222819
[11] https://phabricator.wikimedia.org/T222641
[12] https://phabricator.wikimedia.org/T222307
[13] https://phabricator.wikimedia.org/T220547
[14] https://phabricator.wikimedia.org/T223519
[15] https://phabricator.wikimedia.org/T221127
[16] https://phabricator.wikimedia.org/T153353
[17] https://phabricator.wikimedia.org/T222471
[18] https://en.wikipedia.org/wiki/Arabic_numerals
[19] https://ar.wikipedia.org/wiki/أرقام_هندية
[20] https://hi.wikipedia.org/wiki/देवनागरी_अंक
[21] https://en.wikipedia.org/wiki/Hindu–Arabic_numeral_system#Glyph_comparison
[22] https://en.wikipedia.org/wiki/Roman_numerals
----
Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or
"Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation
I'm going to be doing a run-through of my Haystack talk
<https://haystackconf.com/2019/variance/> this Thursday at 9:30-10:30 PDT.
The calendar event is scheduled for an hour, but the talk needs to fit in 45
minutes and I'm expecting closer to 35 minutes + questions.
I've invited people currently or previously in discovery, but anyone
interested is free to attend and ask some questions. The URL is pretty
ugly, but this is what the gcal publish event gave me:
https://calendar.google.com/event?action=TEMPLATE&tmeid=M2U5b2llMm5sOWYyM2N…
Erik B.