The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come ask us anything about
Wikimedia search!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting (note that we're on daylight saving time, so
the time has shifted relative to GMT):
Date: Wednesday, April 3rd, 2019
Time: 15:00-16:00 GMT / 08:00-09:00 PDT / 11:00-12:00 EDT / 17:00-18:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
*N.B.:* Google Meet System Requirements
<https://support.google.com/meet/answer/7317473>
Hope to talk to you in a week!
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
Hi,
I've put a couple of pages on mw.org to start discussing the problem we
are currently facing: how extensions can interact with CirrusSearch
without stepping on each other's toes.
To describe the problem briefly:
Some extensions provide additional data to the wiki structure (generally
through custom Content Handlers), and CirrusSearch itself is not aware of
the best ways to take advantage of this additional data to improve
ranking or results display.
Cirrus has provided (organically) various hooks to let us build what we
have today:
- Wikidata search integration for Entities, Properties & Lexemes
- search keywords
But as more and more integrations have to be done (SDoC) we need to step
back and decide on better ways to let extensions augment the search
experience.
Beware that the discussion at this stage may only be relevant to developers
who worked on these extensions. One page describes the current "query
construction" mechanism[1] (with an emphasis on the parts that pose
problems at the moment), and another gives a first list of use cases and a
first set of solutions[2].
This is just a starting point for the discussion.
Thanks for your input.
[1] https://www.mediawiki.org/wiki/Extension:CirrusSearch/Query_Construction
[2]
https://www.mediawiki.org/wiki/Extension:CirrusSearch/Query_Construction/Us…
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come ask us anything about
Wikimedia search!
We’re particularly interested in:
* Opportunities for collaboration—internally or externally to the Wikimedia
Foundation
* Challenges you have with on-wiki search, in any of the languages we
support
But we're happy to talk about anything search-related. Feel free to add
your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, March 6th, 2019
Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
*N.B.:* Google Meet System Requirements
<https://support.google.com/meet/answer/7317473>
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
Hello,
This is the weekly update from the Search Platform team for the week
starting 2019-02-18.
As always, feedback and questions are welcome.
== Discussions ==
=== Search ===
* A new Korean language analyzer has been configured for
Korean-language wikis;[0] however, it won't be activated until after we
finish the upgrade to Elasticsearch 6, which is ongoing.
* SDC [Structured Data on Commons] wanted to know if we could add an
'inlabel' search keyword; after lots of discussion, it was merged
into the new WikibaseCirrusSearch extension, which has yet to be merged
into the beta cluster.[1]
* Erik and the team worked on how to measure mutation latency across
the newly split Elasticsearch clusters and decided that the default
timeout of 30 seconds was fine.[2]
* Mathew and Gehel worked on testing the spicerack elasticsearch
module with quite a few patches that are linked in the ticket [3]
* Gehel worked on setting up CI for search/glent (a Maven project)
with the same options that we use for search/extra.[4]
* A link-breaking typo was found in the automatic API documentation
for action=query&prop=cirrusbuilddoc, and Erik fixed it by correcting
the API docs for cirrusbuilddoc.[5]
* As we now use different APT components to differentiate the
Elasticsearch versions, we needed to create a new component for the
new version, and Gehel set it up.[6]
* David worked on preparing a Debian package with search plugins
compatible with Elasticsearch 5.6.14, which Gehel merged.[7]
* David also did quite a bit of work to fix and add integration tests
for several language analyzers.[8]
* Erik worked on updating the ttmserver for Elasticsearch 6 and
removed Elasticsearch 2.x compatibility.[9]
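
For readers who have not used the cirrusbuilddoc API mentioned above, here
is a minimal sketch of how a request URL for it could be assembled. The
action=query and prop=cirrusbuilddoc parameters come from the item above;
the endpoint (mediawiki.org's api.php), the example page title, and the
helper name cirrus_doc_url are assumptions made for illustration only. The
sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumption: any wiki running CirrusSearch exposes this under its api.php;
# mediawiki.org is used here purely as an example endpoint.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def cirrus_doc_url(title, endpoint=API_ENDPOINT):
    """Build (but do not fetch) a cirrusbuilddoc query URL for one page title.

    Hypothetical helper; the parameter values follow the
    action=query&prop=cirrusbuilddoc form quoted in the update.
    """
    params = {
        "action": "query",
        "prop": "cirrusbuilddoc",
        "titles": title,   # example title supplied by the caller
        "format": "json",  # standard MediaWiki API output format
    }
    # urlencode percent-encodes reserved characters such as ":" in the title.
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(cirrus_doc_url("Help:CirrusSearch"))
```

Opening the printed URL in a browser should be enough to inspect what the
API returns for that page.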
== Did you know? ==
Grammatical gender [10] often confuses speakers of English and other
languages without a similar system. “Why is a bridge feminine in
German (Brücke [11]) and masculine in Spanish and French (puente [12]
& pont [13])?” they ask—though usually without links to Wiktionary.
Grammatical gender is really just a system of noun classes [14] where
there are two or three classes, and most things classified as male or
female end up in different classes. Other languages have noun classes
based on whether or not the nouns are animate, whether they are human
or animal, by shape, and sometimes just arbitrary groupings;
languages can have nearly two dozen noun classes, like some of the
Niger–Congo languages![15]
Now hold on while we veer off on a brief tangent: diminutives are
words that convey a smaller, lesser, or more intimate sense of their
root form.[16] They are common in American nicknames, often showing up
as a -y or -ie ending (Billy vs. Bill, Peggy vs. Peg, Bobbie vs.
Roberta). Sometimes diminutives, especially when applied to small cute
things, can become the main or only form of a word. For example,
English baby [17] from babe, or kitty from kit.
Diminutives and grammatical gender collide in German Mädchen [18]
(“girl”) which is historically from Magd (cognate with English “maid”)
plus the diminutive suffix -chen; all diminutives formed with -chen
have neuter gender in German. Over time, Mädchen became the
predominant term for a girl, despite the fact that the word is
grammatically “neuter”.
[0] https://phabricator.wikimedia.org/T206874
[1] https://phabricator.wikimedia.org/T215967
[2] https://phabricator.wikimedia.org/T215969
[3] https://phabricator.wikimedia.org/T207920
[4] https://phabricator.wikimedia.org/T216599
[5] https://phabricator.wikimedia.org/T216256
[6] https://phabricator.wikimedia.org/T216047
[7] https://phabricator.wikimedia.org/T215932
[8] https://phabricator.wikimedia.org/T215594
[9] https://phabricator.wikimedia.org/T192680
[10] https://en.wikipedia.org/wiki/Grammatical_gender
[11] https://en.wiktionary.org/wiki/Br%C3%BCcke#German
[12] https://en.wiktionary.org/wiki/puente#Spanish
[13] https://en.wiktionary.org/wiki/pont#French
[14] https://en.wikipedia.org/wiki/Noun_class
[15] https://en.wikipedia.org/wiki/Noun_class#Niger%E2%80%93Congo_languages
[16] https://en.wikipedia.org/wiki/Diminutive
[17] https://en.wiktionary.org/wiki/baby#Etymology
[18] https://en.wiktionary.org/wiki/M%C3%A4dchen#Etymology
----
Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or
"Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation