Heya,
Another week, another update from the Search Platform team. This one is for the week starting 2018-04-09.
As always, feedback and questions welcome.
== Discussions ==
=== Search ===
* New search code fully deployed & enabled on Wikidata. [0]
== Events and News ==
* Erik and Trey went to the "OpenSource Connections Haystack Search Relevance Conference" and "Tom Tom Founders Festival Machine Learning Conference", which were back-to-back in Charlottesville, VA. At Haystack, Erik presented on how we use clickstream information to create training data for our learning-to-rank models. [1] Trey wrote up trip notes, with lots of links, on MediaWiki. [2]
== Other Noteworthy Stuff ==
* Fix for CirrusSearchCheckerJob errors rolled out. [3]
* Stas implemented indexing of Lexemes & Forms for the WikibaseLexeme extension. [4]
== Did you know? ==
* The English verb "to be" is kind of weird: the infinitive "be" and participles "being, been" start with "b-", while the preterite forms "was, were" start with "w-", and the present forms "am, is, are" start with vowels. The conjugations originally come from three or four different verbs! Why "three or four"? Wiktionary disagrees with itself a bit, listing four on the etymology of "is" [5] and three on the etymology of "be". [6] The conflation goes back at least to Proto-Germanic, [7] so German is similarly weird. [8] Dutch has a greatly simplified paradigm, but still shows some trace of the multiple sources. [9] Other languages, including ASL, Arabic, Bengali, Hawaiian, Hebrew, Indonesian, Japanese, Russian, Turkish, and Ukrainian, at least partly avoid this mess by having a zero copula. [10] For search on-wiki, we deal with this problem in part with stemming [11] and stop words. [12]
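To make the stop-word half of that last point concrete, here is a minimal sketch in Python. It is only an illustration: the word list and the `drop_stop_words` helper are hypothetical and are not the actual on-wiki search configuration. It assumes a toy stop-word list containing just the forms of "to be" mentioned above.

```python
# Illustrative stop-word list: forms of "to be" only.
# This is an assumption for the example, not the real on-wiki list.
TO_BE_FORMS = frozenset(
    {"be", "being", "been", "am", "is", "are", "was", "were"}
)


def drop_stop_words(query, stop_words=TO_BE_FORMS):
    """Lowercase the query, split on whitespace, and drop stop words."""
    return [token for token in query.lower().split() if token not in stop_words]


# Queries that differ only in the form of "to be" reduce to the same tokens.
print(drop_stop_words("Where is the library"))
print(drop_stop_words("Where was the library"))
```

Because the irregular forms are dropped before matching, the search does not need to know that "is" and "was" belong to the same verb.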
[0] https://www.wikidata.org/wiki/Wikidata:Project_chat#Improvements_on_the_sear...
[1] https://commons.wikimedia.org/wiki/File:From_Clicks_to_Models_The_Wikimedia_...
[2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/April_2018_Conference...
[3] https://phabricator.wikimedia.org/T190958
[4] https://phabricator.wikimedia.org/T189745
[5] https://en.wiktionary.org/wiki/is#Etymology_1
[6] https://en.wiktionary.org/wiki/be#Etymology
[7] https://en.wikipedia.org/wiki/Proto-Germanic_language
[8] https://en.wiktionary.org/wiki/sein#Conjugation
[9] https://en.wiktionary.org/wiki/zijn#Inflection
[10] https://en.wikipedia.org/wiki/Zero_copula
[11] https://en.wikipedia.org/wiki/Stemming
[12] https://en.wikipedia.org/wiki/Stop_words
---
Subscribe to receive on-wiki (or opt-in email) notifications of the Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or "Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner
Community Liaison
Wikimedia Foundation
wikitech-l@lists.wikimedia.org