Hello,

Here are this past week's updates from the Discovery department.

== Highlights ==
* Finalized the second BM25 testing analysis; the PDF is linked here. [0]

== Search ==
* Migrated Phan for CirrusSearch to Jenkins. (technical debt) [1] [2]
* Finished writing up, summarizing, and recommending extensive changes to TextCat for language identification. [3] The F0.5 score improved by a mean of just under 5% across the corpora from nine Wikipedias. The two worst-performing corpora, from enwiki and nlwiki, each went up by around 10%! All nine now have an F0.5 score above 90%. The next step is to deploy the recommended changes. [4]
* Completed (a round of) refactoring and cleanup of the Special:Search code. [5] [6]

[0] https://www.mediawiki.org/wiki/Discovery_Analysis#Past_analyses
[1] https://www.mediawiki.org/wiki/Continuous_integration/Phan
[2] https://phabricator.wikimedia.org/T153040
[3] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_Improvements#...
[4] https://en.wikipedia.org/wiki/F1_score
[5] https://phabricator.wikimedia.org/T150217
[6] https://phabricator.wikimedia.org/T150390
----
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" [1] or "Volunteer needed" [2] in Phabricator.

[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R

Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation
wikitech-l@lists.wikimedia.org