Hi,
Here's the weekly update from the Search Platform team.
As always, feedback and questions welcome.
== Discussions ==
=== Search ===
* Trey completed a technical review of the available Estonian
morphological library with help from Guillaume and David, and
unfortunately it's not usable, and the stemming algorithm is not
easily ported. See T178928. [0]
* Trey did an analysis [1] of the effect of using the Elasticsearch
Indonesian analysis chain on Malay-language data. (See Wikipedia [2]
for details on Malay and Indonesian.) Next step is getting speaker
review of the stemming quality, then hopefully on to reindexing wikis
in both Malay and Indonesian.
* Trey did a write-up about the weirdness that comes from searching
for single punctuation characters without good redirect support [3] to
explain why searching for a hyphen on Farsi Wikipedia redirects you to
the article on the apostrophe. See also T196826. [4]
* Erik and David looked at adding a 'type' field to the metastore to
store the same information that was in es5 types [5]
* David did work on investigating (and implementing) how the prefix
keyword should augment and not override the list of requested
namespaces [6]
* Trey got the feedback he needed to go ahead and create and merge
Croatian, Serbo-Croatian, and Bosnian Analysis Chains Using Serbian
Morphological Libraries [7]
* Gehel found that when we freeze writes to Elasticsearch, jobs pile
up in the job queue, and that we need an alert to tell us when the
writes aren't getting thawed in a timely manner [8]
* Trey worked on moving Serbian language wikis from extra-analysis to
extra-analysis-serbian plugin (it went into production a week ago with
the re-indexing) [9]
* Erik and Gehel resolved current deprecation warnings in Elasticsearch 5 [10]
* David worked on adding support for boosting keywords [11] and adding
support for Filtering keyword (FilterQueryFeature) [12]
* Erik did quite a bit of research on how to keep regex highlighting
from timing out, because @ apparently matches "any string" in the
Lucene regex syntax; Trey helped with the analysis, and the fix got
pushed into production in early June [13]
* Stas added lemma and form representation texts to the fulltext
search index, which allows (very primitive) fulltext search for
Lexemes [14]. Better search coming soon!
== Other Noteworthy Stuff ==
* Wikidata Quality Constraints violations can now be exported into
RDF. [15] Loading to Wikidata Query Service coming soon. [16]
[0]
https://phabricator.wikimedia.org/T178928#4267448
[1]
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Analysis_of_Applying…
[2]
https://en.wikipedia.org/wiki/Comparison_of_Standard_Malay_and_Indonesian
[3]
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Searching_for_Punctu…
[4]
https://phabricator.wikimedia.org/T196826
[5]
https://phabricator.wikimedia.org/T192615
[6]
https://phabricator.wikimedia.org/T195815
[7]
https://phabricator.wikimedia.org/T192395
[8]
https://phabricator.wikimedia.org/T193605
[9]
https://phabricator.wikimedia.org/T193734
[10]
https://phabricator.wikimedia.org/T192614
[11]
https://phabricator.wikimedia.org/T195305
[12]
https://phabricator.wikimedia.org/T195788
[13]
https://phabricator.wikimedia.org/T195491
[14]
https://phabricator.wikimedia.org/T195912
[15]
https://www.wikidata.org/wiki/Q42?action=constraintsrdf
[16]
https://phabricator.wikimedia.org/T172380
---
Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on
MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or
"Volunteer needed" in Phabricator.
[1]
https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2]
https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner
Community Liaison
Wikimedia Foundation