The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, Oct 2nd, 2019
Time: 15:00-16:00 GMT / 08:00-09:00 PDT / 11:00-12:00 EDT / 17:00-18:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Hope to talk to you in a week!
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
UTC-4 / EDT
Hello,
This is the weekly update from the Search Platform team for the week
starting 2019-08-26 until 2019-09-30.
This is the final weekly update from the team. Since starting in March
of 2016 we have published over 135 issues of this newsletter. Thank
you for reading.
Work continues, however, and interested community members can follow
the progress of the Search Platform team through the Scrum of Scrums
weekly notes. [0]
== Discussions ==
=== Search ===
* An older bug caused unpredictable behavior depending on the order of
Special:Search parameters. We had worked on it previously, but David
added a new patch that adds morelikethis, a non-greedy version of the
morelike keyword, and deployed it this week on the train [1]
* David and Tgr fixed the vagrant wikibase cirrus role, which was not
working, and updated Cirrus to index P1 and P2 as statements [2]
* Cloudelastic JVMs were suffering from odd GC behavior, causing
slowdowns of the whole cluster and therefore slowing consumption of
production MW JobQueues; Mathew and Gehel added the needed alerts
in [3]
* David discovered that create_timestamp was not present on production
index mappings for some wikis and fixed it [4]
* Several folks worked on an issue where the elasticsearch systemd
unit sets PrivateTmp=true, which prevents jstack / jmap / etc. from
connecting to the JVM [5]
* A review of the logs uncovered Elasticsearch OOM errors in MW
vagrant; these were fixed by increasing Xmx to 512m [6]
* Tgr found a bug where CirrusSearch on Vagrant throws
"mapper_parsing_exception: analyzer [aa_plain] not found for field
[plain]" on provision and David fixed it by adding a patch to always
enable WBCS [7]
* We needed to normalize deepcat inputs, as it was found that deepcat
was case sensitive on the first letter of the category name [8]
* Icinga reported read timeout errors for some checks on the
cloudelastic cluster, so after some team discussion we added the
option separator for elastic shard size alerts [9]
* David found an issue where EventBusMonologHandler was sending
malformed UTF-8 characters, possibly because they were incorrectly
encoded, which resulted in aborted sends (now fixed by normalizing the
request param name) [10]
* The team wrote several patches to adjust the mjolnir bulk_daemon to
import glent swift uploads as desired [11]
* We found many correctable memory errors (EDAC) on elastic1029 that
needed reviewing. The original issue seems to have gone away, but we
will need more help / work from SRE to get the server working properly
(a new ticket will be created) [12]
* Stas and Igor worked on an error where a
ConcurrentModificationException is thrown on a non-grouping query with
aggregates in SELECT [
* There was a request to update Blazegraph because a normalized
exception was occurring with a particular query; Stas and Igor
collaborated on it, adding support for uncertainVars in ServiceNode
and fixing the NME on a variable bound both by LabelService and some
other clause [13] [14] [15]
* There was also a query showing that HAVING in a named subquery
results in a “non-aggregate variable in select expression” error; Igor
and Stas collaborated further to fix it [16]
* More Blazegraph fixes: SELECT * on query with no variables and
property path results in NotMaterializedException [17] and
UnsupportedOperationException on property path in EXISTS [18]
* A bug was discovered on the search results page where Commons
images weren't showing up anymore (on all wikis other than enwiki);
David found the issue and fixed it [19] [20]
* The Discernatron tool for labeling Wikipedia search results for
relevance testing had started returning a '502' error; Erik restarted
the container and it's working again [21]
* David worked on extracting interfaces and base classes from
SearchResultSet and SearchResult, so that search engines can control
them [22]
* As part of our support for the Structured Data on Commons work:
hascaption (including hascaption:*) currently returns all files that
ever had a caption, even if that caption has been removed via
reversion or edit. This needs to be changed so that when indexing
occurs (and data is removed), hascaption/inlabel/incaption reflect
those changes [23]
* David worked on adding a debugging API to dump the explanation of
the completion suggester scores [24]
* David also added support for OR in the hastemplate keyword using | (pipe) [25]
* The team worked on (and finished) migrating WDQS to new logging pipeline [26]
* A bug was filed where subpageof will sometimes display results that
are not subpages of the page the search was limited to (it should
indicate that it matched against a redirect) [27]
[0] https://www.mediawiki.org/wiki/Scrum_of_scrums
[1] https://phabricator.wikimedia.org/T159321
[2] https://phabricator.wikimedia.org/T228503
[3] https://phabricator.wikimedia.org/T231516
[4] https://phabricator.wikimedia.org/T230990
[5] https://phabricator.wikimedia.org/T230774
[6] https://phabricator.wikimedia.org/T211362
[7] https://phabricator.wikimedia.org/T230018
[8] https://phabricator.wikimedia.org/T228633
[9] https://phabricator.wikimedia.org/T230366
[10] https://phabricator.wikimedia.org/T228496
[11] https://phabricator.wikimedia.org/T227364
[12] https://phabricator.wikimedia.org/T214283
[13] https://phabricator.wikimedia.org/T159723
[14] https://phabricator.wikimedia.org/T170704
[15] https://phabricator.wikimedia.org/T168876
[16] https://phabricator.wikimedia.org/T165559
[17] https://phabricator.wikimedia.org/T168741
[18] https://phabricator.wikimedia.org/T173243
[19] https://phabricator.wikimedia.org/T232032
[20] https://www.mediawiki.org/wiki/Topic:V6dtxvwtk9nchcbx
[21] https://phabricator.wikimedia.org/T231980
[22] https://phabricator.wikimedia.org/T228626
[23] https://phabricator.wikimedia.org/T231038
[24] https://phabricator.wikimedia.org/T230919
[25] https://phabricator.wikimedia.org/T232078
[26] https://phabricator.wikimedia.org/T232184
[27] https://phabricator.wikimedia.org/T187548
----
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or
"Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, Sep 4th, 2019
Time: 15:00-16:00 GMT / 08:00-09:00 PDT / 11:00-12:00 EDT / 17:00-18:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Hope to talk to you in a week!
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
UTC-4 / EDT