Hi Ambassadors!
I'm writing to send another CirrusSearch update. This one isn't good news. We got a bit overaggressive about pushing Cirrus as the primary search backend for bigger wikis and pushed ourselves over the edge, but in slow motion. Things started breaking down during Europe's peak time on Tuesday. I wrestled with the production system all day trying to get an accurate fix on exactly how we were failing and to stem the tide. I thought I had it by the end of my day on Tuesday. On my Wednesday morning (Europe's afternoon) I woke to see us slipping again. So I rolled back the "set cirrus as primary" deploys. See here for wikis that don't have Cirrus as primary: https://www.mediawiki.org/wiki/Search#Wikis
Now that we're stable again I've started working on the problem at the root:
1. We're getting more servers. We're roughly doubling the cluster size.
2. I'm putting together more optimizations to the portion of Cirrus that fell over (the working set: https://en.wikipedia.org/wiki/Working_set). If everything goes as planned we'll reduce it by about 80%. These optimizations trade indexing performance for search performance.
I'm on vacation next week so I'm going to start rolling those optimizations out on Monday, July 28th. They won't go everywhere immediately, but I'll roll them in as I see how the index-time performance hit affects us.
Also: these changes will shift result relevance somewhat. In my local testing it looked like everything still worked: title still beats redirect still beats category still beats heading still beats article lead-in still beats text still beats image captions and file contents. BUT I still expect things to shift around a bit. Please let me know if you see anything fishy.
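If it helps to picture that ordering, here is a rough sketch in Python. It is only an illustration of "stronger field beats weaker field", not how Cirrus actually scores results; the field labels and the helper names are made up for the example.

```python
# Field precedence from the message, strongest first. The labels are
# hypothetical, not real CirrusSearch field names.
FIELD_PRECEDENCE = [
    "title", "redirect", "category", "heading",
    "lead_in", "text", "caption_or_file_content",
]

def best_match_rank(matched_fields):
    """Return the index of the strongest field a page matched in."""
    ranks = [FIELD_PRECEDENCE.index(f) for f in matched_fields
             if f in FIELD_PRECEDENCE]
    # Pages that matched no known field sort after everything else.
    return min(ranks) if ranks else len(FIELD_PRECEDENCE)

def order_results(results):
    """Sort (page, matched_fields) pairs so stronger field matches come first."""
    return sorted(results, key=lambda r: best_match_rank(r[1]))

results = [
    ("Page A", ["text"]),
    ("Page B", ["heading", "text"]),
    ("Page C", ["title"]),
    ("Page D", ["redirect"]),
]
ordered = [page for page, _ in order_results(results)]
print(ordered)
```

A result that looks fishy would be one where, say, a text-only match consistently lands above a title match for the same query.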
Nik
PS. Sorry for the enwiki link.
wikitech-ambassadors@lists.wikimedia.org