Hi Ambassadors!
I'm writing to send another CirrusSearch update. This one isn't good
news. We got a bit overaggressive about pushing Cirrus as the primary
search backend for bigger wikis and pushed ourselves over the edge, albeit in
slow motion. Things started breaking down during Europe's peak time on
Tuesday. I wrestled with the production system all day, trying to get an
accurate fix on exactly how we were failing and to stem the tide. I
thought I had it by the end of my day on Tuesday. On my Wednesday morning
(Europe's afternoon) I woke to see us slipping again, so I rolled back
the "set Cirrus as primary" deploys. See here for wikis that don't have Cirrus
as primary:
https://www.mediawiki.org/wiki/Search#Wikis
Now that we're stable again, I've started working on the root of the problem:
1. We're getting more servers. We're about doubling the cluster size.
2. I'm putting together more optimizations to the portion of Cirrus that
fell over (its working set <https://en.wikipedia.org/wiki/Working_set>). If
everything goes as planned, they'll reduce it by about 80%. The trade-off is
that they give up some indexing performance in exchange for search performance.
I'm on vacation next week, so I'm going to start rolling those optimizations
out on Monday, July 28th. They won't go everywhere immediately; I'll roll them
out gradually and see how the index-time performance hit affects us.
Also: these changes will shift result relevance somewhat. In my local testing
it looked like everything still worked: title still beats redirect still
beats category still beats heading still beats article lead-in still beats
text still beats image captions and file contents. BUT I still expect
things to shift around a bit. Please let me know if you see anything fishy.
Nik
PS. Sorry for the enwiki link.