Unexpected downtime for ORES and unscheduled deployment right now - AI

25 Sep 2016

Today ORES in production was sending out an unreasonable number of timeout
errors, causing Icinga to scream and a 14% average failure rate for ORES
review tool jobs. It turned out that the ORES workers were logging too much,
causing the nodes to run out of disk space. [1] I suspect we had a similar
issue on our Labs nodes.

I made changes for both prod and Labs and deployed them today. You can find
more details in the Phab card.

[1]: https://phabricator.wikimedia.org/T146581

Cheers
Best