Today ORES in production was returning an unreasonable number of timeout errors, causing Icinga to scream and a 14% average failure rate for ORES review tool jobs. It turned out that the ORES workers were logging too much, causing the nodes to run out of disk space. [1] I suspect we had a similar issue on our labs nodes.
I made changes for both prod and labs and deployed them today. You can find more details in the Phab card.
[1]: https://phabricator.wikimedia.org/T146581
Best