Hi all,
In recent discussions with colleagues while deploying the train, it came up that there used to be a recurring e-mail along these lines, and that it may be useful to resume it on a regular basis.
A high volume of error messages in production logs makes it hard to glance at error logs after a deployment and reason about the deployment's impact.[0] This can be especially problematic at times when our visibility into logs is impaired for unrelated reasons, as it has been in recent weeks.[1]
It's my good fortune to be resuming this tradition during a week in which the logspam situation seems to have improved. I'm aware of two currently tracked issues which are producing a noticeable amount of noise:
* T226751: PHP error "non well formed numeric value encountered" from FormatMetadata->formatCoords[2]
* T233349: A data update callback triggered a fail-safe exception in DataUpdateAdapter[3]
Help in eliminating these errors from production logs is greatly appreciated, and I want to extend my thanks to everyone who has spent time this week fixing similar issues. I'd also like to thank everyone who's been working on getting logstash back into usable shape.
[0] https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train#Logspam
[1] https://phabricator.wikimedia.org/T234283
[2] https://phabricator.wikimedia.org/T226751
[3] https://phabricator.wikimedia.org/T233349
wikitech-l@lists.wikimedia.org