Hi all,
In recent discussion with colleagues while deploying the train, it's
come up that there used to be a recurring e-mail along these lines, and
it seems like it may be useful to resume it on a regular basis.
A high volume of error messages in production logs makes it hard to
glance at error logs after a deployment and reason about the
deployment's impact.[0] This can be especially problematic at times
when our visibility into logs is impaired for unrelated reasons, as it
has been in recent weeks.[1]
It's my good fortune to be resuming this tradition during a week in
which the logspam situation seems to have improved. I'm aware of two
currently tracked issues which are producing a noticeable amount of noise:
* T226751: PHP error "non well formed numeric value encountered" from
FormatMetadata->formatCoords[2]
* T233349: A data update callback triggered a fail-safe exception in
DataUpdateAdapter[3]
Help in eliminating these errors from production logs is greatly
appreciated, and I want to extend my thanks to everyone who has spent
time this week fixing similar issues. I'd also like to thank everyone
who's been working on getting logstash back into usable shape.
[0] https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train#Logspam
[1] https://phabricator.wikimedia.org/T234283
[2] https://phabricator.wikimedia.org/T226751
[3] https://phabricator.wikimedia.org/T233349
--
Brennen Bearnes (he/him)
Release Engineering
Wikimedia Foundation