On Feb 4, 2019, at 10:48 AM, Giovanni Tirloni
<gtirloni(a)wikimedia.org> wrote:
> Re: relying on aggregating many alert emails to indicate something, it requires a human
> judgment (look at email folder, read emails, count how many, make a judgment if that looks
> bad, etc). I'd rather look at a Grafana dashboard (since Prometheus still is
> collecting iowait) _when_ there's a real problem, we just need to define what a real
> problem looks like.

Definitely! We used to determine real problems via load numbers on the NFS servers…you
know how that ended up. :)
Overall, while I’ve fixed my email filter, I think I’m with you on this. Let’s nix those
alerts.