On Feb 4, 2019, at 10:48 AM, Giovanni Tirloni <gtirloni@wikimedia.org> wrote:

Re: relying on aggregating many alert emails to indicate something, it requires human judgment (look at the email folder, read the emails, count how many there are, judge whether that looks bad, etc.). I'd rather look at a Grafana dashboard (since Prometheus is still collecting iowait) _when_ there's a real problem; we just need to define what a real problem looks like.

Definitely!  We used to determine real problems via load numbers on the NFS servers…you know how that ended up. :)

Overall, while I’ve fixed my email filter, I think I’m with you on this.  Let’s nix those alerts.