On Feb 4, 2019, at 10:48 AM, Giovanni Tirloni <gtirloni@wikimedia.org> wrote:
> Re: relying on aggregating many alert emails to indicate something, it requires human judgment (look at the email folder, read the emails, count how many, decide whether that looks bad, etc.). I'd rather look at a Grafana dashboard (since Prometheus is still collecting iowait) _when_ there's a real problem; we just need to define what a real problem looks like.
Definitely! We used to identify real problems from load numbers on the NFS servers… you know how that ended up. :)
Overall, while I’ve fixed my email filter, I think I’m with you on this. Let’s nix those alerts.