Hello,

This is just a quick message to let you know that we made some changes today to the monitoring configuration of many of the Data Platform Engineering servers. This may affect you if you participate in Ops Week for Data Engineering and friends.

By default, all notification alerts from Icinga and Prometheus will now go to data-platform-alerts@wikimedia.org instead of data-engineering-alerts@lists.wikimedia.org.

We are working to route alert emails (and IRC pings) to the most appropriate team, principally so that we don't overload the person who is on Ops Week with messages that would be more appropriately handled by Data Platform SREs.

Alerts for any scheduled tasks related to data pipelines and services critical for data processing will still be sent to the data-engineering-alerts@lists.wikimedia.org list. That covers Airflow jobs, Refine tasks, Gobblin, Sqoop, Varnishkafka, Eventlogging, etc.

We haven't made any changes to the monitoring/notification settings of the Search and Query Services servers (Elasticsearch/WDQS/WCQS etc.), nor have we made any changes to the Dumps servers. This change mainly affects the analytics systems and the rest of the Data Engineering team's infrastructure.

Please do let us know if you have any queries or concerns about this change, or if anything doesn't look right to you.

You can reach us on Slack at #data-engineering-collab or #data-platform-sre, on IRC at #wikimedia-analytics or #wikimedia-data-platform, or by email at data-platform-engineering@wikimedia.org.

Kind regards,
Ben

--
Ben Tullis (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation