TL;DR:
* We messed up when replacing the mail server in Toolforge
* We didn't notice that we had messed up for nearly 3 weeks
* Toolforge servers should be able to send outbound email again now
We have been working to replace some of the Cloud VPS instances in the Toolforge project with new instances running Debian Buster (https://phabricator.wikimedia.org/T275864). One step in this process was to replace the mail server instance that handles all outbound mail.
We set up a new mail server on 2021-03-31, but missed an important configuration step: telling the rest of the instances in the Toolforge project to use the new server when sending outgoing mail. A Toolforge user reported on IRC at 2021-04-20T21:11Z that they had not received expected emails from their tool recently. Investigation revealed the broken configuration, and work started to correct the problem. Around 2021-04-20T21:52Z we deployed the correct mail relay host configuration. Over the next 30 minutes or so this configuration update rolled out across the Toolforge instances, re-enabling outbound mail sending. Around 2021-04-20T22:20Z we ran commands to instruct all Toolforge instances to "unfreeze" emails which were queued for sending but marked as "frozen" due to the prior invalid configuration.
Emails are now being sent out as expected. We apologize for the interruption in service. We will also be looking into active monitoring of outbound email delivery so that we catch similar problems more quickly in the future.
Bryan, on behalf of the Toolforge admin team