[Labs-announce] Gridengine master outage of 2015-06-02

Marc-André Pelletier mpelletier at wikimedia.org
Thu Jun 4 23:11:34 UTC 2015


Hello Labs,

It has been pointed out to me that I never sent an email linking to
the incident report for the partial Tool Labs outage mentioned in the subject line:

https://wikitech.wikimedia.org/wiki/Incident_documentation/20150602-gridengine-dns-failure

tl;dr: Two distinct name resolution problems, both side effects of the
recent DNS changes in Labs, intermittently affected the gridengine
master and disrupted the scheduling of new jobs.  Both problems have
been tracked down and fixed.

-- Marc