[Labs-announce] Gridengine master outage of 2015-06-02
Marc-André Pelletier
mpelletier at wikimedia.org
Thu Jun 4 23:11:34 UTC 2015
Hello Labs,
It has been pointed out to me that I never sent an email linking to
the incident report for the partial Tool Labs outage mentioned in the Subject:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150602-gridengine-dns-failure
tl;dr: Two distinct name resolution problems, both side effects of the
DNS changes in labs, caused intermittent trouble for the gridengine
master, which in turn disrupted scheduling of new jobs. Both problems
have been tracked down and fixed.
-- Marc