[Labs-l] Tool Labs SGE outage

Yuvi Panda yuvipanda at gmail.com
Wed May 27 21:06:40 UTC 2015


It's been back and working mostly well for a while now. According to
alerts, the partial outage lasted from 18:33 UTC to 20:17 UTC. More
details to follow later, here and at
https://phabricator.wikimedia.org/T100554

On Wed, May 27, 2015 at 10:32 PM, Yuvi Panda <yuvipanda at gmail.com> wrote:
> Note that this does not actually affect anything currently running -
> only new job submission and any other direct interaction with
> gridengine (jsub, qstat, etc.).
>
> On Wed, May 27, 2015 at 10:21 PM, Yuvi Panda <yuvipanda at gmail.com> wrote:
>> It's intermittently working now; we're still working on it.
>>
>> On Wed, May 27, 2015 at 8:54 PM, Merlijn van Deen <valhallasw at arctus.nl> wrote:
>>> Hello to all Tool Labs users,
>>>
>>> Because of a security issue, the SGE master (i.e., the service that sends
>>> jobs to different nodes) had to be restarted, and it is currently not
>>> starting back up. We are working on the issue, and hope to have
>>> the backup master taking over service shortly.
>>>
>>> Our apologies for the service disruption; we had tested this change and
>>> expected a clean rollout.
>>>
>>> Merlijn



-- 
Yuvi Panda T
http://yuvi.in/blog