[Labs-l] [Labs-announce] Possible reboots and/or outages -- please read
Andrew Bogott
abogott at wikimedia.org
Fri May 20 16:03:37 UTC 2016
On 5/20/16 10:45 AM, Maximilian Doerr wrote:
> Does this unusual behavior also cause erroneous fingerprints to be given during authentication? Yesterday, when I was SSHing into Cyberbot-exec-01 via the bastion, I got a fingerprint mismatch with the exec node. This was using WinSCP.
That's almost certainly not related. I'm not sure why that would
happen, although there are a few possible failure cases in our network
setup where traffic gets routed not to your VM but directly to the
nova-network host. In that case you'd be seeing the wrong key because
it's the key for labnet1002 instead of for your VM. The timing doesn't
quite line up, but there was a brief network outage a couple of days ago
that could have caused that.
Do let me know if you see this issue repeatedly -- in particular, if you
can open a Phabricator ticket that includes both host keys (the
erroneous one and the correct one), that will help us diagnose things.
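
As a rough illustration of comparing two host keys before filing such a ticket, here is a minimal sketch. It assumes each key is available as a one-line OpenSSH public key ("<type> <base64-key> [comment]", the format of a host's ssh_host_*_key.pub files). The function names key_blob, fingerprints and same_key are hypothetical helpers made up for this example, not part of any Labs tooling.

```python
import base64
import hashlib


def key_blob(pubkey_line):
    """Decode the base64 key field of a '<type> <base64-key> [comment]' line."""
    fields = pubkey_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-key> [comment]'")
    return base64.b64decode(fields[1])


def fingerprints(pubkey_line):
    """Return (md5, sha256) fingerprints of the decoded key blob.

    md5 is colon-separated hex; sha256 is unpadded base64 with a
    'SHA256:' prefix. These are the two forms SSH clients commonly show.
    """
    blob = key_blob(pubkey_line)
    md5_hex = hashlib.md5(blob).hexdigest()
    md5 = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))
    digest = hashlib.sha256(blob).digest()
    sha256 = "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
    return md5, sha256


def same_key(line_a, line_b):
    """True if both lines carry the same key, ignoring type label and comment."""
    return key_blob(line_a) == key_blob(line_b)
```

Running fingerprints() on the key the client rejected and on the key from the VM itself gives two values that can be pasted into the ticket. Lines taken from a known_hosts file carry a leading host field, which would need to be dropped first.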
-Andrew
>
> To clarify, it got into the bastion just fine but not into my node, because I aborted the authentication process. I waited 5 minutes and tried again, and it worked just fine.
>
> Cyberpower678
> English Wikipedia Account Creation Team
> ACC Mailing List Moderator
> Global User Renamer
>
>> On May 20, 2016, at 11:10, Andrew Bogott <abogott at wikimedia.org> wrote:
>>
>> Note: Tools users can ignore this message
>>
>> We are seeing some unusual behavior on labvirt1003, which hosts a large number of labs instances. The problem is not yet diagnosed, but it is likely a hardware problem that will require reboots or downtime. Here is a complete list of labs instances currently living on labvirt1003:
>>
>> https://phabricator.wikimedia.org/P3159
>>
>> If you have any hosts on that box that cannot survive a reboot, please either let me know, or take steps to minimize the damage. I've removed labvirt1003 from the scheduler, so if you want to build a new instance and migrate services to it you can be assured that the new instance will be isolated from the coming chaos.
>>
>> A simple reboot shouldn't produce more than 5-10 minutes of downtime. If a major outage seems likely, I'll follow up with additional warning.
>>
>> -Andrew
>>
>>
>> _______________________________________________
>> Labs-announce mailing list
>> Labs-announce at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-announce
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l