Hi Rogol,
I'm not sure what you're referring to when you're talking about a "recovery plan". These are all pretty standard day-to-day operations for us.
This was a small outage, affecting only a small portion of European traffic (and a tiny portion of our global traffic), and not what we'd call a "widespread" one -- very far from it. It wasn't noticeable in traffic graphs at all. It was the result of a networking issue at an intermediate provider between us and the affected ISPs, so nothing that was under our direct control.
This could have been remediated by any of: a) the intermediate provider fixing or working around the issue; b) us temporarily disabling the use of that provider; or c) the affected ISPs (like Fastweb) temporarily disabling the use of that provider. In this case, we did not wait for (a) or (c) and worked around the issue ourselves, faster than Fastweb etc. did (and less than an hour after the initial report). I think the response from our side and our engineers (and especially Arzhel) was pretty stellar.
Faidon
On Mon, May 15, 2017 at 09:00:06PM +0100, Rogol Domedonfors wrote:
Faidon
Do you believe that your recovery plan was adequate or will you be reviewing it in the light of this widespread outage?
"Rogol"
On Mon, May 15, 2017 at 12:02 PM, Faidon Liambotis faidon@wikimedia.org wrote:
Hi Cristian,
We've had some network connectivity issues with one of our ISPs in our European datacenter that were probably the source of your problems (FastWeb in Italy were among the affected ISPs). This was a localized issue -- only in the European datacenter and one out of six major network carriers and dozens of network connectivity partners.
The issue has been worked around as of 09:26 UTC and the situation is being monitored.
This issue is being tracked at: https://phabricator.wikimedia.org/T165288
Do let us know if this is not the case for you or if you're experiencing any trouble, here, off-list or directly on that Phabricator task.
Thanks,
Faidon
--
Faidon Liambotis
Principal Operations Engineer
Wikimedia Foundation
On Mon, May 15, 2017 at 11:02:39AM +0200, Cristian Consonni wrote:
Hi,
We at Wikimedia Italia have been contacted in the last few minutes because Wikipedia seems unreachable and very slow from Italy.
The status pages for servers and services signal several performance issues:
Does somebody know what's going on?
Thank you.
Cristian
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe