---------- Forwarded message ----------
From: Mark Bergsma <mark(a)wikimedia.org>
Date: Thu, Apr 21, 2016 at 11:37 AM
Subject: Re: [Wikitech-l] Data center switch-over moving ahead next
week: please stay available :)
To: Operations Engineers <ops(a)lists.wikimedia.org>, Engineering list
<engineering(a)lists.wikimedia.org>, Wikimedia developers
<wikitech-l(a)lists.wikimedia.org>
Cc: Sherry Snyder <ssnyder(a)wikimedia.org>
We've just completed the switch back, and all services are running from our
main data center eqiad (Ashburn) again.
The process went very smoothly this time around. In the two days leading
up to this, we were able to either fix or work around the most important
issues we encountered on Tuesday. This meant that we had no real setbacks
or unanticipated delays today, and were therefore able to complete the most
time-critical and user-impacting part (during which MediaWiki is read-only)
in 20 minutes, down from ~45 minutes two days ago.
We'll be doing this again in the future, and until then we'll keep
improving and further automating the process, hopefully bringing its
impact and duration down considerably.
Please let us know if you see any issues which may be caused by the
switch-over(s).
Thanks much to everyone involved!
Mark
On Thu, Apr 21, 2016 at 3:53 PM, Mark Bergsma <mark(a)wikimedia.org> wrote:
> Hi everyone,
>
> After successfully serving our sites from our backup data center codfw
> (Dallas) for the past two days, we're now starting our switch back to
> eqiad (Ashburn) as planned[1].
>
> We've already moved cache traffic back to eqiad, and within the next few
> minutes we'll disable editing by going read-only for approximately 30
> minutes - hopefully a bit faster than 2 days ago.
>
> [1] http://blog.wikimedia.org/2016/04/11/wikimedia-failover-test/
>
> On Tue, Apr 19, 2016 at 6:00 PM, Mark Bergsma <mark(a)wikimedia.org> wrote:
>
>> Hi all,
>>
>> Today the data center switch-over commenced as planned, and has just
>> fully completed successfully. We are now serving our sites from codfw
>> (Dallas, Texas) for the next 2 days if all stays well.
>>
>> We switched the wikis to read-only (editing disabled) at 14:02 UTC, and
>> went back to read-write at 14:48 UTC - a little longer than planned. Edits
>> were possible from then on, but Special:RecentChanges (and related change
>> feeds) did not work until 15:10 UTC because of an unexpected configuration
>> problem with our Redis servers, which we then found and fixed. The site
>> stayed up and available for readers throughout the entire migration.
>>
>> Overall the procedure was a success with few problems along the way.
>> However, we've also carefully kept track of the issues and delays we
>> encountered, so that we can evaluate them to improve and speed up the
>> procedure and reduce the impact on our users - some of these improvements
>> will already be implemented for our switch back on Thursday.
>>
>> We're still expecting to find (possibly subtle) issues today, and would
>> like everyone who notices anything to use the following channels to report
>> them:
>>
>> 1. File a Phabricator issue with project #codfw-rollout
>> 2. Report issues on IRC: Freenode channel #wikimedia-tech (if urgent)
>> 3. Send an e-mail to the Operations list: ops(a)lists.wikimedia.org
>>
>> We're not done yet, but thanks to all who have helped so far. :-)
>>
>> Mark
>>
>
> --
> Mark Bergsma <mark(a)wikimedia.org>
> Lead Operations Architect
> Director of Technical Operations
> Wikimedia Foundation
>
--
Mark Bergsma <mark(a)wikimedia.org>
Lead Operations Architect
Director of Technical Operations
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
After today's (2016-04-19) datacenter switchover test [1], a
configuration issue resulted in edits not appearing on
Special:RecentChanges for about 20 minutes (14:48 - 15:10 UTC) [2].
[1] https://blog.wikimedia.org/2016/04/18/wikimedia-server-switch/
[2] https://phabricator.wikimedia.org/T133053
The recent changes entries have now been restored, but since they're
several hours old, they may no longer appear on the main
Special:RecentChanges – and since no one was able to see them earlier,
there are probably some edits that need reverting and articles that need
deleting hidden there.
It would be great if someone could review them. To do it, view a page
like this (replace "commons.wikimedia.org" with your wiki's domain):
https://commons.wikimedia.org/w/index.php?title=Special:RecentChanges&hidel…
…scroll to the bottom, and look at the changes in the affected period
(14:48 - 15:10 UTC).
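
For anyone who would rather pull the same window from a script, here is a
minimal sketch that builds a MediaWiki Action API query (list=recentchanges)
for the affected period. This is an alternative to the Special:RecentChanges
link above, not the method described in this mail. The helper name
recent_changes_api_url and the chosen rcprop fields are my own assumptions,
as is the /w/api.php path, which matches the /w/index.php path in the link
above but may differ on other wikis. It only builds the URL and does not
fetch anything.

```python
from urllib.parse import urlencode


def recent_changes_api_url(host, start, end, limit=500):
    """Build an Action API URL listing recent changes between two UTC timestamps.

    Hypothetical helper for illustration only.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": start,   # with rcdir=newer, rcstart is the older bound
        "rcend": end,       # and rcend is the newer bound
        "rcdir": "newer",   # list oldest changes first
        "rcprop": "title|ids|user|timestamp|comment",
        "rclimit": limit,   # a single request may return fewer; paging is not handled here
        "format": "json",
    }
    # Assumes the wiki serves its API at /w/api.php.
    return "https://" + host + "/w/api.php?" + urlencode(params)


if __name__ == "__main__":
    # Affected period from this mail: 2016-04-19, 14:48 - 15:10 UTC.
    print(recent_changes_api_url(
        "commons.wikimedia.org",
        "2016-04-19T14:48:00Z",
        "2016-04-19T15:10:00Z",
    ))
```

As with Special:RecentChanges itself, this only helps while the entries are
still within the period the wiki retains recent changes for.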
Thanks and sorry for the issue!
(FlaggedRevs was not affected, and wikis using it are probably fine.)
(On very active wikis, like English Wikipedia, this might already be
outside of the limit that Special:RecentChanges can display…)
--
Bartosz Dziewoński