Deployment-prep (aka 'Beta') services will be unreliable this week

Andrew Bogott

20 Nov 2018

2:10 a.m.

Due to a lucky arrangement of schedules, we are going to move the deployment-prep project to the new Cloud region this week, starting in a few minutes. *This includes all of the infrastructure behind the beta.wmflabs.org site.*

Various people will be standing by to troubleshoot the outages that result, but for the most part if you see bad behaviors you should disregard or work around them for the time being.

Once everything is moved and semi-stable I will send a followup email, at which point the deployment-prep team will once again become interested in bug reports :)

Sorry for the short notice! With a little luck this should be mostly painless.

-Andrew

Mukunda Modell

21 Nov

7:40 a.m.

So we've made some headway in resolving the inevitable issues that cropped up after migrating a lot of cloud instances to all-new infrastructure and assigning a new IP to each of them. Many thanks to Krenair and Andrew for all the great work you've been doing to get things back to normal.

Despite everyone's efforts, there is still a configuration error preventing MediaWiki from working in beta. I've been stuck for several hours now, but I'm hopeful that this is the last major issue, and I'm sure it's simple for someone who understands MediaWiki internals better than I do. I'm completely stumped, so I could really use some help from someone who understands the configuration of MediaWiki session storage and the underlying object cache.

The problem is described in https://phabricator.wikimedia.org/T210030 so I won't repeat it here. I'll simply appeal to those of you who know something about how "BagOStuff" is configured: please take a look at T210030 and point me in the right direction.

Thanks in advance for taking a look. I apologize for the inconvenience caused by this unfortunately unavoidable interruption of service.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Bryan Davis

11:49 a.m.

The beta cluster wikis are working again. It turns out that there was some confusion when moving/removing servers because of implementation drift between our production clusters and the beta cluster.

Before the move to the new region we had both "memc*" and "redis*" servers in the beta cluster project. The "memc*" servers are the equivalent of our production "mc*" servers. In production the "mc*" servers run both memcached and redis services. In the beta cluster our "memc*" servers were only providing memcached, and the configuration relied on the "redis*" servers for session storage. The "redis*" servers were removed while migrating virtual machines to the eqiad1-r region, under the assumption that they were legacy servers from the time when we used redis as storage for the job queue. The fix was to set up the "memc*" servers with both memcached for arbitrary data caching and redis for session storage. If you are interested in the gory details, see the notes left on https://phabricator.wikimedia.org/T210030.
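For anyone who wants to sanity-check a setup like the one described above, here is a minimal sketch in Python. It is not the tooling that was used for the fix. It assumes the two services listen on their stock default ports (11211 for memcached, 6379 for redis), which may not match the beta cluster, and the names `port_open` and `missing_services` are made up for illustration.

```python
import socket

# Stock default ports. Whether the beta cluster uses these is an assumption.
MEMCACHED_PORT = 11211
REDIS_PORT = 6379


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def missing_services(hosts, probe=port_open):
    """Map each host to the services that did not answer.

    Hosts that answer on both ports are left out of the result.
    `probe` can be swapped out so the logic can be tested offline.
    """
    required = {"memcached": MEMCACHED_PORT, "redis": REDIS_PORT}
    report = {}
    for host in hosts:
        down = [name for name, port in required.items() if not probe(host, port)]
        if down:
            report[host] = down
    return report
```

Calling `missing_services([...])` with the names of the cache hosts returns a dict listing, per host, which of the two services is unreachable. A host that only answers on the memcached port would show up with `["redis"]`, which is the situation the "memc*" servers were in before the fix.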

Bryan

-- Bryan Davis Wikimedia Foundation bd808@wikimedia.org [[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA irc: bd808 v:415.839.6885 x6855

Alex Monk

24 Nov

1:15 a.m.

For the record, this is largely complete, with a few known remaining problems:

* MediaWiki does not recognise the new cache instance because of a missing config change. The patch is ready but requires a deployer. In the meantime this causes problems for things like account creation: https://phabricator.wikimedia.org/T210296
* The Restbase service is offline, causing VE to not work: https://phabricator.wikimedia.org/T208101#4768886
* The Zotero instance is stuck in the old region, as the instance is so old that its base image (Ubuntu Trusty) has gone missing. This instance was already on its way out though.
* The old cache instance is still receiving traffic despite having no DNS records pointing at it: https://phabricator.wikimedia.org/T210214

Alex Monk

1:46 a.m.

Also:

* Puppet is disabled on deployment-mediawiki-07 for unknown reasons: https://phabricator.wikimedia.org/T208101#4770736
* Confusion about the status of deployment-redis*: they seem completely broken but have important references? https://phabricator.wikimedia.org/T210301
