Due to a lucky arrangement of schedules, we are going to move the deployment-prep project to the new Cloud region this week, starting in a few minutes. *This includes all of the infrastructure behind the beta.wmflabs.org site.*
Various people will be standing by to troubleshoot the outages that result, but for the most part if you see bad behaviors you should disregard or work around them for the time being.
Once everything is moved and semi-stable I will send a followup email, at which point the deployment-prep team will once again become interested in bug reports :)
Sorry for the short notice! With a little luck this should be mostly painless.
-Andrew
So we've made some headway in resolving all of the inevitable issues that cropped up after migrating a lot of cloud instances to all-new infrastructure and assigning a new IP to each of them. Many thanks to Krenair and Andrew for all the great work you've been doing to get things back to normal.
Despite all of our efforts, there is still a configuration error preventing MediaWiki from working in beta. I've been stuck for several hours now, but I'm hopeful that this is the last major issue, and I'm sure it's simple for someone who understands MediaWiki internals better than I do. I'm completely stumped, so I could really use some help from someone who knows how MediaWiki session storage and the underlying object cache are configured.
The problem is described in https://phabricator.wikimedia.org/T210030 so I won't repeat it here. I'll simply appeal to those of you who know something about how "BagOStuff" is configured: please take a look at T210030 and point me in the right direction.
Thanks in advance for taking a look. I apologize for the inconvenience caused by this unfortunately unavoidable interruption of service.
On Mon, Nov 19, 2018 at 12:11 PM Andrew Bogott abogott@wikimedia.org wrote:
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Tue, Nov 20, 2018 at 4:41 PM Mukunda Modell mmodell@wikimedia.org wrote:
> Despite each of our efforts, there is still a configuration error preventing MediaWiki from working in beta. [...]
The beta cluster wikis are working again. It turns out that there was some confusion when moving/removing servers because of implementation drift between our production clusters and the beta cluster.
Before the move to the new region we had both "memc*" and "redis*" servers in the beta cluster project. The "memc*" servers are the equivalent of our production "mc*" servers. In production the "mc*" servers run both memcached and redis services. In the beta cluster our "memc*" servers were only providing memcached, and the configuration relied on the "redis*" servers for session storage. The "redis*" servers were removed while migrating virtual machines to the eqiad1-r region, under the assumption that they were legacy servers from the time when we used redis as storage for the job queue. The fix was to set up the "memc*" servers with both memcached for arbitrary data caching and redis for session storage. If you are interested in the gory details, see the notes left on https://phabricator.wikimedia.org/T210030.
Bryan
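As a rough illustration of the fix described above (each "memc*" host providing both memcached and redis), here is a minimal Python sketch that checks whether both services accept TCP connections on a host. It assumes the services listen on their default ports (11211 for memcached, 6379 for redis); the function names are hypothetical and this is not part of the actual beta cluster tooling.

```python
import socket

# Default ports for the two services (an assumption; the beta cluster
# configuration may use different ones).
DEFAULT_PORTS = {"memcached": 11211, "redis": 6379}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def missing_services(host, ports=DEFAULT_PORTS, timeout=2.0):
    """Return the sorted names of services not accepting connections on host."""
    return sorted(
        name for name, port in ports.items()
        if not port_is_open(host, port, timeout)
    )
```

Calling `missing_services("memc-host.example")` with a real hostname in place of the placeholder would return an empty list when both services are reachable, or for example `["redis"]` on a host that only provides memcached. An open port only shows that something is listening, not that MediaWiki is configured to use it.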
For the record this is largely complete with a few known remaining problems:
* MediaWiki does not recognise the new cache instance due to missing config change, the patch is ready but requires a deployer, this does cause problems for things like account creations: https://phabricator.wikimedia.org/T210296
* Restbase service is offline, causing VE to not work: https://phabricator.wikimedia.org/T208101#4768886
* Zotero instance is stuck in the old region as the instance is so old the base image has gone missing (Ubuntu Trusty). This instance was already on its way out though.
* Old cache instance is still receiving traffic despite having no DNS records pointing at it: https://phabricator.wikimedia.org/T210214
Also:
* Puppet is disabled on deployment-mediawiki-07 for unknown reasons https://phabricator.wikimedia.org/T208101#4770736
* Confusion about the status of deployment-redis*, they seem completely broken but have important references? https://phabricator.wikimedia.org/T210301
On Fri, 23 Nov 2018 at 17:15, Alex Monk krenair@gmail.com wrote: