[Labs-l] Labs datacenter migration timeline

Andrew Bogott abogott at wikimedia.org
Wed Feb 19 17:54:35 UTC 2014

     In the last few weeks we've made great strides with the new labs 
cluster in our eqiad datacenter.  Things have been going well enough 
that we're ready to announce a tentative migration schedule.  As always, 
I want to reiterate that this plan does not involve the destruction of 
any instances or data.  If you do nothing then your projects will suffer 
downtime, but none of your work should be entirely lost.

     Toollabs users are in luck -- Coren will be building a new tools 
cluster and migrating tools one by one.  If your tool is currently 
running on the grid engine and is generally able to survive restarts, 
transition to the new datacenter shouldn't require any action on your part.

     As for those of you managing your own projects:  please have 
another look at the 'my resources' page 
(https://wikitech.wikimedia.org/wiki/Special:NovaResources) and make 
sure that all of your instances are running puppet[1] and that none of 
them are superfluous.

     In the first week of March (most likely on the 3rd), the exodus 
will begin.  New instance creation will be disabled in pmtpa, and you 
will have two options:

- self-migrate/rebuild:
     Wikitech will allow you to create new instances in eqiad and 
transfer files between your instances and shared storage.  This process 
will provide you with maximum control and minimum downtime; it will also 
be an opportunity to verify that your project is robust and your 
instances can be easily rebuilt.

- assisted-migrate:
     Coren or I can schedule[2] downtime for your projects, and migrate 
the raw VMs and data to eqiad.  This will entail considerable downtime.  
Instances /should/ survive the operation, but there may be a few 
unfortunate casualties.

     We've budgeted two weeks for the gentle part of the migration. If, 
by mid March (2014-03-18), your project is still humming along in pmtpa 
without comment, Labs staff will start shutting down instances and 
copying them over sporadically and unpredictably, at which point your 
VMs will arrive in an intact but SHUTOFF state in eqiad.  This will, 
obviously, entail a fair bit of downtime, so if your project is in 
active use I encourage you to self-migrate or otherwise coordinate with us.

[1] self-hosted puppet instances are a special case here, to be 
discussed in a future email.  In the short run, my advice is to make 
sure that your puppet repo is up to date and that puppet is running cleanly.

[2] via some manner of calendaring system to be determined later.

