labsdb1001.eqiad.wmnet (aka c1.labsdb) will be rebooted at 2017-10-30
14:30 UTC for kernel updates
(<https://phabricator.wikimedia.org/T168584>).
Normal usage of the *.labsdb databases should see only limited
interruption, as DNS is changed to point to labsdb1003.eqiad.wmnet
(aka c3.labsdb). The c1.labsdb service name will *not* be updated,
however, so tools hardcoded to that service name will be interrupted
until the reboot is complete.
There is a possibility of catastrophic hardware failure in this
reboot. There will be no way to recover the server or the data it
currently hosts if that happens. Tools that are hosting self-created
data on c1.labsdb *will* lose that data if there is hardware failure.
If you are unsure whether your tool is hosting data on c1.labsdb, you can
check at <https://tools.wmflabs.org/tool-db-usage/>.
This reboot is an intermediate step before the complete shutdown of
the server on Wednesday 2017-12-13. See
<https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown>
for more information.
Bryan (on behalf of the Wikimedia Cloud Services and DBA teams)
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Cloud Services Boise, ID USA
irc: bd808 v:415.839.6885 x6855
The labsdb1001.eqiad.wmnet (aka c1.labsdb) and labsdb1003.eqiad.wmnet
(aka c3.labsdb) servers are being shut down and permanently removed
from service on Wednesday 2017-12-13.
TL;DR
* Change your tools and scripts to use:
- "*.web.db.svc.eqiad.wmflabs" (real-time response needed)
- "*.analytics.db.svc.eqiad.wmflabs" (batch jobs; long queries)
* Replace "*" with either a shard name (e.g. s1) or a wikidb name
(e.g. enwiki).
* The new servers do not support user-created databases/tables because
replication can't be guaranteed. See T156869 and below for more
information.
* Migrate your user-created tables to tools.db.svc.eqiad.wmflabs
(also known as tools.labsdb) and perform JOINs in application logic
rather than inside the database.
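For illustration only, here is a minimal Python sketch of the two changes in the TL;DR. The helper names (replica_host, join_in_app) are hypothetical and not part of any Wikimedia library, and connection handling is deliberately left out. The join assumes you have already fetched rows from a wiki replica and from your own table on tools.db.svc.eqiad.wmflabs as lists of dicts; the "note" column is made up for the example.

```python
# Hypothetical helpers for illustration only; not part of any Wikimedia library.

REPLICA_SUFFIXES = {
    "web": "web.db.svc.eqiad.wmflabs",              # real-time response needed
    "analytics": "analytics.db.svc.eqiad.wmflabs",  # batch jobs; long queries
}


def replica_host(name: str, cluster: str = "web") -> str:
    """Build a replica service name from a shard ("s1") or wikidb ("enwiki")."""
    if cluster not in REPLICA_SUFFIXES:
        raise ValueError(f"unknown cluster {cluster!r}; expected 'web' or 'analytics'")
    return f"{name}.{REPLICA_SUFFIXES[cluster]}"


def join_in_app(replica_rows, tool_rows, key):
    """Inner-join two lists of dicts on `key` in Python instead of in SQL.

    replica_rows: rows already fetched from a wiki replica
    tool_rows: rows already fetched from your table on tools.db
    If both sides share a column name, the tool row's value wins.
    """
    # Index the tool's rows by join key so each replica row is an O(1) lookup.
    by_key = {}
    for row in tool_rows:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for row in replica_rows:
        for match in by_key.get(row[key], []):
            joined.append({**row, **match})
    return joined


if __name__ == "__main__":
    print(replica_host("enwiki"))          # enwiki.web.db.svc.eqiad.wmflabs
    print(replica_host("s1", "analytics"))  # s1.analytics.db.svc.eqiad.wmflabs
    pages = [{"page_id": 1, "page_title": "Foo"},
             {"page_id": 2, "page_title": "Bar"}]
    notes = [{"page_id": 2, "note": "reviewed"}]  # hypothetical tool table
    print(join_in_app(pages, notes, "page_id"))
    # [{'page_id': 2, 'page_title': 'Bar', 'note': 'reviewed'}]
```

A hash-join like this keeps memory proportional to your own (usually small) table, which tends to suit the common case of annotating replica rows with a tool's local data.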
What is changing?
* Week of 2017-10-30 to 2017-11-03 (exact date to be determined)
** Reboot labsdb1001.eqiad.wmnet (aka c1.labsdb) for kernel updates
** There is a possibility of catastrophic hardware failure in this
reboot. There will be no way to recover the server or the data it
currently hosts if that happens.
* Week of 2017-11-06 to 2017-11-10 (exact date to be determined)
** Reboot labsdb1003.eqiad.wmnet (aka c3.labsdb) for kernel updates
** There is a possibility of catastrophic hardware failure in this
reboot. There will be no way to recover the server or the data it
currently hosts if that happens.
* Wednesday 2017-12-13
** "*.labsdb" service names switched to point at
"*.web.db.svc.eqiad.wmflabs" equivalents.
** User-created tables will not be allowed on the new servers that
"c1.labsdb" and "c3.labsdb" point to.
** labsdb1001.eqiad.wmnet removed from service.
** labsdb1003.eqiad.wmnet removed from service.
Why are we doing this?
See <https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown>
and <https://phabricator.wikimedia.org/T142807> for a more complete
description of the reasons for these changes.
Bryan (on behalf of the Wikimedia Cloud Services and DBA teams)
Toolforge users can ignore this email; it only concerns VPS
project owners.
Long ago, the Wikimedia Operations team made the decision to phase
out use of Ubuntu servers in favor of Debian. It's a long, slow process
that is still ongoing, but in production Trusty is running on an
ever-shrinking minority of our servers.
As Trusty becomes more of an odd duck in production, it grows
harder to support in Cloud Services as well. Right now we have no
planned timeline for phasing out Trusty instances (there are 238 of
them!), but in anticipation of that phase-out, we'd like to discourage the
addition of new Trusty hosts to the cloud.
Step one[1] is to prevent people from creating new Trusty instances
unless they really, really need them. We would like to remove Trusty
from the default list of available base images and make it available for
new VMs only via special request on Phabricator. The questions for you are:
1) Would that change make your life a lot harder?
2) If yes, can you name a date after which it /won't/ make your life harder?
If the loss of Trusty doesn't worry you, feel free to ignore this
email. In the event of a silent (or relatively quiet) response, I'll
pull Trusty from the default image list sometime in the next few weeks.
- Andrew (+ the rest of the Cloud team)
[1] https://phabricator.wikimedia.org/T161899
In order to rebuild a server of questionable stability, I'm going to
move the following instances on Wednesday:
+--------------------------+---------------------+--------+
| Name                     | Tenant ID           | Status |
+--------------------------+---------------------+--------+
| cindy                    | pluggableauth       | ACTIVE |
| deployment-kafka-jumbo-1 | deployment-prep     | ACTIVE |
| oidc-google              | pluggableauth       | ACTIVE |
| proton-staging           | reading-web-staging | ACTIVE |
| search-jessie            | search              | ACTIVE |
| smtp-test1               | project-smtp        | ACTIVE |
| suggestbot-prod          | suggestbot          | ACTIVE |
| twlight-prod             | twl                 | ACTIVE |
| twlight-staging          | twl                 | ACTIVE |
| wikibrain-embeddings-02  | wikibrain           | ACTIVE |
| wikikids                 | wmam                | ACTIVE |
| zim-proto                | mobile              | ACTIVE |
+--------------------------+---------------------+--------+
Migration will take each affected instance offline for some time
(potentially more than an hour, depending on the size of the instance)
and reboot it. If you need me to work on your server at a particular time
of day, or need a stay of execution, please let me know. Otherwise, I'll
start going down the list at the beginning of my workday on Wednesday,
around 14:00 UTC.
Sorry for the inconvenience!
-Andrew