On Monday, December 3rd, 2018 at 1700 UTC, we will be rebooting one of the two dumps NFS servers (labstore1006.wikimedia.org). This is expected to cause a brief rise in load, but the reboot should be quick enough that failing over services is unlikely to help. We will be failing over the web service before that time and failing it back before rebooting the partner server (labstore1007.wikimedia.org) on Friday, December 7th at 1700 UTC. This should not interrupt service on dumps.wikimedia.org (the site hosted on these systems), since it will be failed over to the non-rebooting partner each time.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
ToolsDB will be undergoing maintenance and updates on Tuesday, November 27th, from 1730 UTC to 1800 UTC.
The actual outage should be fairly brief, but during this window the database will be taken offline and the system rebooted. Because the outage is expected to be brief and some tables are not replicated (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups…), we are not planning to fail over to the replica at this time.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
Hi,
Next Tuesday, 2018-11-27 at 17:30 UTC, we will reboot the
labnet1001.eqiad.wmnet server for maintenance and security updates.
This server provides virtual networking services for CloudVPS in the
main deployment (the old one, different from the eqiad1 deployment).
We won't be doing any failover prior to the reboot for operational reasons
(we measured that the failover downtime is longer than the reboot itself).
The impact of this brief reboot downtime will be:
* all VMs in the main CloudVPS deployment won't have network connectivity
* ongoing network connections (downloads, uploads) will fail and will
have to be restarted
* cross connectivity between VM instances in the main and eqiad1
deployments won't be possible
Thanks for your understanding, and please let us know of any issues you
may find after the reboot next week.
Hi,
Next Tuesday, 2018-11-20 at 17:30 UTC, we will be rebooting the OSM
database (part of our data services) for maintenance and security updates.
Specifically, the labstore1006.eqiad.wmnet (osmdb.eqiad.wmnet) server will
be rebooted. The other server in the cluster, labstore1007.eqiad.wmnet,
has already been rebooted, but we won't be doing any pre-failover for
operational reasons.
Apologies in advance for any inconvenience, and please let us know of any
issues you may find after these operations.
Hello!
I need to shut down the tools-dev host in order to move it to a
different server. The downtime will be brief, but in the meantime I
recommend people move their work to a different bastion (e.g.
tools-login.wmflabs.org) in order to avoid interruption.
This will happen on or near 15:00 UTC on Tuesday, 2018-11-20. I'll also
send alerts to sessions on the bastion prior to the shutdown.
-Andrew
Next Monday, 2018-11-19, we will be rebooting several Cloud VPS
infrastructure servers [0] for maintenance and security updates.
This is just a simple reboot of servers and we don't expect any outage
or major interruptions, but some services may be down briefly:
* Horizon and Wikitech may misbehave
* instance creation/deletion/shutdown, etc. may be unavailable
* CI tests may stop running
Apologies in advance for any inconvenience, and please let us know of any
issues you may find after these operations.
[0] cloudcontrol1003, cloudservices1003, labcontrol1001, labservices1001
Next week is a short week in the US, so no project moves will happen.
Here is the schedule for project moves in the following week:
Monday, 2018-11-26: collection-alt-renderer, dumps, extdist, glampipe,
google-api-proxy, hound, lizenzhinweisgenerator
Tuesday, 2018-11-27: osmit, pagemigration, paws, petscan, project-proxy,
rcm, shiny-r, social-tools
Wednesday, 2018-11-28: thumbor, wikifactmine, wikidata-dev,
wikidata-federation
Thursday, 2018-11-29: social-tools, wikidata-query, codesearch, dwl,
fa-wp, swift
Friday, 2018-11-30: math, phlogiston, account-creation-assistance,
community-labs-monitoring,
Some context for what this is all about can be found here:
https://phabricator.wikimedia.org/phame/post/view/120/neutron_is_here/
Please let me know if you are involved in one of those projects and need to
postpone the move, or schedule a to-the-minute migration window.
- Andrew + the WMCS team
It's Monday, which means it's time to schedule another round of project
migrations. For more info about what this is, consult the link below[1].
Here is the schedule for the next week of moves:
Monday, 2018-11-12: Holiday, no activity :)
Tuesday, 2018-11-13: commtech, design, discourse, gerrit, getstarted
Wednesday, 2018-11-14: library-upgrader, ores-staging, wikiapiary,
wikispeech
Thursday, 2018-11-15: toolsbeta
Friday, 2018-11-16: codesearch, twl, webperf
Please let me know if you are involved in one of those projects and need to
postpone the move, or schedule a to-the-minute migration window.
- Andrew + the WMCS team
[1] https://phabricator.wikimedia.org/phame/post/view/120/neutron_is_here/
We'll be shuffling the VMs that host the Quarry service over to a
new corner of the cloud today. During the move the service will be
unavailable and/or behave erratically.
I don't expect the move to take more than an hour. I'll send a
further notice when things are done.
-Andrew + the WMCS team