I will be upgrading the cloud-vps OpenStack install on Thursday,
beginning around 16:00 UTC. Here's what to expect:
- Intermittent Horizon and API downtime (maybe an hour or two total)
- Inability to schedule new VMs (also for an hour or two)
Toolforge users will be unaffected by this outage. Existing, running
services and VMs on cloud-vps should also be unaffected.
In case you want to follow along at home, this is tracked as
https://phabricator.wikimedia.org/T356287
-Andrew + the WMCS team
I've swapped the secondary dev.toolforge.org bastion to a new server
running Debian 12. As usual, the new SSH fingerprints have been
published on Wikitech[0].
The new bastion no longer has the full list of packages installed that
were required for Grid Engine usage. If there is a package missing
from the new bastion that you would find useful, please file a new
Phabricator task in the Toolforge project[1].
If there are no major issues found I will also swap the main
login.toolforge.org bastion to a new server in a few days. I'll send a
separate announcement when that happens.
[0]: https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/dev.toolforge.org
[1]: https://phabricator.wikimedia.org/tag/toolforge/
Taavi
--
Taavi Väänänen (he/him)
Site Reliability Engineer, Cloud Services
Wikimedia Foundation
TL;DR: If you start to notice new or noisy puppet failures on your VMs,
please notify me directly or open a phab ticket and assign it to me
(Andrew).
==
What's happening:
Over the last few weeks I've been upgrading cloud-vps puppet servers to
newer builds that support the latest version of the puppet config
language, version 7. That's done for almost all cases. The exceptions
are a few project-local puppetmasters that I've been nervous about
messing with directly; for those I've opened Phabricator tickets and
assigned them to project admins. For clarity, I've been using
'puppetserver' terminology for new servers, whereas older servers were
generally called 'puppetmasters.' [0]
Now that most servers are upgraded, it's time for me to flip the setting
that causes them to actually use the version 7 parser and compiler. In
almost all cases this will be backwards-compatible with the existing
catalogs, but we may turn up a few edge cases that require repair.
What you need to do:
If you have one of those phab tickets about puppetservers open for your
project, please respond on the ticket so I know you're there and know
what your plan is.
All other users, please reach out to me if you start seeing new or
surprising puppet failures and I'll help sort out the transition.
-Andrew
[0] https://wikitech.wikimedia.org/wiki/Help:Project_puppetserver
Hi all!
This is to let you know that Toolforge continuous jobs now support
health-checks!
To use it you need to provide `--health-check-script ./script.sh` while
creating your job. You can also provide the script as a string like this:
`--health-check-script "cat /etc/os-release"`. Toolforge will periodically
attempt to execute your health-check script inside your running job and
will restart your job if the script completes with an exit code of 1.
Note: if you use a script file for health-check, do not forget to make the
file executable (chmod u+x script.sh). If Toolforge can't execute your
health-check script, your job will never start.
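As a rough illustration of what a script file might contain, here is a
minimal sketch. It assumes, purely for the example, that your job touches a
file named `heartbeat` every few minutes while it is working. The file name,
the `heartbeat_is_fresh` helper and the 5-minute threshold are all made up
here; none of them is defined by Toolforge.

```shell
#!/bin/sh
# Sketch only: treats the job as healthy while a heartbeat file is fresh.
# Assumption: the job itself runs "touch heartbeat" regularly.

# heartbeat_is_fresh FILE MINUTES
# Returns 0 if FILE exists and was modified less than MINUTES minutes ago,
# and 1 otherwise. Relies on the -mmin test of GNU/BSD find.
heartbeat_is_fresh() {
    [ -f "$1" ] || return 1
    # find prints the path only if the file is newer than the cutoff;
    # grep -q turns "nothing printed" into exit status 1.
    find "$1" -mmin "-$2" 2>/dev/null | grep -q .
}

# Small demonstration with a temporary file that was just created.
demo_file=$(mktemp)
if heartbeat_is_fresh "$demo_file" 5; then
    echo "healthy"
else
    echo "unhealthy"
fi
rm -f "$demo_file"
```

In a real script.sh you would drop the demonstration and instead end the
file with something like `heartbeat_is_fresh ./heartbeat 5 || exit 1`, so
that a stale or missing heartbeat produces the exit code of 1 that triggers
a restart.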
Also a reminder that you can find this and smaller user-facing updates about
the Toolforge platform features here:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog
Original task: https://phabricator.wikimedia.org/T335592
--
Ndibe Raymond Olisaemeka
Software Engineer - Technical Engagement
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi,
Toolforge's Harbor instance (image registry) will be down briefly for a
version upgrade from 2.9.0 to 2.10.1 tomorrow, Thursday 4 April, at 09:00 UTC.
https://phabricator.wikimedia.org/T354507
This should not affect any tools that are not using the new build service,
nor any tools that are already running.
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service
If you are using the build service, you will not be able to run any new
builds, or start a job or a webservice from an image built with the build
service while Harbor is down. The outage is expected to last a few minutes.
We will send an update before starting maintenance work, and once
everything is back up and running.
Cheers,
--
Slavina Stefanova (she/her)
Software Engineer | Developer Experience
Wikimedia Foundation