Next week (Tuesday, 2022-04-19 15:00 UTC) we will be upgrading the
operating system that hosts the shared Toolsdb servers. This upgrade may
take an hour or more, during which time the databases will not be available.
This outage will be VERY DISRUPTIVE to many Toolforge tools, as all
database access will fail during the upgrade. Toolforge users may want
to disable their tools before the outage and/or check in afterward to
verify proper recovery once service is restored.
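For the "verify proper recovery" step, a quick reachability probe can be scripted. This is only a sketch, assuming the standard ToolsDB service hostname (tools.db.svc.wikimedia.cloud) and the default MariaDB port 3306; substitute whatever host your tool actually connects to:

```shell
# Probe ToolsDB over TCP after the maintenance window, before
# re-enabling your tool. Uses bash's /dev/tcp, so no mysql client
# is needed; hostname and port are the usual ToolsDB defaults.
if timeout 5 bash -c '</dev/tcp/tools.db.svc.wikimedia.cloud/3306' 2>/dev/null; then
    echo "ToolsDB is reachable"
else
    echo "ToolsDB is not reachable yet"
fi
```

A fuller check would also run a trivial query (e.g. `SELECT 1`) with your tool's credentials, since the TCP port can come up before the database is actually accepting connections.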
There is likely to be a similar (or longer) outage in subsequent weeks
as we also need to upgrade the database servers themselves. Toolsdb has
grown to an ungainly size and can't be easily handled using standard
rolling upgrade procedures; the WMCS team is in ongoing discussions
about long-term solutions for this issue. In the meantime you can help
us out by engaging in periodic cleanup of your database usage and
dumping or dropping data that's no longer of use.
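To help with that cleanup, you can list your databases by size to spot candidates for dumping or dropping. A sketch, assuming the usual ToolsDB hostname and the replica.my.cnf credentials file in your tool's home directory:

```shell
# Show each of your databases with its approximate size in MiB,
# largest first. Guarded so it is a no-op where no mysql client exists.
if command -v mysql > /dev/null; then
    mysql --defaults-file="$HOME/replica.my.cnf" \
          -h tools.db.svc.wikimedia.cloud -e "
        SELECT table_schema,
               ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mib
        FROM information_schema.tables
        GROUP BY table_schema
        ORDER BY size_mib DESC;" || echo "could not connect to ToolsDB"
else
    echo "mysql client not installed on this host"
fi
```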
-Andrew + the WMCS team
Hi there,
Today 2022-04-06 we're performing some network maintenance operations on
Cloud VPS that could affect all cloud egress/ingress traffic, including
Toolforge. The cuts, if noticeable, should last a few minutes at most.
Some operations were also conducted yesterday (without an email
notice), and some unexpected hiccups occurred; hence today's email.
Regards,
--
Arturo Borrero Gonzalez
Site Reliability Engineer
Wikimedia Cloud Services
Wikimedia Foundation
PAWS is upgrading to Pywikibot 7.0.0. There are some breaking changes:
* Support for Python 3.5.0 - 3.5.2 has been dropped (T286867)
* generate_family_file.py, generate_user_files.py, shell.py and
version.py were moved to pywikibot/scripts and must be used with the
pwb wrapper script
There are more in the 'Code cleanups' section of the changelog:
https://doc.wikimedia.org/pywikibot/stable/changelog.html
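The practical effect of the relocation is that those scripts can no longer be invoked directly; they go through the pwb wrapper instead. A small illustrative helper showing the old-to-new mapping (the script names come from the changelog; the python binary name may differ in your environment):

```shell
# Translate an old direct invocation into the new pwb-wrapped form,
# e.g. "python version.py" becomes "python pwb.py version".
pwb_cmd() {
    echo "python pwb.py ${1%.py}"
}
pwb_cmd version.py   # -> python pwb.py version
pwb_cmd shell.py     # -> python pwb.py shell
```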
--
*Vivian Rook (They/Them)*
Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
In about an hour (at 15:00 UTC today) we'll be upgrading the networking
servers for cloud-vps. This may cause brief networking interruptions for
both cloud-vps and toolforge.
No action should be needed on your part.
-Andrew + the WMCS team
Hello, all!
This is a follow-up on our earlier announcement[0] of the Toolforge
Stretch grid deprecation.
Thanks to those who have already migrated their tool(s) from the Debian
Stretch grid or are in the process of doing so.
At the start of this process, there were 867 tools running on the
Stretch grid. The current number is 821.
=== Recap ===
We are migrating all Toolforge servers from Debian Stretch[1] to
Debian Buster; the most affected piece is the Grid Engine backend.
We need to shut down all Stretch hosts before the end-of-support date
to ensure that Toolforge remains a secure platform. This migration
will take several months because many people still use the Stretch
hosts and our users are working on tools in their spare time.
== What should I do? ==
You should migrate your Toolforge tool to a newer environment.
You have two options:
* migrate from Toolforge Stretch Grid Engine to Toolforge Kubernetes[2].
* migrate from Toolforge Stretch Grid Engine to Toolforge Buster Grid
Engine[3].
== Timeline ==
* 2022-02-15: Availability of Debian Buster grid announced to community -
DONE
* 2022-03-21: Weekly reminders via email to tool maintainers for tools
still running on Stretch - IN PROGRESS
* Week of 2022-04-21:
** Daily reminders via email to tool maintainers for tools still running on
Stretch
** Switch login.toolforge.org to point to Buster bastion
* Week of 2022-05-02: Evaluate migration status and formulate plan for
final shutdown of Stretch grid
* Week of 2022-05-21: Shut down Stretch grid
We thank all of you for your support during this migration process.
You can always reach out via any of our communication channels[4]
[0]
https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.…
[1] https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation
[2]
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engi…
[3]
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move…
[4]
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Commun…
Thanks.
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
In a few hours we'll be replacing the existing cloud-vps bastions with
new systems running Debian Bullseye.
Because this is a DNS change, existing bastion sessions should not be
interrupted. New connections will produce fingerprint warnings that will
require you to update your .ssh/known_hosts. Here are the fingerprints
for the new systems:
primary.bastion.wmcloud.org, eqiad1.bastion.wmcloud.org,
bastion.wmcloud.org:
ED25519 key fingerprint is
SHA256:QlZONtScYR4O5jGnrmKRhWVF9lJE+aReENpHXqeOL/4
secondary.bastion.wmcloud.org:
ED25519 key fingerprint is
SHA256:tRgnLMmISSuByzzeX8yXWcdFKjZad8Hdy6Y7E6jgaGI
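To clear the stale cached keys in one pass, something like the following should work (the host list matches the names above; ssh-keygen -R edits ~/.ssh/known_hosts in place and leaves a .old backup):

```shell
# Drop any cached host keys for the bastion names so the next
# connection prompts to accept the new ED25519 fingerprints.
for host in bastion.wmcloud.org primary.bastion.wmcloud.org \
            secondary.bastion.wmcloud.org eqiad1.bastion.wmcloud.org; do
    ssh-keygen -R "$host" 2>/dev/null || true
done
```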
-Andrew + the WMCS team
Hello, all!
This email contains valuable information about the Toolforge service.
Starting today, we're initiating a process to migrate all Toolforge
servers from Debian Stretch to Debian Buster; the most affected piece
is the Grid Engine backend.
Debian Stretch was released in June 2017, and long term support for it
(including security updates) will cease in June 2022. We need to shut
down all Stretch hosts before the end of support date to ensure that
Toolforge remains a secure platform. This migration will take several
months because many people still use the Stretch hosts and our users
are working on tools in their spare time.
You should be aware that our ultimate goal is to deprecate Grid Engine
entirely and replace it with Kubernetes. Read below for more information
on this.
== Initial timeline ==
Subject to change, see Wikitech[1] for living timeline.
* 2022-02-15: Availability of Debian Buster grid announced to community
* 2022-03-21: Weekly reminders via email to tool maintainers for tools
still running on Stretch
* Week of 2022-04-21:
** Daily reminders via email to tool maintainers for tools still running on
Stretch
** Switch login.toolforge.org to point to Buster bastion
* Week of 2022-05-02: Evaluate migration status and formulate plan for
final shutdown of Stretch grid
* Week of 2022-05-21: Shut down Stretch grid
== What is changing? ==
* New bastion hosts running Debian Buster with connectivity to the new job
grid
* New versions of PHP, Python3, and other language runtimes
* New versions of various support libraries
== What should I do? ==
You should migrate your Toolforge tool to a newer environment.
You have two options:
* migrate from Toolforge Stretch Grid Engine to Toolforge Kubernetes[3].
* migrate from Toolforge Stretch Grid Engine to Toolforge Buster Grid
Engine.
The Cloud Services team has created the Toolforge Stretch
deprecation[0] page on wikitech.wikimedia.org to document basic steps
needed to move web services, cron jobs, and continuous jobs from the
old Stretch grid to the new Buster grid. That page also provides more
details on the language runtime and library version changes and will
provide answers to common problems people encounter as we find them.
If the answer to your problem isn't on the wiki, ask for help using
any of our communication channels[2].
We encourage you to move to Kubernetes today if you can, see below for
more details.
For those who can't migrate to Kubernetes, the Debian Buster grid should
be adopted within the next three months.
== A note on the future of Toolforge, the Grid and Kubernetes ==
As of today, Toolforge is powered by both Grid Engine and Kubernetes.
For a number of reasons, we have decided to deprecate Grid Engine and
replace all of its functions with Kubernetes. We're not yet ready to
offer all grid-like features on Kubernetes, but we're working on it.
As soon as we are able, we will begin the process of migrating the
workloads and shutting down the grid. This is something we hope to do
between 2022 and 2023.
We share this information to encourage you to evaluate migrating your
tool away from Grid Engine to Kubernetes.
One of the most prominent missing features on Kubernetes was a friendly
command line interface to schedule jobs (like jsub). We've been working
on that, and have a beta-level interface that you can try today: the
Toolforge jobs framework [4].
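As a taste of the new interface, a one-off job submission looks roughly like this (beta syntax; run from a Toolforge bastion as your tool account; "myjob", "./mytool.sh" and the image name are illustrative placeholders, so check the jobs framework docs[4] for the current options):

```shell
# Sketch of the jobs framework CLI, roughly replacing "jsub ./mytool.sh".
# Guarded so the snippet is a no-op on machines without the CLI.
if command -v toolforge-jobs > /dev/null; then
    toolforge-jobs run myjob --command ./mytool.sh --image tf-buster-std
    toolforge-jobs list
else
    echo "toolforge-jobs is only available on Toolforge bastions"
fi
```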
[0]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation
[1]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Time…
[2]:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Commun…
[3]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes
[4]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs-Framework
Thanks.
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
Hello, all!
We are in the process of re-engineering and virtualizing[0] the NFS
service provided to Toolforge and VMs. The transition will be rocky and
involve some service interruption... I'm still running tests to
determine exactly how much disruption will be required.
The first volume that I'd like to replace is 'scratch,' typically
mounted as /mnt/nfs/secondary-scratch. I'm seeking feedback about how
vital scratch uptime is to your current workflow, and how disruptive it
would be to lose data there.
If you have a project or tool that uses scratch, please respond with
your thoughts! My preference would be to wipe out all existing data on
scratch and also subject users to several unannounced periods of
downtime, but I also don't want anyone to suffer. If you have
important/persistent data on that volume then the WMCS team will work
with you to migrate that data somewhere safer, and if you have an
important workflow that will break due to Scratch downtime then I'll
work harder on devising a low-impact roll-out.
Thank you!
-Andrew
[0] https://phabricator.wikimedia.org/T291405