Hello, all!
This email contains valuable information about the Toolforge service.
Starting today, we are beginning the migration of all Toolforge servers from Debian Stretch to Debian Buster. The most affected component is the Grid Engine backend.
Debian Stretch was released in June 2017, and long-term support for it (including security updates) will cease in June 2022. We need to shut down all Stretch hosts before the end-of-support date to ensure that Toolforge remains a secure platform. This migration will take several months because many people still use the Stretch hosts and our users work on their tools in their spare time.
You should be aware that our ultimate goal is to deprecate Grid Engine entirely and replace it with Kubernetes. Read below for more information on this.
== Initial timeline ==
Subject to change, see Wikitech[1] for living timeline.

* 2022-02-15: Availability of Debian Buster grid announced to community
* 2022-03-21: Weekly reminders via email to tool maintainers for tools still running on Stretch
* Week of 2022-04-21:
** Daily reminders via email to tool maintainers for tools still running on Stretch
** Switch login.toolforge.org to point to Buster bastion
* Week of 2022-05-02: Evaluate migration status and formulate plan for final shutdown of Stretch grid
* Week of 2022-05-21: Shut down Stretch grid
== What is changing? ==
* New bastion hosts running Debian Buster with connectivity to the new job grid
* New versions of PHP, Python3, and other language runtimes
* New versions of various support libraries
== What should I do? ==
You should migrate your Toolforge tool to a newer environment. You have two options:

* migrate from Toolforge Stretch Grid Engine to Toolforge Kubernetes[3].
* migrate from Toolforge Stretch Grid Engine to Toolforge Buster Grid Engine.
The Cloud Services team has created the Toolforge Stretch deprecation[0] page on wikitech.wikimedia.org to document basic steps needed to move web services, cron jobs, and continuous jobs from the old Stretch grid to the new Buster grid. That page also provides more details on the language runtime and library version changes and will provide answers to common problems people encounter as we find them. If the answer to your problem isn't on the wiki, ask for help using any of our communication channels[2].
We encourage you to move to Kubernetes today if you can; see below for more details.
For those who can't migrate to Kubernetes, the Debian Buster grid should be adopted within the next three months.
== A note on the future of Toolforge, the Grid and Kubernetes ==
As of today, Toolforge is powered by both Grid Engine and Kubernetes. For a number of reasons, we have decided to deprecate Grid Engine and replace all of its functions with Kubernetes. We're not yet ready to offer all grid-like features on Kubernetes, but we're working on it. As soon as we are able, we will begin the process of migrating the workloads and shutting down the grid. This is something we hope to do between 2022 and 2023.
We share this information to encourage you to evaluate migrating your tool away from Grid Engine to Kubernetes.
One of the most prominent missing features on Kubernetes was a friendly command line interface to schedule jobs (like jsub). We've been working on that, and have a beta-level interface that you can try today: the Toolforge jobs framework [4].
[0]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation
[1]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Timel...
[2]: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Communi...
[3]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes
[4]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs-Framework
Thanks.
From my perspective as a Toolforge user, one of the issues I see is that it's often not clear how to map the "friendly command line interface" onto concepts I already understand from the lower-level tools.
For example, the webservice script does some useful stuff. But, it wasn't clear exactly what it was doing, i.e. there was a lot of magic happening. While the magic is certainly an integral part of hiding the low-level details, it also obfuscates things. Reading the webservice script wasn't much help; it's long and complicated, and mixes grid and k8s functionality in a way that further hides what's actually going on.
Anyway, all I'm really asking is that as the docs get written for the "friendly command line interface", you also include some explanation of what's happening behind the scenes. For example, maybe have a --verbose option to all the tools which makes it print all the back end commands it's executing, so
webservice --backend=kubernetes python3.7 restart
might print:
kubectl exec -i -t shell-1645020371 --container main-app -- /bin/bash
And then somebody who already understands kubectl would instantly understand what's happening. It's not hard to guess the basic gist of what it must be doing, but having the details spelled out eliminates any doubt and aids comprehension.
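To make the suggestion concrete, here is a minimal sketch of how such a --verbose option could work, assuming the tool funnels every back-end call through a single helper. The names run_backend, verbose, dry_run, and echo are hypothetical; none of this is taken from the actual webservice code.

```python
import shlex
import subprocess
from typing import Callable, Sequence


def run_backend(
    argv: Sequence[str],
    verbose: bool = False,
    dry_run: bool = False,
    echo: Callable[[str], None] = print,
) -> int:
    """Optionally echo a back-end command, then run it and return its exit code.

    Hypothetical helper for illustration only; not part of any Toolforge tool.
    """
    if verbose or dry_run:
        # shlex.join quotes each argument, so the echoed line can be
        # pasted straight into a shell.
        echo("+ " + shlex.join(argv))
    if dry_run:
        # Show the command without executing anything.
        return 0
    return subprocess.run(list(argv)).returncode


# Example: show the command a wrapper would run, without running it.
run_backend(["kubectl", "get", "pods"], dry_run=True)
```

With a helper like this, --verbose would only need to set one flag, and a dry-run mode would let people see the kubectl commands without touching their running tool.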
As another example, it took me a little bit to figure out that the "become" command doesn't do anything more magic than run sudo with a little sanity checking wrapped around it. Fortunately, that script is simple enough that once I looked at it, it was obvious what it was doing. But other parts of the "friendly command line interface" are rather more opaque.
On Feb 15, 2022, at 11:42 AM, Seyram Komla Sapaty ssapaty@wikimedia.org wrote:
> One of the most prominent missing features on Kubernetes was a friendly command line interface to schedule jobs (like jsub). We've been working on that, and have a beta-level interface that you can try today: the Toolforge jobs framework [4].
wikitech-l@lists.wikimedia.org