Hello, all!
Starting today we are kicking off the process to shut down Grid Engine and we want to share the timeline with you.
== Background ==
WMCS made the Grid Engine available as a backend engine for hosting tools on Toolforge - our Platform as a Service (PaaS) offering.
An additional backend engine, Kubernetes, was also made available on Toolforge.
Over time, maintaining and securing the grid has proven to be difficult and has made it harder to provide support to the community in other ways, because a lot of man-hours of maintenance work are spent on it.
This is mainly due to the fact that there have been no new Grid Engine releases (bug fixes, security patches, or otherwise) since 2016.[0]
Maintenance work on the grid continued because it was widely popular with the community and the Kubernetes offering didn't yet have many grid-like features that contributors came to love.
Once the Kubernetes platform could handle many of the workloads, we started the grid deprecation process by asking maintainers to migrate off the grid.[1]
Over the past year, we've been reaching out to our tool maintainers and working with them to migrate their tools off the Grid to Kubernetes. We have reached out directly to all maintainers with their phabricator ticket IDs.
The latest updates to Build Service[2] have addressed many of the issues that prevented tool maintainers from migrating.
== Initial Timeline ==
The detailed grid shutdown timeline is available on wiki.[3] The important dates have been copied below.
* 14th December, 2023: Any maintainer who has not responded on Phabricator will have their tools shut down and crontabs commented out. Please plan to migrate or tell us your plans on Phabricator before that date.
* 14th February, 2024: The grid is completely shut down. All tools are stopped.
If you need further clarification or help migrating your tool, don't hesitate to reach out to us on IRC, Telegram, Phabricator[4] or via any of our support channels.[5]
Thank you.
[0]: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
[1]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation
[2]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service
[3]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#T...
[4]: https://phabricator.wikimedia.org/project/profile/6135/
[5]: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Communi...
Hi,
The email I received was probably well intended, but felt like a threat to sabotage my tools. The whole project already got me unhappy from the start with tasks, contrary to Phabricator etiquette [1], being assigned to me because I happened to be at the top of the list. I called out the person doing that, but I never got a reply. I guess etiquette is not enforced for staff?
I probably have about 70+ jobs spread over several accounts that are at risk. It feels like you're just dumping work in my lap without any benefit for me as maintainer. I understand you want to make it easier to maintain the infrastructure, but do you have any idea how demotivating this is?
Maarten
[1]: https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette
On 30-11-2023 16:41, Seyram Komla Sapaty wrote:
Hello, all!
Starting today we are kicking off the process to shut down Grid Engine and we want to share the timeline with you.
== Background ==
WMCS made the Grid Engine available as a backend engine for hosting tools on Toolforge - our Platform as a Service (PaaS) offering.
An additional backend engine, Kubernetes, was also made available on Toolforge.
Over time, maintaining and securing the grid has proven to be difficult and has made it harder to provide support to the community in other ways, because a lot of man-hours of maintenance work are spent on it.
This is mainly due to the fact that there have been no new Grid Engine releases (bug fixes, security patches, or otherwise) since 2016.[0]
Maintenance work on the grid continued because it was widely popular with the community and the Kubernetes offering didn't yet have many grid-like features that contributors came to love.
Once the Kubernetes platform could handle many of the workloads, we started the grid deprecation process by asking maintainers to migrate off the grid.[1]
Over the past year, we've been reaching out to our tool maintainers and working with them to migrate their tools off the Grid to Kubernetes. We have reached out directly to all maintainers with their phabricator ticket IDs.
The latest updates to Build Service[2] have addressed many of the issues that prevented tool maintainers from migrating.
== Initial Timeline ==
The detailed grid shutdown timeline is available on wiki.[3] The important dates have been copied below.
- 14th December, 2023: Any maintainer who has not responded on Phabricator will have their tools shut down and crontabs commented out. Please plan to migrate or tell us your plans on Phabricator before that date.
- 14th February, 2024: The grid is completely shut down. All tools are stopped.
If you need further clarification or help migrating your tool, don't hesitate to reach out to us on IRC, Telegram, Phabricator[4] or via any of our support channels.[5]
Thank you.
https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#Timeline
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Communication_and_support
-- Seyram Komla Sapaty Developer Advocate Wikimedia Cloud Services
Cloud-announce mailing list -- cloud-announce@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.o...
Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
I will also note the last time that I attempted to move to the k grid it was about as smooth as sandpaper. Giving us a shutdown date less than 30 days, and during a major holiday timeframe is really not helpful. Planning for this should have been scheduled for a window with higher user availability.
On Mon, Dec 4, 2023 at 2:11 PM John phoenixoverride@gmail.com wrote:
I will also note the last time that I attempted to move to the k grid it was about as smooth as sandpaper. Giving us a shutdown date less than 30 days, and during a major holiday timeframe is really not helpful. Planning for this should have been scheduled for a window with higher user availability.
John, I'm sorry to hear you had a bad experience in a previous attempt. I appreciate that you've been active in attempting to migrate away from the grid. If you have a specific phabricator ticket from your prior experience you can share, please do so. I hope everything migrates smoothly this time for you, but if not, please let us know so we can help! As an active maintainer, we want to ensure you are able to transition as seamlessly as possible.
I also appreciate the feedback on the timeline. It's important to give yourself and other maintainers a specific date you could plan against. Including the end of the calendar year is helpful for some maintainers who normally do maintenance tasks during this time. At the same time, the goal is not to stress anyone taking time away or celebrating a holiday. For clarity, the shutdown date is 14 Feb 2024, 73 days from the time I'm writing this mail[0]. So you do not need to migrate before the end of the calendar year. Please do enjoy your holidays! Migrations can be undertaken next year. Either way, we do ask for you to communicate your needs and plans for any remaining tools or jobs you maintain. The phabricator ticket is a wonderful place to do so. That will help us in planning and ensuring everyone has been informed.
Thank you,
[0]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#T...
On Mon, Dec 4, 2023 at 2:02 PM Maarten Dammers maarten@mdammers.nl wrote:
Hi,
The email I received was probably well intended, but felt like a threat to sabotage my tools. The whole project already got me unhappy from the start with tasks, contrary to Phabricator etiquette [1], being assigned to me because I happened to be at the top of the list. I called out the person doing that, but I never got a reply. I guess etiquette is not enforced for staff?
I probably have about 70+ jobs spread over several accounts that are at risk. It feels like you're just dumping work in my lap without any benefit for me as maintainer. I understand you want to make it easier to maintain the infrastructure, but do you have any idea how demotivating this is?
Maarten
Hello, this is a provocative approach to migration!
* Clarification Q: The timeline was finalized on November 28, and any tools and cronjobs w/ no response from maintainers will stop working on the same day on December 14th, regardless of how actively they are used? Is there anything that users of those tools can do to delay this? It might be worth posting in places where active /users/ of tools hang out, not just the maintainers, as they will be inconvenienced and may be able to share maintainership where needed.
* Can you share stats on how many tools remain to be migrated, how many will stop in December, and which are the most-used? This phab board has ~460 open tasks https://phabricator.wikimedia.org/project/board/6135/query/open/ , some created this week by the maintainers after receiving a recent ping https://phabricator.wikimedia.org/T352564 , while https://grid-deprecation.toolforge.org/ lists only 447 tools still running on GE -- scores of which seem quite popular.
* When "grid infrastructure is deleted" on March 14, will there be backups of the tools for people who want to migrate them in the future?
* At least Maarten and Albin asked to be unassigned from migration tasks for their tools (but remain assigned). If they can't unassign themselves, and users need to coordinate finding migrators for their tools to keep working: is there some other way to flag in Phab which tools need someone to work on migration? Ideally a way visible from taskboard overviews...
Cordially, Sam
Hi all,
I do appreciate the efforts to keep toolforge running, and that sometimes massive changes are necessary to do this, which has implications for tool maintainers. I also understand that there have to be deadlines at some point, otherwise things will never get finished.
But as I have said on Phabricator (can't find the ticket now), I have been active in moving things to k8s from early on; I have literally rewritten enormous codebases (eg Mix'n'match) in a different language, because the k8s approach does not support the way I did things with grid engine. And while I think the new code is an improvement over the old one, it has taken a huge amount of my time to do this, with little visible improvement for the end user.
K8s, as it's run right now on toolforge, can not
- use fire-and-forget jobs, because everything needs a name that you may or may not re-use
- has very limited per-tool resources, and the webservice reduces those even further
- can not temporarily scale up. Eg I need to process a lot of data once; on grid engine, I could just fire off all the jobs, wait for them to complete, re-run the failed ones etc. This is simply not possible on k8s as it is.
- Even the current Wikitech documentation still uses grid engine, eg https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rust (I have tried, and failed, to get that running on k8s)
I know there is a technical reason to limit per-tool k8s resources so much (something about running on a single VM), but IMHO there needs to be a lot more flexibility; give the user the option to scale up tool resources without having to go through Phab bureaucracy, run jobs on a large (shared) k8s pool, auto-generate job names for fire-and-forget jobs, something.
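To make the "auto-generate job names" suggestion concrete, here is a minimal sketch of what a user-side wrapper might look like. It only builds the command line and does not run anything. The helper names (make_job_name, build_run_command), the 63-character lowercase naming rule (borrowed from Kubernetes label conventions), the script name and the image name "bookworm" are all assumptions for illustration, not something Toolforge is confirmed to require or provide.

```python
import re
import shlex
import uuid


def make_job_name(prefix: str, max_len: int = 63) -> str:
    """Return a unique job name made of a cleaned-up prefix and a random suffix."""
    # Assumed naming rule: lowercase letters, digits and '-' only.
    slug = re.sub(r"[^a-z0-9-]+", "-", prefix.lower()).strip("-") or "job"
    suffix = uuid.uuid4().hex[:8]
    # Truncate the prefix so that prefix + '-' + suffix fits in max_len.
    slug = slug[: max_len - len(suffix) - 1].rstrip("-")
    return f"{slug}-{suffix}"


def build_run_command(prefix: str, command: str, image: str) -> list[str]:
    """Build (but do not execute) a 'toolforge jobs run' invocation."""
    return [
        "toolforge", "jobs", "run", make_job_name(prefix),
        "--command", command,
        "--image", image,  # placeholder image name, pick one that exists for your tool
    ]


if __name__ == "__main__":
    # Hypothetical one-off batch: each chunk gets its own throwaway job name.
    for chunk in ("chunk-01", "chunk-02"):
        argv = build_run_command("Mix'n'match import", f"./process.sh {chunk}", "bookworm")
        print(shlex.join(argv))
```

This only papers over the naming requirement; it does nothing about the per-tool resource limits described above, so a large batch would still have to be throttled by hand.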
As for the deadline(s) given here, as I stated above, I started quite early on this, and invested a lot of work. Yet, I still have tools listed on https://grid-deprecation.toolforge.org/ (which was not linked from the original mail, despite being the main link people need IMHO), so I do feel the pressure myself. Maybe you could disable grid engine for all tools NOT on that page, to ensure no one restarts with grid engine, and leave a smaller pool running for the remaining tools, to make resources available for k8s while giving the remaining tool users a bit more time?
Apologies for long rant, Magnus
Hi,
On 12/6/23 03:59, Magnus Manske via Cloud wrote:
Hi all,
I do appreciate the efforts to keep toolforge running, and that sometimes massive changes are necessary to do this, which has implications for tool maintainers.
+1, I am not sure people fully appreciate how massive of a change this is, the grid engine is one of the few remaining parts in Toolforge that actually predates it. River announced[1] SGE support for the Toolserver back in September 2009 and then in January 2013 everyone was given exactly one month(!!) to move their bots to SGE[2].
So it's a real milestone, for the infrastructure side and for maintainers, to have made it this far in getting rid of it, but it also means there's 10+ years of user familiarity, expectations and inertia towards the grid.
K8s, as it's run right now on toolforge, can not
- ...
- has very limited per-tool resources, and the webservice reduces those even further
Just FYI if you weren't aware, the default quotas were recently raised to 8CPU + 8GB total, with a max of 3CPU + 6GB per pod. (This is also something I ran into.)
- Even the current Wikitech documentation still uses grid engine, eg https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rust (I have tried, and failed, to get that running on k8s)
Sorry, this was on me, I had been pinged a while back to update it but it took me a while to figure it out for my own tools and I was still tweaking my own setup. I've updated it now, so the main "Rust" wiki page now explains how to use the jobs framework, but I still need to update the "My first Rust tool" guide. It's kind of cumbersome because k8s doesn't spawn a login shell so we have to do it manually but I guess that's for the best? If there's any other Rust-related stuff I can help with, please let me know.
Anyways, I feel in a similar boat overall, I've mostly spent the last two weeks just taking stock of my tools and rewriting some and shutting down others. I found it useful to spend a bit of time homogenizing how my tools are laid out so I could just write an ansible playbook[3] to deploy all of them (I plan to explain this in a forthcoming blog post, very soon now) in a similar fashion and apply multi-tool changes easily too.
I've already asked for the February extension for at least one of my tools, I think it's pretty reasonable for you to ask as well. I am not sure how long the lifeline can last though, the Debian Buster LTS end-of-life is coming up in June 2024 and I'm sure there's other considerations too.
[1] https://lists.wikimedia.org/hyperkitty/list/toolserver-l@lists.wikimedia.org...
[2] https://lists.wikimedia.org/hyperkitty/list/toolserver-l@lists.wikimedia.org...
[3] https://gitlab.wikimedia.org/legoktm/toolforge-ansible
-- Legoktm
On Thu, Dec 7, 2023 at 7:26 AM Kunal Mehta legoktm@debian.org wrote:
Just FYI if you weren't aware, the default quotas were recently raised to 8CPU + 8GB total, with a max of 3CPU + 6GB per pod. (This is also something I ran into.)
Thanks, that will help! I was unaware.
- Even the current Wikitech documentation still uses grid engine, eg https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rust (I have tried, and failed, to get that running on k8s)
Sorry, this was on me, I had been pinged a while back to update it but it took me a while to figure it out for my own tools and I was still tweaking my own setup. I've updated it now, so the main "Rust" wiki page now explains how to use the jobs framework, but I still need to update the "My first Rust tool" guide. It's kind of cumbersome because k8s doesn't spawn a login shell so we have to do it manually but I guess that's for the best? If there's any other Rust-related stuff I can help with, please let me know.
Thanks for this!
Magnus
On Tue, 5 Dec 2023 at 19:13, meta.sj@gmail.com wrote:
- When "grid infrastructure is deleted" on March 14, will there be backups of the tools for people who want to migrate them in the future?
The "grid infrastructure" is different than the "tools". I expect the tools would still be there, they would just no longer work if they did so using the grid.
PS: Thanks for the https://grid-deprecation.toolforge.org link, Magnus. I had no idea it existed.