Toolforge just now suffered a partial grid-engine outage. All grid
services should be back to normal as of this email; some k8s services
may misbehave for the next hour or two.
NFS misbehavior resulted in grid control mechanisms timing out, which
meant that no new jobs could be scheduled for the last 90 minutes or so.
We've rebooted the NFS server, which has resolved the primary issues;
however, rebooting NFS is itself disruptive and may have caused other
jobs (either on the grid or in k8s) to fail.
We're currently rebooting all k8s worker nodes, which will take a couple
of hours to complete. During those reboots some jobs may fail or
experience surprise rescheduling.
Sorry for the outage! If your grid job was disrupted by this outage,
please take this as a sign to migrate your service off the grid! Details
about the grid shutdown can be found here:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#…
-Andrew (+ Taavi who did most of the actual recovery work)
Hello!
After our initial announcement of the Grid Engine shutdown timeline[0],
some of you raised concerns about losing your tools.
We want to address those apprehensions while hopefully providing
reassurance. No tools will be deleted until the grid engine shutdown date
on 14 February 2024. However, for tools with unreachable maintainers, an
outage will happen starting on 14 December 2023[1]. This is intended to
raise awareness for users or maintainers who have not otherwise been
reached. A list of these tools can be found here[2]. If you are a
maintainer or a user of a tool in this list, comment on the associated
Phabricator ticket with migration plans or a request for more support. The
goal is to have a plan for all tools running on the grid. We want all
actively used tools to be migrated, and will help support users of critical
tools without a maintainer. Thanks for your help in identifying and
migrating those tools you maintain and depend on.
We acknowledge that the timeline might seem tight, and we want to clarify
that our approach is to make this process as seamless as possible. We have
been actively engaging with tool maintainers over the past year, and we
genuinely appreciate the efforts many of you have already made to migrate
your tools to Kubernetes.
We will continue to work closely with maintainers who might need additional
time or assistance.
If for any reason you have not received a Phabricator ticket for your tool,
please reach out.
The Phabricator ticket is a good place to communicate your needs and plans
for any remaining tools or jobs.
This will help us further organize and plan this process.
Our primary goal is to support you through this transition. If you have
further concerns about the deadline or if you need assistance with the
migration process, please don't hesitate to reach out to us. We are
available on IRC, Telegram, Phabricator[3], and through our other support
channels[4].
Do you still have concerns or questions? Please let us know. We want to do
this together with you, in a way which makes sense to everyone. We’re very
grateful for all the hard work you do, and our only goal here is to secure
the future of tools in the Wikimedia sphere, not to make your lives more
difficult.
Thank you!
[0]: https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.…
[1]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#…
[2]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation/…
[3]: https://phabricator.wikimedia.org/project/board/6135/
[4]: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Commun…
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
We are experiencing networking issues on Cloud VPS, which means
currently no traffic is getting in or out of Cloud VPS. Toolforge is
also down.
We are working on it and progress is tracked at
https://phabricator.wikimedia.org/T352539
We will send an update when things are working again, thanks for your patience.
--
Francesco Negri (he/him) -- IRC: dhinus
Site Reliability Engineer, Cloud Services team
Wikimedia Foundation