The Toolforge Jobs Framework recently switched to its own storage layer [0]
for tracking jobs, but we temporarily retained the ability to reconcile jobs
created directly via k8s CLIs/APIs with Toolforge.
This temporary measure will be removed on June 20, 2026. After that date,
"toolforge jobs" commands will only see jobs created through the Toolforge
CLI or API. Jobs created via k8s CLIs/APIs will continue to run as Kubernetes
objects; they just won't appear in "toolforge jobs" output.
If you mix k8s CLIs/APIs and "toolforge jobs" to manage your workloads,
please refactor your workflows to use the "toolforge jobs" CLI/API instead
of k8s CLIs/APIs before June 20.
If you only use "toolforge jobs" commands, no action is needed.
For questions, reach out on IRC (#wikimedia-cloud), this list, or the Tools
Platform Team Phabricator tag [1].
[0] https://phabricator.wikimedia.org/T359650
[1] https://phabricator.wikimedia.org/tag/tools-platform-team/
Hello everyone,
As part of the ongoing upgrade of our Search infrastructure from OpenSearch
1.3 → 2.19 → 3.5, we recently upgraded the first production cluster
(cloudelastic).
Following the upgrade, the cluster has become unstable. In particular, some
indices have entered a read-only (red) state, which may impact search
functionality relying on this cluster.
We are currently investigating the issue from multiple angles. Early
indications suggest this may be related to readahead settings in
combination with the underlying storage hardware, but this is not yet
confirmed.
Tracking task: https://phabricator.wikimedia.org/T424852
Our current focus is on stabilizing the cluster and restoring full
functionality. We will share updates as we learn more. At this point, we
expect to have the cluster stabilized again by the end of the week.
If you are observing issues that you suspect are related, please add
details to the Phabricator task or reach out to the Search team.
Thanks for your patience,
Peter
--
Peter Fischer (he/him)
Senior Software Engineer, Search Platform
Wikimedia Foundation
Hello all,
In order to optimize Toolforge resource utilization, we will be
progressively changing the default memory request for webservice tools
from 256MB to 64MB. Memory and CPU requests are scheduler hints for
Kubernetes on how to allocate tools to worker nodes; the default memory
limit (i.e. the enforced maximum memory usage) stays unchanged at 512MB.
No impact is expected on tools that use more memory than they request.
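To make the requests/limits distinction concrete, here is a minimal Python sketch under simplifying assumptions: it models memory only, ignoring CPU, node-pressure eviction and every other scheduling constraint, and it uses a hypothetical worker node with 16384MB of allocatable memory. The names `Container`, `pods_that_fit` and `exceeds_limit` are illustrative only and are not part of Toolforge or the Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical, memory-only view of a container's resource settings."""
    request_mb: int  # scheduler hint: the amount reserved on a node at placement time
    limit_mb: int    # enforced ceiling: the container is OOM-killed above this


def pods_that_fit(node_allocatable_mb: int, pod: Container) -> int:
    """Number of identical pods the scheduler could place on one node.

    In this simplified model, placement only sums requests; limits play no part.
    """
    return node_allocatable_mb // pod.request_mb


def exceeds_limit(usage_mb: int, pod: Container) -> bool:
    """Only the limit is enforced at runtime; usage above the request is allowed.

    This model ignores node-pressure eviction, which the real kubelet also applies.
    """
    return usage_mb > pod.limit_mb


if __name__ == "__main__":
    node_mb = 16384  # hypothetical allocatable memory of one worker node
    for request in (256, 128, 64):
        webservice = Container(request_mb=request, limit_mb=512)
        # prints 64, 128 and 256 pods respectively
        print(f"request={request}MB -> {pods_that_fit(node_mb, webservice)} pods")
    # A tool using 300MB is above a 64MB request but still below the 512MB limit.
    print(exceeds_limit(300, Container(request_mb=64, limit_mb=512)))  # False
```

Under this toy model, lowering the default request from 256MB to 64MB lets four times as many webservice pods be packed onto the same node, while the 512MB limit, and therefore the point at which a tool is actually stopped, does not move.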
In the first phase, requests will change from 256MB to 128MB on Wed May 6th
starting at 8 UTC, and then from 128MB to 64MB on Tue May 12th starting at
8 UTC. We will restart webservice tools as part of the deployment; no action
is required on tool maintainers' part.
The details of this work can be found at
https://phabricator.wikimedia.org/T420565
Best,
Filippo
--
*Filippo Giunchedi*
Staff Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>