Hi everyone,
I'm seeing strange resource consumption on Toolforge. The consumption reported by the quota is well above what my only running job requests, which prevents other jobs from starting. Where does the extra consumption come from? And how can I clean it up?
More detailed data below.
Any help appreciated, NicoV
My tool account is wpcleaner on toolforge.
Only one job is currently running:
tools.wpcleaner@tools-bastion-15:~$ toolforge jobs list
+------------------------+--------------------+------------------------------------------+
| Job name:              | Job type:          | Status:                                  |
+------------------------+--------------------+------------------------------------------+
| wpcleaner-cs-weekly    | schedule: @weekly  | Last schedule time: 2025-12-03T05:17:00Z |
| wpcleaner-en-list      | schedule: @weekly  | Unable to start, out of quota for memory |
| wpcleaner-fr-dab       | schedule: @weekly  | Unable to start, out of quota for memory |
| wpcleaner-fr-daily     | schedule: @daily   | Last schedule time: 2025-12-07T03:58:00Z |
| wpcleaner-fr-list      | schedule: @weekly  | Running for 2d4h10m                      |
| wpcleaner-fr-weekly    | schedule: @weekly  | Last schedule time: 2025-12-04T05:03:00Z |
| wpcleaner-meta-list    | schedule: @monthly | Last schedule time: 2025-12-04T04:25:00Z |
| wpcleaner-meta-monthly | schedule: @monthly | Last schedule time: 2025-11-18T06:05:00Z |
+------------------------+--------------------+------------------------------------------+
The currently running job requests 3 GiB of memory and 1 CPU:
tools.wpcleaner@tools-bastion-15:~$ toolforge jobs show wpcleaner-fr-list
+---------------+-----------------------------------------------------------------+
| Job name:     | wpcleaner-fr-list                                               |
+---------------+-----------------------------------------------------------------+
| Command:      | /data/project/wpcleaner/tools/scripts/fr_ListCheckWiki_List.sh  |
+---------------+-----------------------------------------------------------------+
| Job type:     | schedule: @weekly                                               |
+---------------+-----------------------------------------------------------------+
| Image:        | jdk17                                                           |
+---------------+-----------------------------------------------------------------+
| Port:         | none                                                            |
+---------------+-----------------------------------------------------------------+
| File log:     | no                                                              |
+---------------+-----------------------------------------------------------------+
| Output log:   |                                                                 |
+---------------+-----------------------------------------------------------------+
| Error log:    |                                                                 |
+---------------+-----------------------------------------------------------------+
| Emails:       | onfailure                                                       |
+---------------+-----------------------------------------------------------------+
| Resources:    | mem: 3.0Gi, cpu: 1.0                                            |
+---------------+-----------------------------------------------------------------+
| Replicas:     |                                                                 |
+---------------+-----------------------------------------------------------------+
| Mounts:       | none                                                            |
+---------------+-----------------------------------------------------------------+
| Retry:        | no                                                              |
+---------------+-----------------------------------------------------------------+
| Timeout:      | no                                                              |
+---------------+-----------------------------------------------------------------+
| Health check: | none                                                            |
+---------------+-----------------------------------------------------------------+
| Status:       | Running for 2d4h11m                                             |
+---------------+-----------------------------------------------------------------+
| Hints:        | Last run at 2025-12-03T20:36:14Z. Pod in 'Running' phase. State |
|               | 'running'. Started at '2025-12-03T20:36:15Z'.                   |
+---------------+-----------------------------------------------------------------+
But the quota command says I'm consuming 2.5 CPU and 6.5 GiB of memory, which is a lot more than what the only running job requests.
tools.wpcleaner@tools-bastion-15:~$ toolforge jobs quota
Running jobs                                  Used    Limit
--------------------------------------------  ------  -------
Total running jobs at once (Kubernetes pods)  3       16
Running one-off and cron jobs                 4       15
CPU                                           2.5     16.0
Memory                                        6.5Gi   8.0Gi

Per-job limits    Used    Limit
----------------  ------  -------
CPU                       3.0
Memory                    6.0Gi

Job definitions                           Used    Limit
----------------------------------------  ------  -------
Cron jobs                                 8       50
Continuous jobs (including web services)  1       16
Hi Nicolas,
On Sun, Dec 7, 2025 at 15:30, Nicolas Vervelle nvervelle@gmail.com wrote:
> My resource consumption reported by the quota is way above the consumption of my currently running job, preventing other jobs from starting. Where does the extra consumption come from? And how do I clean this up?
It seems you currently have 3 jobs running:
- 2 instances of the wpcleaner-fr-list cron job, which together request 2 × 3 GiB of memory,
- 1 instance of the webservice, which I guess requests the default 0.5 GiB of memory.
Hence a total of 6.5 GiB requested (but not actually consumed), which means that you cannot request an additional 3 GiB for another job (the quota for requests being 8 GiB).
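The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, assuming the 0.5 GiB webservice default guessed above; the names `requests_gib` and `fits` are made up for the example and are not part of any Toolforge tool.

```python
# Memory quota is checked against what each pod *requests*, not what it uses.
QUOTA_GIB = 8.0  # memory limit shown by `toolforge jobs quota`

# One entry per running pod (hypothetical labels; the webservice value is a guess).
requests_gib = {
    "wpcleaner-fr-list, first instance": 3.0,
    "wpcleaner-fr-list, second instance": 3.0,
    "webservice": 0.5,
}


def fits(new_request_gib, running, quota_gib=QUOTA_GIB):
    """Return True if a new pod's memory request stays within the quota."""
    return sum(running.values()) + new_request_gib <= quota_gib


used = sum(requests_gib.values())
print(f"requested: {used} GiB, free: {QUOTA_GIB - used} GiB")
print("another 3 GiB job can start:", fits(3.0, requests_gib))
```

With these numbers, 6.5 GiB is requested and only 1.5 GiB remains, so any job asking for 3 GiB is refused until one of the existing pods goes away.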
My suspicion stems from the Grafana monitoring, where three pods are visible, including two for wpcleaner-fr-list:
https://grafana.wmcloud.org/d/TJuKfnt4z/tool-dashboard?orgId=1&var-names...
Best regards,
-- Jérémie
Thanks Jérémie,
I ran toolforge jobs flush, which apparently stopped all jobs and removed everything from my list of jobs. I then ran toolforge jobs load to reload my list of jobs.
So things seem to be back to normal now, but I have a few questions:
- How is it possible that there were 2 instances of the same cron job? Is it because one was stuck for more than a week and a second one was started in the meantime?
- How can I really stop a job when it gets stuck (I only see toolforge commands for restarting)?
- How can I stop jobs when there are several instances of the same cron job?
Nicolas
Replying to myself about stopping jobs: it is probably better to use kubectl commands instead of toolforge jobs commands, as they offer more options.
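As a rough illustration of that idea, here is a small Python sketch that only builds the kubectl command lines and prints them, without running anything. It assumes kubectl is usable from the tool account; the helper names and the pod name are hypothetical, so the real name would have to be taken from the output of `kubectl get pods`.

```python
import shlex


def list_pods_cmd():
    """Command line that lists the pods in the current namespace."""
    return ["kubectl", "get", "pods"]


def delete_pod_cmd(pod_name):
    """Command line that deletes a single pod by name."""
    if not pod_name:
        raise ValueError("a pod name is required")
    return ["kubectl", "delete", "pod", pod_name]


# Hypothetical pod name, for illustration only.
stuck_pod = "wpcleaner-fr-list-example"

# Print the commands instead of executing them, so nothing is deleted by accident.
print(shlex.join(list_pods_cmd()))
print(shlex.join(delete_pod_cmd(stuck_pod)))
```

If the pod belongs to a Kubernetes Job object, deleting only the pod may let the controller start a replacement, in which case the Job itself (`kubectl get jobs`, then `kubectl delete job` with its name) may be what needs removing.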