On Mon, Jan 10, 2022 at 4:23 PM Roy Smith <roy@panix.com> wrote:
I'm starting on a project which will process every edit comment in the enwiki dumps using k8s jobs on Toolforge. There are 736 enwiki-20211201-pages-meta-history*.xml*bz2 files. Would kicking off 736 jobs (one per file) be a reasonable thing to do from a resource-consumption point of view, or would that lead to a WTF response from the sysadmins?
The Toolforge Kubernetes cluster tries to protect itself from runaway resource consumption via quota limits. You can see the quotas for your tool by running `kubectl describe quota` from a bastion. The default concurrent jobs quota is 15, but you are probably going to run out of other quota-limited resources (CPU, memory, pods) long before you reach 15 concurrent jobs.
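If it helps, the whole check is just a couple of commands from your laptop (a minimal sketch; `mytool` is a stand-in for your actual tool account):

  ssh login.toolforge.org    # Toolforge bastion
  become mytool              # switch to your tool account
  kubectl describe quota     # per-tool resource quotas and current usage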
To your specific question: yes, if you managed to spawn 736 simultaneous dump-processing jobs it would almost certainly lead to resource starvation across the entire Toolforge Kubernetes cluster and make many SREs sad.
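A friendlier pattern would be a small, fixed number of jobs, each chewing through its own slice of the 736 files sequentially. Very rough sketch of a per-job worker script (everything here is a placeholder rather than something that exists in your tool: NUM_SHARDS and SHARD_ID would be env vars you set per job, process_dump.py is your extractor, and you should double-check the dumps path on the NFS mount):

  #!/bin/bash
  # Hypothetical worker: each job handles every NUM_SHARDS-th dump file,
  # offset by its own SHARD_ID (0..NUM_SHARDS-1), so the 736 files get
  # split evenly across a small number of concurrent jobs.
  set -euo pipefail
  i=0
  for f in /public/dumps/public/enwiki/20211201/enwiki-20211201-pages-meta-history*.xml*bz2; do
      if [ $(( i % NUM_SHARDS )) -eq "$SHARD_ID" ]; then
          ./process_dump.py "$f"   # stand-in for your edit-comment extractor
      fi
      i=$(( i + 1 ))
  done

With NUM_SHARDS=8 that is 8 pods instead of 736, each working through roughly 92 files back to back, which stays comfortably inside the default quotas.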
Bryan