Hi everyone,
Lately the k8s cluster has been reaching its CPU allocation limits. To
address this, we are adjusting the default CPU limits that we set for
jobs.
This will take effect today. For a short time, you might see that the
reported CPU for your jobs is not what you expect, and that your continuous
jobs have been restarted to apply the new settings.
This will not degrade your jobs. On the contrary, it might allow jobs that
try to use more CPU to get it, and it should improve overall CPU usage in
the cluster.
Here are the Kubernetes-specific notes, if anyone is interested :)
Until now, the default CPU limit and request were both set to 500m. If you
requested less than that, the request and the limit were both set to the
value you requested. If you asked for more than that, the request was set to
1/2 of what you asked for, and the limit to exactly what you asked for.
From now on, the default CPU limit will be 4000m, allowing your pods to use
far more CPU than before by default when it is available, and the default
request will be 100m (roughly the current cluster mean usage). This will help
make better use of the current resources in the cluster.
If you specify a CPU value, it will be used as both the request and the
limit, ensuring that your job is scheduled on a node only if that node has
enough CPU.
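For anyone who prefers to read the rules as code, here is a minimal sketch of the old and new behaviour described above. It is only an illustration, not the actual Toolforge implementation: the function and type names are made up, values are in millicores, and the handling of a request of exactly 500m and the rounding of the halved value are assumptions that the description above does not specify.

```python
from typing import NamedTuple, Optional


class CpuResources(NamedTuple):
    """Hypothetical container for a CPU request/limit pair, in millicores."""
    request_m: int
    limit_m: int


def old_cpu_resources(requested_m: Optional[int] = None) -> CpuResources:
    """Previous behaviour, as described in this message."""
    if requested_m is None:
        # No value given: request and limit both defaulted to 500m.
        return CpuResources(500, 500)
    if requested_m <= 500:
        # Assumption: exactly 500m is treated like "less than 500m".
        return CpuResources(requested_m, requested_m)
    # Above 500m: request was half of the value, limit was the full value.
    # Assumption: the halved value is rounded down to a whole millicore.
    return CpuResources(requested_m // 2, requested_m)


def new_cpu_resources(requested_m: Optional[int] = None) -> CpuResources:
    """New behaviour, as described in this message."""
    if requested_m is None:
        # No value given: low request (100m), high limit (4000m).
        return CpuResources(100, 4000)
    # Explicit value: used as both the request and the limit.
    return CpuResources(requested_m, requested_m)


if __name__ == "__main__":
    for requested in (None, 250, 1000, 2000):
        print(requested, old_cpu_resources(requested), new_cpu_resources(requested))
```

Running the script prints the old and new request/limit pairs side by side for a few sample values, which may help when checking what a given job will get after the change.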
More notes and details are in the task: https://phabricator.wikimedia.org/T404726
If we end up changing the memory limits, we will give some advance notice so
you can change your resources accordingly if needed.
Thanks!
--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
_______________________________________________
Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…
Hi everyone!
Next Monday, 8th of September, at 08:00 UTC, we will start upgrading
the Toolforge Kubernetes cluster[1].
The upgrade will progress during the day and will eventually restart all
running jobs (workers need to be rebooted, so jobs running on them need to be
moved to a different worker).
The restarted jobs should come back up gracefully by themselves, and no
service downtime is expected (most upgrades have been smooth so far). There
is always a chance that something will go wrong unexpectedly, though, so if
you have long-running reports or similar critical workloads, we recommend
waiting until the upgrade is over to run them.
I'll reply to this message when the upgrade starts and when it finishes.
Thanks!
[1] https://phabricator.wikimedia.org/T402378
--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
I'm trying to make a database dump for a tool on Toolforge using the
command from Help:Toolforge/ToolsDB
<https://wikitech.wikimedia.org/wiki/Help:Toolforge/ToolsDB#Backups>:
toolforge jobs run --command "umask o-r; ( mariadb-dump
--defaults-file=~/replica.my.cnf
--host=tools-readonly.db.svc.wikimedia.cloud s56581__declaration_journal >
~/declaration_journal-$(date -I).sql )" --image mariadb backup
The job fails and the error log shows:
mariadb-dump: Got error: 2026: "TLS/SSL error: Certificate verification
failure: The certificate is NOT trusted." when trying to connect
*Sebastian Berlin*
Utvecklare/*Developer*
Wikimedia Sverige (WMSE)
E-post/*E-Mail*: sebastian.berlin(a)wikimedia.se
Telefon/*Phone*: (+46) 0707 - 92 03 84
Hello everyone,
As part of the redesign of the moderation tool *LiveRC*, *Wikimédia
France* and its partner *OCTO Technology* are looking for:
* one or more people who could share their experience using
  *Toolforge* in their projects;
* a *Toolforge mentor* who could guide us in discovering the platform:
  explaining how it works, sharing best practices, and answering our
  questions.
If you have some time and would like to participate, please feel free to
contact me via PM.
Being able to communicate in French would be a big plus, but any help is
more than welcome!
Thank you in advance 🙏
--
*Michaël BARBEREAU*
*/Administrateur systèmes et réseaux/*
*/+33 1 42 36 26 24/* <tel:+33142362624>
*/+33 7 84 37 91 03/* <tel:+33784379103>
*/-------------------------------------------------------------------------/*
*WIKIMEDIA FRANCE*
Association pour le libre partage de la connaissance
*/www.wikimedia.fr <http://www.wikimedia.fr/>/*
/28 rue de Londes, 75009 PARIS/
<https://www.openstreetmap.org/node/1234174880>