I will be upgrading the cloud-vps OpenStack install on Thursday,
beginning around 16:00 UTC. Here's what to expect:
- Intermittent Horizon and API downtime (maybe an hour or two total)
- Inability to schedule new VMs (also for an hour or two)
Toolforge users will be unaffected by this outage. Existing, running
services and VMs on cloud-vps should also be unaffected.
In case you want to follow along at home, this is tracked as
https://phabricator.wikimedia.org/T356287
-Andrew + the WMCS team
I've swapped the secondary dev.toolforge.org bastion to a new server
running Debian 12. As usual, the new SSH fingerprints have been
published on Wikitech[0].
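If you want to double-check before trusting the new host key, you can
compare the fingerprint yourself with standard OpenSSH tooling (a
sketch; swap the key type if you use RSA or ECDSA):

    $ ssh-keyscan -t ed25519 dev.toolforge.org 2>/dev/null | ssh-keygen -lf -

The printed fingerprint should match one of the values on the Wikitech
page above.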
The new bastion does not have the full set of packages that were
installed to support Grid Engine usage. If a package you would find
useful is missing from the new bastion, please file a new Phabricator
task in the Toolforge project[1].
If no major issues are found, I will also swap the main
login.toolforge.org bastion to a new server in a few days. I'll send a
separate announcement when that happens.
[0]: https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/dev.toolforge.org
[1]: https://phabricator.wikimedia.org/tag/toolforge/
Taavi
--
Taavi Väänänen (he/him)
Site Reliability Engineer, Cloud Services
Wikimedia Foundation
TL;DR: If you start to notice new or noisy puppet failures on your VMs,
please notify me directly or open a phab ticket and assign it to me
(Andrew).
==
What's happening:
Over the last few weeks I've been upgrading cloud-vps puppet servers to
newer builds that support the latest version of the puppet config
language, version 7. That's done in almost all cases. There are a few
project-local puppetmasters that I've been nervous about messing with
directly; for those I've opened phabricator tickets and assigned them
to project admins. For clarity, I've been using 'puppetserver'
terminology for new servers, whereas older servers were generally called
'puppetmasters.' [0]
Now that most servers are upgraded, it's time for me to flip the setting
that causes them to actually use the version 7 parser and compiler. In
almost all cases this will be backwards-compatible with the existing
catalogs, but we may turn up a few edge cases that require repair.
What you need to do:
If you have one of those phab tickets about puppetservers open for your
project, please respond on the ticket so I know you're there and what
your plan is.
All other users: please reach out to me if you start seeing new or
surprising puppet failures, and I'll help sort out the transition.
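If you want to check a VM proactively, a manual agent run will surface
any compilation problems right away (a sketch using the stock agent
CLI; cloud-vps images may also ship their own wrapper scripts):

    $ sudo puppet agent --test

A healthy run ends with an 'Applied catalog' notice; parser or catalog
compilation errors from the new puppetservers will show up directly in
the output.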
-Andrew
[0] https://wikitech.wikimedia.org/wiki/Help:Project_puppetserver
Hi all!
This is to let you know that Toolforge continuous jobs now support
health-checks!
To use it, provide `--health-check-script ./script.sh` when creating
your job. You can also provide the script as a string, like
`--health-check-script "cat /etc/os-release"`. Toolforge will
periodically attempt to execute your health-check script inside your
running job and will restart your job if the script completes with an
exit code of 1.
Note: if you use a script file for the health-check, do not forget to
make the file executable (chmod u+x script.sh). If Toolforge can't
execute your health-check script, your job will never start.
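Putting it together, creating a continuous job with a health check
might look something like this (a sketch; the job name, image, command,
and script are placeholders):

    $ toolforge jobs run my-bot \
        --continuous \
        --image python3.11 \
        --command "python3 bot.py" \
        --health-check-script ./healthcheck.sh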
As a reminder, you can find this and other smaller user-facing updates
about Toolforge platform features here:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog
Original task: https://phabricator.wikimedia.org/T335592
--
Ndibe Raymond Olisaemeka
Software Engineer - Technical Engagement
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi,
Toolforge's Harbor instance (image registry) will be down briefly for a
version upgrade from 2.9.0 to 2.10.1 tomorrow, Thursday 4 April, at
9:00 UTC.
https://phabricator.wikimedia.org/T354507
This should not affect any tools that are not using the new build service,
nor any tools that are already running.
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service
If you are using the build service, you will not be able to run any new
builds, or start a job or a webservice from an image built with the
build service, while Harbor is down. The outage is expected to last a
few minutes.
We will send an update before starting maintenance work, and once
everything is back up and running.
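If a build happens to be interrupted by the maintenance, re-running it
afterwards should be all that's needed, e.g. (a sketch; the repository
URL is a placeholder):

    $ toolforge build start https://gitlab.wikimedia.org/toolforge-repos/my-tool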
Cheers,
--
Slavina Stefanova (she/her)
Software Engineer | Developer Experience
Wikimedia Foundation
Hello!
In order to conserve resources and prevent botnet hijacking, cloud-vps
users have a few maintenance responsibilities. This spring two of these
duties have come due: an easy one and a hard one. TL;DR: visit
https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2024_Purge, claim
your projects, and replace any hosts still running Debian Buster.
-- #1: Claim your projects --
This one is easy. Please visit the following wiki page and make a small
edit in your project(s) section, indicating whether you are or aren't
still using your project:
https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2024_Purge
This serves several purposes. It allows us to identify and shut down
abandoned or no-longer-useful projects, it provides us with updated
info about who cares about a given project (often useful for future
contact purposes), and it increases visibility into projects that are
used but unmaintained.
Regarding that last item: if you know that you depend on a project but
are not an admin or member of that project, please make a note of that
on the above page as well!
-- #2: Replace Debian Buster --
This one may require some work. Long-term support for the Debian Buster
OS release is quickly running out (it ends June 30), so VMs running
Buster need to be replaced with hosts running a newer Debian version.
You may or may not be responsible for Buster instances; you can see a
breakdown of remaining Buster hosts on either of these pages:
https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2024_Purge (you
should be visiting that page anyway, because of item 1)
https://os-deprecation.toolforge.org/
More details about this process can be found here:
https://wikitech.wikimedia.org/wiki/News/Buster_deprecation
Typically in-place upgrades of VMs don't work all that well, so my
advice is to start fresh with a new server running Bookworm and to
migrate workloads to the new host. I've found Cinder volumes to be a big
help in this process; once all of your persistent data and config is in
a detachable volume it's fairly straightforward to move and will make
future upgrades that much easier.
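If you haven't used Cinder volumes before, the rough flow after
creating and attaching a volume in Horizon looks something like this (a
sketch assuming the volume shows up as /dev/sdb; run lsblk to confirm,
and note that the mkfs step erases the volume, so it's for first use
only):

    $ lsblk                          # find the newly attached device
    $ sudo mkfs.ext4 /dev/sdb        # first use only: create a filesystem
    $ sudo mkdir -p /srv/data
    $ sudo mount /dev/sdb /srv/data
    $ echo '/dev/sdb /srv/data ext4 defaults 0 2' | sudo tee -a /etc/fstab

After that, the volume can be detached and re-attached to a replacement
VM without touching the data.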
WMCS staff will be standing by to help with any quota changes you might
need for this move; you can open a quota request ticket at
https://phabricator.wikimedia.org/project/view/2880/ -- and, as always,
we'll do our best to support you on IRC and on the cloud mailing list.
Thank you for your support and attention!
-Andrew + the WMCS team
Quarry will move to k8s on Monday 2024-04-01. Part of this will involve
exporting and importing the database, as well as syncing the NFS. As a
result, queries run during the cutover window may be lost. As always,
don't rely on Quarry to save your queries: keep any important queries
local to your system and copy them into Quarry when you run them.
Thank you
--
*Vivian Rook (They/Them)*
Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello!
Today we are upgrading Toolforge Kubernetes to version 1.24.
We are not expecting any outage, but some jobs and webservices may be
automagically restarted as they get scheduled on different worker nodes.
Please report any disruption that you may observe.
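If you want to check whether your tool was rescheduled, the pod list
will show it (a sketch; run this from your tool account):

    $ kubectl get pods

Pods that were moved to a new worker node will show up with a fresh,
low AGE value.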
This is tracked in phabricator: https://phabricator.wikimedia.org/T307651
Regards.
As of 2024-03-14T11:02 UTC the Toolforge Grid Engine service has been
shut down.[0][1]
This shutdown is the culmination of a final migration process from
Grid Engine to Kubernetes that started in late 2022.[2] Arturo
wrote a blog post in 2022 that gives a detailed explanation of why we
chose to take on the final shutdown project at that time.[3] The roots
of this change go back much further, however, to at least August 2015,
when Yuvi Panda posted to the labs-l list about looking for more
modern alternatives to the Grid Engine platform.[4]
Some tools have been lost and a few technical volunteers have been
upset as many of us have striven to meet a vision of a more secure,
performant, and maintainable platform for running the many critical
tools hosted by the Toolforge project. I am deeply sorry to each of
you who have been frustrated by this change, but today I stand to
celebrate the collective work and accomplishment of the many humans
who have helped imagine, design, implement, test, document, maintain,
and use the Kubernetes deployment and support systems in Toolforge.
Thank you to the past and present members of the Wikimedia Cloud
Services team. Thank you to the past and present technical volunteers
acting as Toolforge admins. Thank you to the many, many Toolforge tool
maintainers who use the platform, ask for new capabilities, and help
each other make ever better software for the Wikimedia movement. Thank
you to the folks who will keep moving the Toolforge project and
other technical spaces in the Wikimedia movement forward for many,
many years to come.
[0]: https://sal.toolforge.org/log/DrOgPI4BGiVuUzOd9I1b
[1]: https://wikitech.wikimedia.org/wiki/Obsolete:Toolforge/Grid
[2]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#…
[3]: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
[4]: https://lists.wikimedia.org/pipermail/labs-l/2015-August/003955.html
Bryan, on behalf of the Toolforge administrators
--
Bryan Davis Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
Hello all,
We are on the final stretch of the grid engine deprecation process[0],
which means that the grid will be shutting down on Thursday, the 14th
of March. You can find a reminder of the full timeline here[1].
There are about 30 tools still running on the grid. If yours is one of
the few left to migrate, kindly ensure it is migrated before the 14th,
or reach out[2] to the team if you are facing any challenges or need
some assistance.
We have also reached out on phabricator and via email to the remaining
maintainers that still have their tools running on the grid to see if we
can help ease the migration or see if there are any blocking issues.
If you have a tool that is still on the grid and you cannot meet the
above deadline, kindly reach out via the tool migration phabricator
ticket or our support channels[2]. Note that this is a hard deadline
and no extensions will be granted, but we might be able to help you
with the transition.
We really appreciate all the effort and feedback given on the new
platform; it will help us improve our service and reduce the long-term
maintenance burden for tool maintainers and Toolforge admins alike.
[0]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation
[1]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#…
[2]:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Commun…
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services