March 2023 - Cloud - lists.wikimedia.org

[Cloud-announce] Toolforge Kubernetes upgrade on 2023-04-03
by Taavi Väänänen 10 Apr '23

10 Apr '23

Hi, We will be upgrading the Toolforge Kubernetes cluster next Monday (2023-04-03) starting at around 10:00 UTC. The expected impact is that tools running on the Kubernetes cluster will get restarted a couple of times over the course of the few hours it takes for us to upgrade the entire cluster. The ability to manage tools will remain operational. Since the version we're upgrading to (1.22) removes a bunch of deprecated Kubernetes APIs, tools that use kubectl and raw Kubernetes resources directly may want to check that they're on the latest available versions. The vast majority of tools that are only using the Jobs framework and/or the webservice command are not affected by these changes. Taavi _______________________________________________ Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…

4 4

[Cloud-announce] Two toolforge outages coming next week, Monday and Thursday
by Andrew Bogott 03 Apr '23

03 Apr '23

There will be two major Toolforge outages this coming week. Each outage will cause tool downtime and may require manual restarts afterwards. The first outage is an NFS migration [0] and will take place on Monday, beginning at around 0:00 UTC and lasting well into the day, possibly as late as 19:00 UTC. During this long period, Toolforge NFS will be read-only. This will cause most tools (for example, anything that writes a log file) to fail. The second outage will be a database migration [1] and will take place on Thursday at 17:00UTC. During this window ToolsDBwill be read-only. This migration should take about an hour but unexpected side-effects may extend the downtime. We try very hard to avoid outages of this magnitude, but at this point we need to choose downtime over the increasing risk of data loss. More details can be found below. [0] NFS Outage and system reboots Monday: The existing toolforge NFS server is running on aging hardware and lacks a straightforward path for maintenance or upgrading. To improve this we are moving NFS to a cinder+VM platform which should support easier upgrades, migrations, and expansions in the future. In order to maintain data integrity during the migration, the old server will need to be made read-only while the last set of file changes is synchronized with the new server. Because the NFS service is actively used, it will take many hours to complete the final sync. To ensure stable mounts of the new server, every node in Toolforge will be rebooted as part of this migration. That means that even tools which do not use NFS will be affected, although most tools should restart gracefully. This task is documented as https://phabricator.wikimedia.org/T333477. [1] DB outage Thursday: As part of the ongoing effortto upgrade user-created Toolforge databases, we willmigrate ToolsDB to a new VM that will have a more recent version of Debian and MariaDB and will use a more resilient storage solution. The new VM is ready, and we plan to point all tools to use it on *Apr, 6 2023 at 17:00 UTC*. This will involve about *1 hour of read-only time*for the database. Any existing database connection will be terminated, and if your tool does not reconnect automatically you might have to restart it manually. An email will be sent shortly before starting the migration, and when it's finished. Please also make sure your tool is connecting to the database using the canonical hostname *tools.db.svc.wikimedia.cloud*and not any other hostname or IP address. For more details, and to report any issue, you can read or leave a comment at https://phabricator.wikimedia.org/T333471 For more context you can also check out the parent task https://phabricator.wikimedia.org/T301949 _______________________________________________ Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…

2 3

[Cloud-announce] partial wmcs outage tomorrow, 2022-03-28 between 14:00 and 16:00 UTC
by Andrew Bogott 28 Mar '23

28 Mar '23

Due to unavoidable network switch maintenance[0], some WMCS services will be offline briefly tomorrow. The downtime will last for 20-30 minutes and take place sometime between 14:00 and 16:00 UTC. Here is what to expect during the downtime: * *Toolsdb will be unavailable and all queries will fail* * Some of the wiki replica databases may be unavailable * Some DNS servers will be offline; some services may fail to resolve hosts, depending on their fallback logic We anticipate a graceful recovery from this outage, but NFS is fickle so we may need to reboot some or all VMs after the outage. Sorry in advance for any inconvenience or upset emails that result from this maintenance. - Andrew + the WMCS team [0] https://phabricator.wikimedia.org/T330165 _______________________________________________ Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…

1 1

Redis vulnerabilities
by Roy Smith 27 Mar '23

27 Mar '23

There's a couple of new vulnerabilities for redis that were just published, which people should be aware of: https://nvd.nist.gov/vuln/detail/CVE-2023-28858 <https://nvd.nist.gov/vuln/detail/CVE-2023-28858> https://nvd.nist.gov/vuln/detail/CVE-2023-28859 <https://nvd.nist.gov/vuln/detail/CVE-2023-28859> I just heard about these, so don't know anything more than what I'm reading in the NIST reports.

1 0

[Cloud-announce] PAWS k8s upgrade 2023-03-20
by Vivian Rook 17 Mar '23

17 Mar '23

PAWS will be switching k8s clusters to get to the latest k8s that openstack currently supports (1.23). This should occur on 2023-03-20 around 13:00 UTC. Anything that was running at the time on the current (old) cluster will need restarted. https://phabricator.wikimedia.org/T328489 -- *Vivian Rook (They/Them)* Site Reliability Engineer Wikimedia Foundation <https://wikimediafoundation.org/> _______________________________________________ Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…

1 0

[Cloud-announce] PAWS nfs cutover 2023-03-14 12:00 UTC
by Vivian Rook 14 Mar '23

14 Mar '23

From 12:00 to 15:00 UTC on 2023-03-14 PAWS is cutting over its nfs storage. As a result anything saved during this time frame will likely be lost. Please do not save any files (or have bots save any files) during this time as they likely will not make it through the cutover. We'll send out an email to cloud-announce noting when the cutover is done and it is safe to save files again. More information can be found in the following tickets: https://phabricator.wikimedia.org/T331056 https://phabricator.wikimedia.org/T303663 https://phabricator.wikimedia.org/T301280 Thank you, -- *Vivian Rook (They/Them)* Site Reliability Engineer Wikimedia Foundation <https://wikimediafoundation.org/> _______________________________________________ Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…

2 2

Wikimedia Hackathon in Athens: registration, program & satellite events
by Srishti Sethi 14 Mar '23

14 Mar '23

Hello all, As you may already know, the Wikimedia Hackathon 2023 will take place in Technopolis, Athens, Greece, on May 19-21. This in-person event will gather the global Wikimedia technical community to connect, hack, run technical discussions, and explore new ideas. *If you plan to join, please register as soon as possible.* Registration for the Hackathon will be open until we reach the event's maximum capacity [0]. If you plan to join, we encourage you to register as soon as possible, as there are currently around ten slots left. We also encourage you to book your travel and accommodation shortly. You will find more information on the travel [1] and accommodation [2] pages. You are also welcome to help us improve those pages by adding additional information about Athens and how to travel there. For people who already registered, remember to confirm your attendance by filling out the additional field in the registration form (see the email sent by hackathon(a)wikimedia.org on March 13). Please note that WMF's scholarship process concluded in January, and aside from the 51 people who got a scholarship confirmed, we cannot support people with funds or visa documents. Therefore, most participants must organize their own travel to stay in Athens. This year's edition will focus on bringing together people who already contribute to technical aspects of the Wikimedia projects, know how to find their way around in the technical ecosystem, and can work or collaborate on projects more autonomously. For people new to our technical environment, there are other newcomers-friendly events you can join throughout the year - feel free to improve the list. [3] *You can organize a satellite event with your local community.* For people new to the technical aspects of the Wikimedia projects or who cannot attend the in-person event, a great option could be to organize an autonomous, local satellite event to the Hackathon. These events can occur before, during, or after the in-person event. If you are considering running an event like this, contact your local community to get started! If you need financial support, please note that the deadline to apply for the current round of Rapid Funds is March 20 [4]. *Proposals for the in-person program are welcome until April 4.* The Hackathon is organized as a participant-driven event and lives from the active participation of its attendees. Because this year's edition is focused on reconnecting with your technical community peers in person, most of the program will take place onsite. You can propose a session by creating a task on the Phabricator board. Find more information on the Program page [5]. Similar to the satellite events, some online sessions, in parallel or ahead of the event, may be organized autonomously by participants. In any case, participants are free to work on their projects, whether in-person or online, and connect to the technical community on the Hackathon channels [6]. If you have any questions, feel free to use the Hackathon talk page or to reach out to the organizers at hackathon(a)wikimedia.org. Cheers, Srishti On behalf of the Hackathon organizing team [0] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Participate#Registr… [1] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Travel [2] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Accommodation [3] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Documentation#Event… [4] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Satellite_events [5] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Program [6] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Connect *Srishti Sethi* Senior Developer Advocate Wikimedia Foundation <https://wikimediafoundation.org/>

1 0

[Cloud-announce] Cloud VPS users, please claim your projects
by Andrew Bogott 11 Mar '23

11 Mar '23

Hello cloud-vps users! It's time for our annual cleanup of unused projects and resources. Every year or so the Cloud Services team tries to identify and clean up unused projects and VMs. We do this via an opt-in process: anyone can mark a project as 'in use,' and that project will be preserved for another year. I've created a wiki page that lists all existing projects, here: https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2022_Purge If you are a VPS user, please visit that page and mark any projects that you use as {{Used}}. Note that it's not necessary for you to be a project admin to mark something -- if you know that you're currently using a resource and want to keep using it, go ahead and mark it accordingly. If you /are/ a project admin, please take a moment to mark which VMs are or aren't used in your projects. When February arrives, I will shut down and begin the process of reclaiming resources from unused projects. If you think you use a VPS project but aren't sure which, I encourage you to poke around on https://tools.wmflabs.org/openstack-browser/ to see what looks familiar. Worst case, just email cloud(a)lists.wikimedia.org with a description of your use case and we'll sort it out there. Exclusive toolforge users are free to ignore this email. Thank you! -Andrew and the WMCS team _______________________________________________ Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…

3 5

[Cloud-announce] role renames in cloud-vps
by Andrew Bogott 09 Mar '23

09 Mar '23

I am in the process of standardizing[0] the role names in WMCS cloud-vps to conform with upstream conventions[1]. That requires me to rename two existing user roles, 'user' and 'projectadmin': - The role previously called 'user' will now be called 'reader' - The role previously called 'projectadmin' will now be called 'member' Despite the (IMO) less obvious names, a 'reader' can still log into project VMs, and a 'member' can still create and delete VMs. Taavi has thoughtfully upgraded the documentation about what roles can do what; the complete docs can be found at https://wikitech.wikimedia.org/wiki/Help:Cloud_services_user_roles_and_righ… This renaming is phase one; phase two will involve switching to the default upstream access rules for these two new roles. Right now the old and new roles are co-existing in our system, but soon I will entirely delete the old 'user' and 'projectadmin' roles. In the meantime, please let me know if you find stray references to the old role names, or if you find yourself unable to perform Horizon actions[1] that you were previously able to do. Or, more seriously, able to do things that you were not previously able to do! Sorry for any inconvenience caused! -Andrew [0] Our OpenStack deployment has a very long history; it is older than most deployments. That means that many conventions established in our cloud now differ from the consensus standards created among newer clouds. Periodically I try to update our cloud to conform to these new standards; it reduces tech debt and also increases the chances that official OpenStack documentation will be useful to our users. [1] https://phabricator.wikimedia.org/T330759 [2] There is one edge case in Horizon that may require you to switch projects in order to refresh the role permissions. _______________________________________________ Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…

1 0

Toolforge: brief network maintenance today 2023-03-06
by Arturo Borrero Gonzalez 06 Mar '23

06 Mar '23

Hi there! Today 2023-03-06, in a few minutes, we will restart the Toolforge internal network, A brief interruption of network communications is expected during the maintenance. This is because we're re-deploying calico to the kubernetes cluster [0]. No action required on your side. regards. [0] https://phabricator.wikimedia.org/T328539 -- Arturo Borrero Gonzalez Senior SRE / Wikimedia Cloud Services Wikimedia Foundation

1 0

2024

2023

2022

2021

2020

2019

2018

2017

Cloud March 2023