Hello all,
Here is a quick update on Outreachy Round 25: we recently concluded the
final call for projects and mentors and are now promoting 6 projects led by
14 mentors. If you know someone who has cleared Outreachy's initial
eligibility check, encourage them to explore the Wikimedia projects below:
- Create a web application for editing Toolhub records, mentored by:
Slavina Stefanova, Damilare Adedoyin, Roy Smith
- Develop features for Wiki Loves Monuments app, mentored by: Ederporto,
Mike Peel
- Rewrite Imagebulk tool to scale up, mentored by: Jay Prakash,
Sudhanshu Gautam
- Write a Ruby gem for analyzing Wikidata edits, mentored by: Sage Ross,
Will Kent
- Develop a web app for patrolling based on the new ML-based service to
predict reverts, mentored by: Diego Saez-Trumper, Muniza A.
- Hybrid event production for QueeringWikipedia 2023, mentored by: Z.
Blace, Owen Blacker, Freddy eduardo
If you are interested in any of these projects, you can either subscribe to
the related Phabricator tickets or share your ideas and suggestions in a
comment.
Learn more here: <https://www.mediawiki.org/wiki/Outreachy/Round_25> [1]
Cheers,
Srishti
[1] https://www.mediawiki.org/wiki/Outreachy/Round_25
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello everyone,
The final feedback session on the "Small wiki toolkits" (SWT) workshop
series is coming up: it will take place on Friday, October 28th, at 16:00
UTC. You can find more details on the workshop and a link to join
here: <
https://meta.wikimedia.org/wiki/Small_wiki_toolkits/Workshops#Upcoming:_Fin…>
[1].
This workshop will gather feedback on the SWT workshop series around bot
and script development, which has been running since January 2022. The
discussion will cover the following:
- Overall feedback on the workshop series
- Technical topics you would like to see the SWT team focus on by
running workshops or developing resources in 2023
- Your preferred learning formats
You do not need to have attended any previous workshops to participate in
this session. We look forward to your participation!
Best,
Srishti
On behalf of the SWT Workshops Organization team
[1]
https://meta.wikimedia.org/wiki/Small_wiki_toolkits/Workshops#Upcoming:_Fin…
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
TL;DR:
* https://toolsadmin.wikimedia.org now allows marking a tool as "disabled".
* Disabling a tool will immediately stop any running jobs, including
webservices, and prevent maintainers from logging in as the tool.
* Disabled tools are archived and deleted after 40 days.
* Disabled tools can be re-enabled at any time prior to being archived
and deleted.
"How can I delete a tool that I no longer want?" is a question that
folks have been asking for a very long time. I know of Phabricator
tasks going back to at least April 2016 [0] tracking such requests. A
bit over 5 years ago I created a Phabricator task to track figuring
out how to delete an unused tool [1]. Nearly 18 months ago Andrew
Bogott started to look into how we could automate the checklist of
cleanup steps that had been developed. By January 2022 Andrew had
implemented all of the pieces needed to complete the checklist. This came
with a command line tool that Toolforge admins have been able to use
to delete a tool. Today we have released updates to Striker
(<https://toolsadmin.wikimedia.org>) which finally expose a "disable
tool" button to a tool's maintainers [2].
When a tool is marked as disabled, any running jobs it has on the Grid
Engine or Kubernetes backends are stopped. Changes are also made so
that new jobs cannot be started, any crontab file is archived, and
maintainers are prevented from using `become <tool>`. Normally things
stay in this state for 40 days to give everyone a chance to change
their minds and re-enable the tool. Once the 40-day timer expires, the
system will proceed with cleanup tasks that are more difficult to
reverse including archiving and deleting the tool's $HOME and ToolsDB
databases. Ultimately the tool's group and user are deleted from the
LDAP directory which functionally completes the process.
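In Python terms, the grace-period logic amounts to something like this
minimal sketch (the helper name is a hypothetical stand-in, not the
actual disable-tool code linked below):

    from datetime import datetime, timedelta, timezone

    GRACE_PERIOD = timedelta(days=40)

    def archive_and_delete(tool_name):
        # Hypothetical stand-in for the hard-to-reverse steps: archive
        # and delete $HOME and ToolsDB, then remove the tool's user and
        # group from the LDAP directory.
        print("cleaning up", tool_name)

    def check_disabled_tool(tool_name, disabled_at):
        if disabled_at is None:
            return  # tool is active; nothing to do
        if datetime.now(timezone.utc) - disabled_at < GRACE_PERIOD:
            return  # inside the 40 days; maintainers can still re-enable
        archive_and_delete(tool_name)  # past the deadline; clean up

    check_disabled_tool("mytool", datetime(2022, 1, 1, tzinfo=timezone.utc))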
A lot of system administration tasks are kind of boring, but this work
turned out to be actually pretty interesting. A Toolforge tool can
include quite a number of different parts. There can be jobs running
on the Grid Engine and/or Kubernetes, a crontab to start jobs
periodically, a database in ToolsDB, credentials for accessing the
Wiki Replicas, credentials for accessing the Toolforge Elasticsearch
cluster, a $HOME directory on the Toolforge NFS server, and account
information in the LDAP directory that powers Developer accounts and
Cloud VPS credentials. All of these things would ideally be removed
when a tool was successfully deleted. Some of them are things that we
would like to create historical archives of in case someone wanted to
recreate the tool's functionality. And in a perfect world we would
also be able to change our minds and start the tool back up if things
had not progressed to fully deleting the tool.
Andrew came up with a fairly elegant system to deal with this
complexity. He designed a series of processes which are each
responsible for a slice of the overall complexity. A process running
on the Grid controller is responsible for stopping running Grid Engine
jobs and changing the tool's quota so that no new jobs can be started.
A process running on the Crontab server archives the tool's crontab
configuration. A process running on the Kubernetes controller deletes
the tool's credentials for accessing the Kubernetes cluster, the
tool's namespace, and by extension removes all processes running in
the namespace. A process running on the NFS controller archives the
tool's $HOME directory contents and deletes the directory. It also
removes the tool from other LDAP membership lists (a tool can be a
co-maintainer of another tool) and deletes the tool's user and group
from the LDAP directory. A process archives ToolsDB tables. Another
process removes the tool's database credentials across the ToolsDB and
Wiki Replicas server pools. Many of these processes are implemented in
cloud/toolforge/disable-tool on Gerrit [3]. Others were added to
existing management controllers for creating Kubernetes and database
credentials. The processes all take cues from the LDAP directory and
tracking files in the tool's $HOME to create an eventually consistent,
decoupled collection of cleanup actions.
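As a rough illustration of that eventually consistent design, each of
those processes can be pictured as an idempotent polling loop along these
lines (a sketch only; the function names are hypothetical, not the real
disable-tool internals):

    import time

    def find_disabled_tools():
        # Hypothetical: query the LDAP directory for tools marked disabled.
        return []

    def already_done(tool):
        # Hypothetical: consult a tracking file in the tool's $HOME.
        return False

    def clean_up(tool):
        # Hypothetical: this process's slice of the work, e.g. archiving
        # a crontab or stopping Grid Engine jobs.
        pass

    while True:
        for tool in find_disabled_tools():
            if not already_done(tool):
                clean_up(tool)  # idempotent, so retrying is always safe
        time.sleep(300)  # poll again in five minutes

Because each process trusts only LDAP and its own tracking files, the
pieces need no direct coordination and can simply retry until the whole
system converges.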
We still have some work to do to update documentation on wikitech and
Phabricator so that folks know where to find the new buttons. If you
find documentation that needs to be updated before someone else gets
to it, please feel empowered to be [[WP:BOLD]] and update it.
[0]: https://phabricator.wikimedia.org/T133777
[1]: https://phabricator.wikimedia.org/T170355
[2]: https://phabricator.wikimedia.org/T285403
[3]: https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/toolforge/disable-tool/
[[WP:BOLD]]: https://en.wikipedia.org/wiki/Wikipedia:Be_bold
Bryan, on behalf of the Toolforge administration team
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
The shared NFS servers that back Toolforge have been running close to
full for a while. We are going to free up space by taking the following
steps:
- Remove all files ending with .log and .err that have not been modified
since November 1st, 2021 (e.g. find . \( -name '*.log' -o -name '*.err' \)
-not -newermt "Nov 1, 2021" -exec rm {} \;)
- Truncate all files ending with .log and .err that are larger than 1GB
down to 1GB each (e.g. find . \( -name '*.log' -o -name '*.err' \) -size
+1G -exec truncate --size=1G {} \;)
We'll be running those commands on Friday of this week. If you have any
log or err files of that form that need to NOT be truncated and/or
deleted, rename them now!
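If you'd like to preview which of your files match before Friday, a
read-only Python sketch like this one (run from your home or tool
directory) will list them:

    from datetime import datetime
    from pathlib import Path

    CUTOFF = datetime(2021, 11, 1).timestamp()
    ONE_GB = 1024 ** 3

    for path in Path(".").rglob("*"):
        if path.suffix in (".log", ".err") and path.is_file():
            st = path.stat()
            if st.st_mtime < CUTOFF:
                print("would be deleted:  ", path)
            elif st.st_size > ONE_GB:
                print("would be truncated:", path)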
Also, please take a moment to run 'du' in your home and tool dirs and
delete any other files that you can live without.
Thank you!
-Andrew
As part of routine security maintenance, all Debian Bullseye VMs are due
for a reboot and kernel upgrade. I will be performing these reboots
early next week, either on Monday or Tuesday.
If you want to reboot hosts on your own time (rather than at a random
Andrew-selected time), feel free to reboot your own hosts before then.
-Andrew + the WMCS team
spi-tools and spi-tools-dev both occasionally get wedged. HTTP requests just hang and eventually time out with a 50x. Nothing gets logged in either my django application log, or in uwsgi.log. If I restart the service, things are fine until it happens again.
Any ideas how I can get better visibility into what's happening? Can I make uwsgi do more verbose logging? Is there any way I can see the request progress through higher levels of the stack (nginx, etc) so I know where things go wrong?
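One avenue that might give more visibility, assuming the uwsgi ini can be
edited: enable uwsgi's stats socket (stats = 127.0.0.1:9191) and look at
the harakiri option, which kills and logs requests that run past a
timeout. The stats socket dumps JSON describing each worker, so a stuck
worker stands out. A rough Python sketch for reading it:

    import json
    import socket

    # Assumes uwsgi was started with "stats = 127.0.0.1:9191".
    with socket.create_connection(("127.0.0.1", 9191)) as sock:
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)

    stats = json.loads(b"".join(chunks))
    for worker in stats.get("workers", []):
        # A worker stuck in "busy" with a large running_time is a likely
        # culprit for the hanging requests.
        print(worker["id"], worker["status"], worker.get("running_time"))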
Debian Stretch's security support ends in mid-2022, and the Foundation's
OS policy already discourages use of existing Stretch machines. That
means it's time for all project admins to start rebuilding their VMs
with Bullseye (or, if you must, Buster).
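If you're not sure which release a given VM is running, a minimal Python
sketch like this one (reading the standard /etc/os-release file) prints
the codename:

    # Prints the Debian codename, e.g. "stretch", "buster", or "bullseye".
    with open("/etc/os-release") as f:
        fields = dict(
            line.rstrip().split("=", 1) for line in f if "=" in line
        )
    print(fields.get("VERSION_CODENAME", "unknown").strip('"'))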
Any webservices created in the last year or two and running in Kubernetes
are most likely using Buster images already, so there's no action needed
for those. Older Kubernetes jobs should be refreshed to use more modern
images whenever possible.
If you are still using the grid engine for webservices, we strongly
encourage you to migrate your jobs to Kubernetes. For other grid uses,
watch this space for future announcements about grid engine migration;
we don't yet have a solution prepared for that.
Details about the what and why of this process can be found here:
https://wikitech.wikimedia.org/wiki/News/Stretch_deprecation
Here is the deprecation timeline:
March 2021: Stretch VM creation disabled in most projects
July 6, 2021: Active support of Stretch ends, Stretch moves into LTS
<- You are Here ->
January 1st, 2022: Stretch VM creation disabled in all projects,
deprecation nagging begins in earnest. Stretch alternatives will be
available for tool migration in Toolforge
May 1, 2022: All active Stretch VMs will be shut down (but not deleted)
by WMCS admins. This includes Toolforge grid exec nodes.
June 30, 2022: LTS support for Debian Stretch ends, all Stretch VMs will
be deleted by WMCS admins
Hello!
Earlier this year, WMCS initiated the process to migrate tools off the
grid[0].
We also published a series of blog posts further explaining the reasoning
behind this action[1].
We encouraged maintainers to move to Kubernetes if they could, but also
made a Debian Buster GridEngine available for those tools that were
blocked or otherwise unable to migrate to Kubernetes at that time.
We are aware that not all workloads can easily move from the grid to
Kubernetes[2]. For some of the current grid workflows, there may be no 1:1
functionality match on Kubernetes. Work is underway to address most of
these issues[3].
We’re putting together a use case continuity table showing GridEngine
workloads and their equivalent Kubernetes workloads[4].
To help track the specific migration work, we created a Phabricator
ticket (project tag: grid-engine-to-k8s-migration[5]) for each tool that is
currently running on GridEngine. With a ticket for each tool on GridEngine,
we hope to collect specific blocking issues and have the team work on
addressing them.
We encourage maintainers to reach out if you need help or find you are
blocked by missing features.
We noticed, after receiving notifications for these tickets, that some of
you wondered whether the grid is being shut down immediately.
This is not the case. We will work with tool maintainers to ensure all
tools safely move off the grid (or are safely shut down); only then will
we start looking at decommissioning the grid.
Apologies to those who felt spammed by the ticket creation process and got
worried about the future of their projects. We should have communicated
better around this process.
=== Way Forward ===
The working draft of the GridEngine plans and timeline can be found
here[6].
If you need further clarification, reach out to us on the Phabricator
ticket for your specific tool or via any of our communication
channels[7].
Thanks!
----------
[0]:
https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/…
[1]: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
[2]:
https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/Enhanceme…
GridEngine_plans_and_timeline#Use_case_continuity
[3]: https://phabricator.wikimedia.org/T194332
[4]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#…
[5]: https://phabricator.wikimedia.org/project/profile/6135/
[6]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation
[7]:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Commun…
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
Hello!
A last reminder: there are two days left (today and tomorrow) to submit
your Coolest Tool Award <https://meta.wikimedia.org/wiki/Coolest_Tool_Award>
nominations. Please recommend your favorite tools!
Thanks!
Komla, for the Coolest Tool Academy 2022
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
We may want to use Cloud VPS in a project, and one of the requirements is
that there are statistics for the servers (CPU, RAM, etc.). I poked
around a bit and found https://grafana-labs.wikimedia.org which shows this.
The servers for another project I'm working on (Wikispeech) show up on:
https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board?orgI….
However, two new servers I added recently ("tts-dev" and "demo-wiki") don't
show up.
Is there anything extra that you need to do to make the servers show up in
Grafana? I don't remember myself or anyone else doing that for the older
servers, but I might just have forgotten or missed it.
*Sebastian Berlin*
Utvecklare/*Developer*
Wikimedia Sverige (WMSE)
E-post/*E-Mail*: sebastian.berlin(a)wikimedia.se
Telefon/*Phone*: (+46) 0707 - 92 03 84