Hello all,
Here is a quick update on Outreachy Round 25: we recently concluded the
final call for projects and mentors and are now promoting 6 projects led by
14 mentors. If you know someone who has cleared Outreachy's initial
eligibility check, encourage them to explore the Wikimedia projects below:
- Create a web application for editing Toolhub records, mentored by:
Slavina Stefanova, Damilare Adedoyin, Roy Smith
- Develop features for Wiki Loves Monuments app, mentored by: Ederporto,
Mike Peel
- Rewrite Imagebulk tool to scale up, mentored by: Jay Prakash,
Sudhanshu Gautam
- Write a Ruby gem for analyzing Wikidata edits, mentored by: Sage Ross,
Will Kent
- Develop a web app for patrolling based on the new ML-based service to
predict reverts, mentored by: Diego Saez-Trumper, Muniza A.
- Hybrid event production for QueeringWikipedia 2023, mentored by: Z.
Blace, Owen Blacker, Freddy eduardo
If you are interested in any of these projects, you can either subscribe to
the related Phabricator tickets or share your ideas and suggestions in a
comment.
Learn more here: <https://www.mediawiki.org/wiki/Outreachy/Round_25> [1]
Cheers,
Srishti
[1] https://www.mediawiki.org/wiki/Outreachy/Round_25
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello everyone,
The final feedback session on the "Small wiki toolkits" (SWT) workshop
series is coming up: it will take place on Friday, October 28th, at 16:00
UTC. You can find more details on the workshop and a link to join
here: <
https://meta.wikimedia.org/wiki/Small_wiki_toolkits/Workshops#Upcoming:_Fin…>
[1].
This workshop will gather feedback on the SWT workshop series around bot
and script development, which has been running since January 2022. The
discussion will cover the following:
- Overall feedback on the workshop series
- Technical topics you would like to see the SWT team focus on by
running workshops or developing resources in 2023
- Your preferred learning formats
You do not need to have attended any previous workshops to participate in
this session. We look forward to your participation!
Best,
Srishti
On behalf of the SWT Workshops Organization team
[1]
https://meta.wikimedia.org/wiki/Small_wiki_toolkits/Workshops#Upcoming:_Fin…
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
TL;DR:
* https://toolsadmin.wikimedia.org now allows marking a tool as "disabled".
* Disabling a tool will immediately stop any running jobs, including
webservices, and prevent maintainers from logging in as the tool.
* Disabled tools are archived and deleted after 40 days.
* Disabled tools can be re-enabled at any time prior to being archived
and deleted.
"How can I delete a tool that I no longer want?" is a question that
folks have been asking for a very long time. I know of Phabricator
tasks going back to at least April 2016 [0] tracking such requests. A
bit over 5 years ago I created a Phabricator task to track figuring
out how to delete an unused tool [1]. Nearly 18 months ago Andrew
Bogott started to look into how we could automate the checklist of
cleanup steps that had been developed. By January 2022 Andrew had
implemented all of the pieces needed to complete the checklist. This came
with a command line tool that Toolforge admins have been able to use
to delete a tool. Today we have released updates to Striker
(<https://toolsadmin.wikimedia.org>) which finally expose a "disable
tool" button to a tool's maintainers [2].
When a tool is marked as disabled, any running jobs it has on the Grid
Engine or Kubernetes backends are stopped. Changes are also made so
that new jobs cannot be started, any crontab file is archived, and
maintainers are prevented from using `become <tool>`. Normally things
stay in this state for 40 days to give everyone a chance to change
their minds and re-enable the tool. Once the 40-day timer expires, the
system will proceed with cleanup tasks that are more difficult to
reverse including archiving and deleting the tool's $HOME and ToolsDB
databases. Ultimately the tool's group and user are deleted from the
LDAP directory which functionally completes the process.
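In Python terms, the grace-period logic amounts to something like this
minimal sketch (the helper name is a hypothetical stand-in, not the
actual disable-tool code linked below):

    from datetime import datetime, timedelta, timezone

    GRACE_PERIOD = timedelta(days=40)

    def archive_and_delete(tool_name):
        # Hypothetical stand-in for the hard-to-reverse steps: archive
        # and delete $HOME and ToolsDB, then remove the tool's user and
        # group from the LDAP directory.
        print("cleaning up", tool_name)

    def check_disabled_tool(tool_name, disabled_at):
        if disabled_at is None:
            return  # tool is active; nothing to do
        if datetime.now(timezone.utc) - disabled_at < GRACE_PERIOD:
            return  # inside the 40 days; maintainers can still re-enable
        archive_and_delete(tool_name)  # past the deadline; clean up

    check_disabled_tool("mytool", datetime(2022, 1, 1, tzinfo=timezone.utc))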
A lot of system administration tasks are kind of boring, but this work
turned out to be actually pretty interesting. A Toolforge tool can
include quite a number of different parts. There can be jobs running
on the Grid Engine and/or Kubernetes, a crontab to start jobs
periodically, a database in ToolsDB, credentials for accessing the
Wiki Replicas, credentials for accessing the Toolforge Elasticsearch
cluster, a $HOME directory on the Toolforge NFS server, and account
information in the LDAP directory that powers Developer accounts and
Cloud VPS credentials. All of these things would ideally be removed
when a tool was successfully deleted. Some of them are things that we
would like to create historical archives of in case someone wanted to
recreate the tool's functionality. And in a perfect world we would
also be able to change our minds and start the tool back up if things
had not progressed to fully deleting the tool.
Andrew came up with a fairly elegant system to deal with this
complexity. He designed a series of processes which are each
responsible for a slice of the overall complexity. A process running
on the Grid controller is responsible for stopping running Grid Engine
jobs and changing the tool's quota so that no new jobs can be started.
A process running on the Crontab server archives the tool's crontab
configuration. A process running on the Kubernetes controller deletes
the tool's credentials for accessing the Kubernetes cluster, the
tool's namespace, and by extension removes all processes running in
the namespace. A process running on the NFS controller archives the
tool's $HOME directory contents and deletes the directory. It also
removes the tool from other LDAP membership lists (a tool can be a
co-maintainer of another tool) and deletes the tool's user and group
from the LDAP directory. A process archives ToolsDB tables. Another
process removes the tool's database credentials across the ToolsDB and
Wiki Replicas server pools. Many of these processes are implemented in
cloud/toolforge/disable-tool on Gerrit [3]. Others were added to
existing management controllers for creating Kubernetes and database
credentials. The processes all take cues from the LDAP directory and
tracking files in the tool's $HOME to create an eventually consistent,
decoupled collection of cleanup actions.
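As a rough illustration of that eventually consistent design, each of
those processes can be pictured as an idempotent polling loop along these
lines (a sketch only; the function names are hypothetical, not the real
disable-tool internals):

    import time

    def find_disabled_tools():
        # Hypothetical: query the LDAP directory for tools marked disabled.
        return []

    def already_done(tool):
        # Hypothetical: consult a tracking file in the tool's $HOME.
        return False

    def clean_up(tool):
        # Hypothetical: this process's slice of the work, e.g. archiving
        # a crontab or stopping Grid Engine jobs.
        pass

    while True:
        for tool in find_disabled_tools():
            if not already_done(tool):
                clean_up(tool)  # idempotent, so retrying is always safe
        time.sleep(300)  # poll again in five minutes

Because each process trusts only LDAP and its own tracking files, the
pieces need no direct coordination and can simply retry until the whole
system converges.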
We still have some work to do to update documentation on wikitech and
Phabricator so that folks know where to find the new buttons. If you
find documentation that needs to be updated before someone else gets
to it, please feel empowered to be [[WP:BOLD]] and update it.
[0]: https://phabricator.wikimedia.org/T133777
[1]: https://phabricator.wikimedia.org/T170355
[2]: https://phabricator.wikimedia.org/T285403
[3]: https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/toolforge/disable-tool/
[[WP:BOLD]]: https://en.wikipedia.org/wiki/Wikipedia:Be_bold
Bryan, on behalf of the Toolforge administration team
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
The shared NFS servers that back Toolforge have been running close to
full for a while. We are going to free up space by taking the following
steps:
- Remove all files ending with .log and .err that have not been modified
since November 1st, 2021 (e.g. find . \( -name '*.log' -o -name '*.err' \)
-not -newermt "Nov 1, 2021" -exec rm {} \;)
- Truncate all files ending with .log and .err that are larger than 1GB
down to 1GB each (e.g. find . \( -name '*.log' -o -name '*.err' \) -size
+1G -exec truncate --size=1G {} \;)
We'll be running those commands on Friday of this week. If you have any
log or err files of that form that need to NOT be truncated and/or
deleted, rename them now!
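If you'd like to preview which of your files match before Friday, a
read-only Python sketch like this one (run from your home or tool
directory) will list them:

    from datetime import datetime
    from pathlib import Path

    CUTOFF = datetime(2021, 11, 1).timestamp()
    ONE_GB = 1024 ** 3

    for path in Path(".").rglob("*"):
        if path.suffix in (".log", ".err") and path.is_file():
            st = path.stat()
            if st.st_mtime < CUTOFF:
                print("would be deleted:  ", path)
            elif st.st_size > ONE_GB:
                print("would be truncated:", path)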
Also, please take a moment to run 'du' in your home and tool dirs and
delete any other files that you can live without.
Thank you!
-Andrew
As part of routine security maintenance, all Debian Bullseye VMs are due
for a reboot and kernel upgrade. I will be performing these reboots
early next week, either on Monday or Tuesday.
If you want to reboot hosts on your own time (rather than at a random
Andrew-selected time), feel free to reboot your own hosts before then.
-Andrew + the WMCS team
spi-tools and spi-tools-dev both occasionally get wedged. HTTP requests just hang and eventually time out with a 50x. Nothing gets logged in either my django application log, or in uwsgi.log. If I restart the service, things are fine until it happens again.
Any ideas how I can get better visibility into what's happening? Can I make uwsgi do more verbose logging? Is there any way I can see the request progress through higher levels of the stack (nginx, etc) so I know where things go wrong?
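One avenue that might give more visibility, assuming the uwsgi ini can be
edited: enable uwsgi's stats socket (stats = 127.0.0.1:9191) and look at
the harakiri option, which kills and logs requests that run past a
timeout. The stats socket dumps JSON describing each worker, so a stuck
worker stands out. A rough Python sketch for reading it:

    import json
    import socket

    # Assumes uwsgi was started with "stats = 127.0.0.1:9191".
    with socket.create_connection(("127.0.0.1", 9191)) as sock:
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)

    stats = json.loads(b"".join(chunks))
    for worker in stats.get("workers", []):
        # A worker stuck in "busy" with a large running_time is a likely
        # culprit for the hanging requests.
        print(worker["id"], worker["status"], worker.get("running_time"))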
Debian Stretch's security support ends in mid-2022, and the Foundation's
OS policy already discourages use of existing Stretch machines. That
means it's time for all project admins to start rebuilding their VMs
with Bullseye (or, if you must, Buster).
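If you're not sure which release a given VM is running, a minimal Python
sketch like this one (reading the standard /etc/os-release file) prints
the codename:

    # Prints the Debian codename, e.g. "stretch", "buster", or "bullseye".
    with open("/etc/os-release") as f:
        fields = dict(
            line.rstrip().split("=", 1) for line in f if "=" in line
        )
    print(fields.get("VERSION_CODENAME", "unknown").strip('"'))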
Any webservices created in the last year or two and running in Kubernetes
are most likely using Buster images already, so there's no action needed
for those. Older Kubernetes jobs should be refreshed to use more modern
images whenever possible.
If you are still using the grid engine for webservices, we strongly
encourage you to migrate your jobs to Kubernetes. For other grid uses,
watch this space for future announcements about grid engine migration;
we don't yet have a solution prepared for that.
Details about the what and why of this process can be found here:
https://wikitech.wikimedia.org/wiki/News/Stretch_deprecation
Here is the deprecation timeline:
March 2021: Stretch VM creation disabled in most projects
July 6, 2021: Active support of Stretch ends, Stretch moves into LTS
<- You are Here ->
January 1st, 2022: Stretch VM creation disabled in all projects,
deprecation nagging begins in earnest. Stretch alternatives will be
available for tool migration in Toolforge
May 1, 2022: All active Stretch VMs will be shut down (but not deleted)
by WMCS admins. This includes Toolforge grid exec nodes.
June 30, 2022: LTS support for Debian Stretch ends, all Stretch VMs will
be deleted by WMCS admins
Hello!
Earlier this year, WMCS initiated the process to migrate tools off the
grid[0].
We also published a series of blog posts further explaining the reasoning
behind this action[1].
We encouraged maintainers to move to Kubernetes if they could, but also
made a Debian Buster GridEngine available for those tools that were
blocked or otherwise unable to migrate to Kubernetes at that time.
We are aware that not all workloads can easily move from the grid to
Kubernetes[2]. For some of the current grid workflows, there may be no 1:1
functionality match on Kubernetes. Work is underway to address most of
these issues[3].
We’re putting together a use case continuity table showing GridEngine
workloads and their equivalent Kubernetes workloads[4].
To help track the specific migration work, we created a Phabricator
ticket (project tag: grid-engine-to-k8s-migration[5]) for each tool that is
currently running on GridEngine. With a ticket for each tool on GridEngine,
we hope to collect specific blocking issues and have the team work on
addressing them.
We encourage maintainers to reach out if you need help or find you are
blocked by missing features.
We noticed, after receiving notifications for these tickets, that some of
you wondered whether the grid is being shut down immediately.
This is not the case. We will work with tool maintainers to ensure all
tools safely move off the grid (or are safely shut down); only then will
we start looking at decommissioning the grid.
Apologies to those who felt spammed by the ticket creation process and got
worried about the future of their projects. We should have communicated
better around this process.
=== Way Forward ===
The working draft of the GridEngine plans and timeline can be found
here[6].
If you need further clarification, reach out to us on the Phabricator
ticket for your specific tool or via any of our communication
channels[7].
Thanks!
----------
[0]:
https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/…
[1]: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
[2]:
https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/Enhanceme…
GridEngine_plans_and_timeline#Use_case_continuity
[3]: https://phabricator.wikimedia.org/T194332
[4]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#…
[5]: https://phabricator.wikimedia.org/project/profile/6135/
[6]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation
[7]:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Commun…
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
Hello!
A last reminder: there are two days left (today and tomorrow) to submit
your Coolest Tool Award <https://meta.wikimedia.org/wiki/Coolest_Tool_Award>
nominations. Please recommend your favorite tools!
Thanks!
Komla, for the Coolest Tool Academy 2022
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
We may want to use Cloud VPS in a project, and one of the requirements is
that there are statistics for the servers (CPU, RAM, etc.). I poked
around a bit and found https://grafana-labs.wikimedia.org which shows this.
The servers for another project I'm working on (Wikispeech) show up on:
https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board?orgI….
However, two new servers I added recently ("tts-dev" and "demo-wiki") don't
show up.
Is there anything extra that you need to do to make the servers show up in
Grafana? I don't remember myself or anyone else doing that for the older
servers, but I might just have forgotten or missed it.
*Sebastian Berlin*
Utvecklare/*Developer*
Wikimedia Sverige (WMSE)
E-post/*E-Mail*: sebastian.berlin(a)wikimedia.se
Telefon/*Phone*: (+46) 0707 - 92 03 84