After talking with both Arturo and Birgit about things we might
present at Wikimania, I came up with this abstract for a talk:
Co-creating platforms and products: how the Wikimedia Cloud Services
team works with the larger Wikimedia technical community to build and
maintain Cloud VPS, Toolforge, Quarry, PAWS, and more
Did you know that volunteers are involved in planning, building, and
maintaining the Cloud VPS and Toolforge projects as co-equals with
paid staff from the Wikimedia Foundation? Since the start of the
"Labs" project in 2011, one of the guiding principles for WMCS
projects has been improving collaboration between Foundation staff and
technical volunteers. Learn more about some of the policies and
practices that are used to make this collaboration possible.
The submission would be under either the "governance" or "technology"
tracks. I think it would work best as a panel discussion that is
either "hybrid" (some folks in Singapore, some on-line) or
I think this is something that folks in the community might be
interested in learning a bit about. I also think it would be
interesting for those of us who have participated in this process to
take some time to reflect on how we have worked together in the past
and how we might like to see those processes and practices
evolve in the future. To make this talk work well there should be
active voices from both the paid and volunteer staff involved. Towards
that end, I'm mailing the cloud-admin@ list plus four of you who I know
have been active in the past in helping with Toolforge and/or Cloud
VPS admin and features work to gauge your interest in participating.
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                    irc: bd808
In response to our recent maintenance windows we got some feedback
about advance notice of outages. I created this chart to provide us with
some internal guidelines about when we should publicize maintenance, and
how to do so:
You will notice that at the moment my imagination is limited to 'write
to a mailing list.' I encourage people to fill in ideas on that page (or
the associated talk page) about other ways we can warn people about
these things. If we wind up with so many broadcast channels that it
becomes impractical to actually use them all, we can invest in automation.
I'm also not especially committed to the brackets on that chart; I'd
like to have broad categories and low standards, but edits are welcome!
One thing that I want to be more mindful about is the distinction
between "things that mess with our users" (e.g. Quarry or Horizon
downtime) vs. "things that mess with our users' users" (e.g. web proxy
downtime). I'd love it if someone with better wiki-editing skills
spruced up the chart to reflect that difference.
 for example https://phabricator.wikimedia.org/T333477#8764263
On 3/30/23 12:42, Arturo Borrero Gonzalez wrote:
> On 3/28/23 00:13, Taavi Väänänen wrote:
>> We will be upgrading the Toolforge Kubernetes cluster next Monday (2023-04-03)
>> starting at around 10:00 UTC.
>> The expected impact is that tools running on the Kubernetes cluster will get
>> restarted a couple of times over the course of the few hours it takes for us
>> to upgrade the entire cluster. The ability to manage tools will remain
>> Since the version we're upgrading to (1.22) removes a bunch of deprecated
>> Kubernetes APIs, tools that use kubectl and raw Kubernetes resources directly
>> may want to check that they're on the latest available versions. The vast
>> majority of tools that are only using the Jobs framework and/or the webservice
>> command are not affected by these changes.
> This has been rescheduled to Monday 2023-04-10 to leave room for the other
> operations we have.
This is happening now!
Arturo Borrero Gonzalez
Senior SRE / Wikimedia Cloud Services