On 10/6/22 23:16, Andrew Bogott wrote:
I'm definitely interested in talking and thinking about this more. I think it is true that the cloud services staff have started coordinating more frequently in video calls, so your comment is a useful reminder that we need to redouble our efforts on post-call documentation.
I'll start with the disclaimer that I'm very much involved in the infrastructure side of WMCS. Others in a different position, for example those using our "products", may have different views, and I'd be curious to hear them.
I'd also like to make it clear that I'm not angry about a single decision or person. Most of this has been in my mind for a while now, and Lucas wondering about the current status of the grid engine made me realize that I should probably voice these concerns so that we can do something about it. I'm happy to see that others care about these points too.
Are there other topics, decisions, or work areas that have recently vanished behind the curtain? And, if so, do you have thoughts about how we can be better?
I feel that for quite a few projects, the actual technical work is tracked publicly in Phabricator, but the planning and roadmapping process is happening behind the scenes. The grid engine deprecation and the build pack work are both good examples of this.
The work on Magnum (k8s-as-a-service) also falls into this category, I think. I know that there's work going on to make Magnum available to Cloud VPS users, but I don't know if that's intended to be used by non-WMCS managed projects or if there are plans to move PAWS or Toolforge to use it. (I'm initially very skeptical of moving Toolforge off the current kubeadm setup for various reasons, which I'm happy to talk about separately.)
I'm also going to use this opportunity to note that there is WMCS work going on which isn't problematic in this sense. For example, the very recent work to replace the cloudnet hardware is very easy to follow. Comments like https://phabricator.wikimedia.org/T319300#8285959 are very helpful.
As a team we definitely aspire to do essentially all of our work in public view, but lately I've been struggling a bit with what exactly that should mean. Communication channels proliferate and everyone seems to only get a 30% view of what's happening depending on which feeds they follow. A good example is Arturo's blog posts about Toolforge futures[0] which are quite effective as /potential/ communication but may not have actually reached the eyes most in need of an update.
[0] https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
This is a good question. I'm *not* advocating for a model where there are no private meetings or Slack groups or stuff like that. I would like to be aware that meetings are happening so that I can provide context on matters I'm familiar with, or voice my opinions on certain approaches when I have them. I also would like to be aware of major decisions, especially if they affect projects I'm working on.
As a final note, I've been referring to the #wikimedia-cloud-admin IRC channel and the cloud-admin mailing list as public venues. While that is technically true (the IRC channel and the mailing list archives are public), I don't think those are mentioned anywhere on Wikitech, and cloud-admin subscription is moderated for non-staff (I have a vague memory of my subscription being rejected before I had Toolforge admin access). I think there's some work to be done here to make it easier for people to get involved.
Taavi