We may want to use Cloud VPS in a project, and one of the requirements is that there are statistics for the servers (CPU, RAM, etc.). I poked around a bit and found https://grafana-labs.wikimedia.org, which shows this.
The servers for another project I'm working on (Wikispeech) show up on: https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board?orgId.... However, two new servers I added recently ("tts-dev" and "demo-wiki") don't show up.
Is there anything extra that you need to do to make servers show up in Grafana? I don't remember myself or anyone else doing that for the older servers, but I might just have forgotten or missed it.
*Sebastian Berlin* Utvecklare/*Developer* Wikimedia Sverige (WMSE)
E-post/*E-Mail*: sebastian.berlin@wikimedia.se Telefon/*Phone*: (+46) 0707 - 92 03 84
You have discovered https://phabricator.wikimedia.org/T264920, which is currently blocked on https://phabricator.wikimedia.org/T266050. TL;DR: the data collector used on older Debian releases to populate that dashboard was removed, and the WMCS space does not currently have a universal replacement for it. Taavi has been poking at these issues for quite a while as he has time and motivation, but currently there is no timeline for completion.
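In the meantime, if you want to check whether a given instance is reporting anything at all, a quick script against the Graphite render API can tell you. This is only a minimal sketch, not an official tool; the hostname and metric path in it are my assumptions, so adjust them to whatever data source the dashboard actually queries:

    # Minimal sketch: check whether an instance is reporting metrics to
    # the Graphite backend that (I assume) feeds the project dashboard.
    # The hostname and the metric path scheme are assumptions; adjust
    # them to whatever data source the Grafana dashboard actually uses.
    import requests

    GRAPHITE = "https://graphite-labs.wikimedia.org"  # assumed backend
    PROJECT = "wikispeech"  # hypothetical project name
    INSTANCE = "tts-dev"    # the instance missing from the dashboard

    # Assumed metric naming scheme; check the dashboard's own queries.
    target = f"{PROJECT}.{INSTANCE}.cpu.total.user"

    resp = requests.get(
        f"{GRAPHITE}/render",
        params={"target": target, "format": "json", "from": "-1h"},
        timeout=10,
    )
    resp.raise_for_status()
    series = resp.json()

    if not series or all(v is None for v, _ in series[0]["datapoints"]):
        print(f"No recent datapoints for {target}; the collector is "
              "probably not running on that instance.")
    else:
        print(f"{target} is reporting, e.g. {series[0]['datapoints'][-3:]}")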
Bryan
Hi!
As Bryan said, this is a work in progress; we might be able to allocate some time for it next quarter to help Taavi and get the project going.
If you would be willing to be an early user (I can't promise any timelines), feel free to add a comment in the task Bryan mentioned (https://phabricator.wikimedia.org/T266050) and we'll keep you in mind when rolling it out :)
On 10/5/22 21:01, David Caro wrote:
As Bryan said, this is a work in progress; we might be able to allocate some time for it next quarter to help Taavi and get the project going.
This is news to me. In general, I feel that over the last few months, quite a lot of planning and progress reporting has moved from our various public channels (most notably Phabricator and -cloud-admin on IRC) to private ones. I don't particularly like this trend.
Taavi
Thanks for sharing, Taavi. Yes, there are some other talks happening on other channels.
For the observability work, it's only an idea in my head for now, one that I want to push for next quarter if possible, so nothing has been discussed or decided yet (aside from me mentioning the subject in some meetings). We are about to have some discussions internally in the team to sync for next quarter, and that's one of the ideas I'm going to bring. There's no Phabricator task, document, or anything else for it yet.
For the buildpacks, there are a lot of tasks created under the main one (https://phabricator.wikimedia.org/T194332, specifically https://phabricator.wikimedia.org/T267374); that is what we are working on. We created a couple of subprojects to synchronize ourselves (called "iteration XX", where XX is a number).
We have been doing a lot of the synchronization on Slack, and I apologize for that; I'll bring the subject to the team for discussion.
Otherwise, we have been quite stalled with the project.
Thanks again for raising this; it helps to get nudged in the right direction 👍
On 10/6/22 6:39 AM, Taavi Väänänen wrote:
This is news to me. In general, I feel that over the last few months, quite a lot of planning and progress reporting has moved from our various public channels (most notably Phabricator and -cloud-admin on IRC) to private ones. I don't particularly like this trend.
I'm definitely interested in talking and thinking about this more. I think it is true that the cloud services staff have started coordinating more frequently in video calls, so your comment is a useful reminder that we need to redouble our efforts on post-call documentation.
Are there other topics, decisions, or work areas that have recently vanished behind the curtain? And, if so, do you have thoughts about how we can be better?
As a team we definitely aspire to do essentially all of our work in public view, but lately I've been struggling a bit with what exactly that should mean. Communication channels proliferate and everyone seems to only get a 30% view of what's happening depending on which feeds they follow. A good example is Arturo's blog posts about Toolforge futures[0] which are quite effective as /potential/ communication but may not have actually reached the eyes most in need of an update.
-Andrew
[0] https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
On 10/6/22 23:16, Andrew Bogott wrote:
I'm definitely interested in talking and thinking about this more. I think it is true that the cloud services staff have started coordinating more frequently in video calls, so your comment is a useful reminder that we need to redouble our efforts on post-call documentation.
I'll start with the disclaimer that I'm very much involved in the infrastructure side of WMCS. Others in a different position, for example those using our "products", may have different views, and I'd be curious to hear them.
I'd also like to make it clear that I'm not angry about a single decision or person. Most of this has been in my mind for a while now, and Lucas wondering about the current status of the grid engine made me realize that I should probably voice these concerns so that we can do something about it. I'm happy to see that others care about these points too.
Are there other topics, decisions, or work areas that have recently vanished behind the curtain? And, if so, do you have thoughts about how we can be better?
I feel like for quite a few projects, the actual technical work is tracked publicly in Phabricator, but the planning and roadmapping process is happening behind the scenes. The grid engine deprecation and buildpack stuff are both good examples of this.
The work on Magnum (k8s-as-a-service) also falls in this category I think. I know that there's work going on to make Magnum available to Cloud VPS users, but I don't know if that's intended to be used by non-WMCS managed projects or if there are plans to move PAWS or Toolforge to use it. (I'm initially very skeptical to moving Toolforge off the current kubeadm setup for various reasons, which I'm happy to talk about separately.)
I'm also going to use this opportunity to note that there is WMCS work going on which isn't problematic in this sense. For example, the very recent work to replace the cloudnet hardware is very easy to follow; comments like https://phabricator.wikimedia.org/T319300#8285959 are very helpful.
As a team we definitely aspire to do essentially all of our work in public view, but lately I've been struggling a bit with what exactly that should mean. Communication channels proliferate and everyone seems to only get a 30% view of what's happening depending on which feeds they follow. A good example is Arturo's blog posts about Toolforge futures[0] which are quite effective as /potential/ communication but may not have actually reached the eyes most in need of an update.
[0] https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
This is a good question. I'm *not* advocating for a model where there are no private meetings or Slack groups or anything like that. I would like to be aware that meetings are happening so that I can provide context on matters I'm familiar with, or voice my opinions about certain approaches when I have them. I also would like to be aware of major decisions, especially if they affect projects I'm working on.
As a final note, I've been referring to the #wikimedia-cloud-admin IRC channel and the cloud-admin mailing list as public venues. While technically true (the IRC channel and the mailing list archives are public), I don't think those are mentioned anywhere on Wikitech and cloud-admin subscription is moderated for non-staff (and I have a vague memory of my subscription being rejected before I had Toolforge admin access). I think there's some work to be done here to make it easier for people to get involved.
Taavi
The work on Magnum (k8s-as-a-service) also falls in this category I think. I know that there's work going on to make Magnum available to Cloud VPS users, but I don't know if that's intended to be used by non-WMCS managed projects or if there are plans to move PAWS or Toolforge to use it. (I'm initially very skeptical to moving Toolforge off the current kubeadm setup for various reasons, which I'm happy to talk about separately.)
In terms of PAWS and Quarry, there are plans to move them; https://phabricator.wikimedia.org/T308873 and https://phabricator.wikimedia.org/T301469 have some notes and associated branches for both. I am aware of no plans to move Toolforge, and thus assume there are no tickets or similar discussing it. As for the above plans for Quarry and PAWS, how could what is happening with them be communicated better?
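For anyone following along who hasn't touched Magnum, the workflow from the OpenStack CLI looks roughly like the sketch below. Nothing in it comes from the tasks above; the template and cluster names are hypothetical, and it assumes python-openstackclient with the Magnum plugin plus credentials for your project:

    # Rough sketch of the Magnum workflow from the OpenStack CLI, driven
    # from Python for illustration. Requires python-openstackclient with
    # the Magnum ("coe") plugin and the usual OS_* credentials for the
    # project. The template and cluster names below are hypothetical.
    import subprocess

    def openstack(*args: str) -> str:
        """Run an openstack CLI command and return its stdout."""
        result = subprocess.run(
            ["openstack", *args], capture_output=True, text=True, check=True
        )
        return result.stdout

    # See which cluster templates the project can use.
    print(openstack("coe", "cluster", "template", "list"))

    # Create a small Kubernetes cluster from a (hypothetical) template.
    print(openstack(
        "coe", "cluster", "create",
        "--cluster-template", "k8s-template",  # assumed template name
        "--node-count", "2",
        "paws-test-cluster",                   # hypothetical name
    ))

    # Once the cluster is CREATE_COMPLETE, fetch a kubeconfig for it.
    print(openstack("coe", "cluster", "config", "paws-test-cluster"))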
On Fri, Oct 7, 2022 at 1:14 AM Taavi Väänänen hi@taavi.wtf wrote:
As a final note, I've been referring to the #wikimedia-cloud-admin IRC channel and the cloud-admin mailing list as public venues. While technically true (the IRC channel and the mailing list archives are public), I don't think those are mentioned anywhere on Wikitech and cloud-admin subscription is moderated for non-staff (and I have a vague memory of my subscription being rejected before I had Toolforge admin access). I think there's some work to be done here to make it easier for people to get involved.
The cloud-admin@lists.wikimedia.org mailing list has always had public archives (which are much more useful now that we have mailman3) but closed membership. The intent of this list was never to be a forum for community discussion, but we did consciously choose to make the archives open so that interested parties could see the discussions that were held here.
Bryan
FWIW, the WMF Data Engineering team has had some discussions about this in the past, and we decided that all decisions and real work documentation have to be either in Phabricator or on a wiki. We use Slack or IRC to discuss things, but if we come to a conclusion, we try to summarize and document the conclusion in Phab. I think we do this well with most things, but not as well as we should with annual and quarterly planning docs (I think these are in Google Docs and Betterworks).
Andrew, I think one general issue is the lack of transparency (and community involvement) in the decision-making discussions, not just the documentation of the decisions once they are made. That's good to have, but if we have transparency as a guiding principle, we should include it in the discussions too.
My 2c
+1
On Thu, Oct 6, 2022 at 5:39 AM Taavi Väänänen hi@taavi.wtf wrote:
In general, I feel that over the last few months, quite a lot of planning and progress reporting has moved from our various public channels (most notably Phabricator and -cloud-admin on IRC) to private ones. I don't particularly like this trend.
I did a thing in my late afternoon yesterday that may have aggravated Taavi's feelings of being left out of decision loops.
I made a decision without consulting any other Toolforge admins to add about 300MiB of fonts to the php7.4 Docker image available for use on Toolforge [0]. This decision reversed my prior blocking of this exact same request in 2019 [1]. It also goes against at least as many years of the Toolforge admins telling the Toolforge member community that we do not "bloat" the Kubernetes containers with specialty features for a small number of use cases. This reversal will complicate future decisions on such issues by introducing an easily seen counterexample. I acted with good intent in the moment, but I did not act with good judgement or consideration of my partners in maintaining the Toolforge infrastructure. For that I am truly sorry.
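For a sense of the scale involved, something like the following could be used to audit which fonts actually became visible inside the image. To be clear, this is a hypothetical illustration, not the tooling behind the change; fc-list ships with fontconfig, and you would run it from a shell inside the container (e.g. a Toolforge webservice shell):

    # Hypothetical sketch for auditing the fonts visible inside an image
    # such as php7.4; not the tooling used for the actual change.
    import subprocess
    from collections import Counter

    result = subprocess.run(
        ["fc-list", ":", "family"],
        capture_output=True, text=True, check=True,
    )
    families = sorted(
        {line.strip() for line in result.stdout.splitlines() if line.strip()}
    )
    print(f"{len(families)} font families visible in this image")

    # Group by the first word of the family name for a rough sense of
    # what dominates the added weight (e.g. the Noto families).
    prefixes = Counter(name.split()[0] for name in families)
    for prefix, count in prefixes.most_common(10):
        print(f"{prefix}: {count} families")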
I would also like to apologize for treating what I was doing as "urgent" when it could have easily waited for a discussion with others either in code review or in other forums. This false urgency was counter to what I know to be the best way to treat technical decisions and it was disrespectful of my co-admins in the Toolforge environment.
I would also like to have a conversation among the Toolforge admins about how to best deal with this decision going forward. That conversation is probably better had on Phabricator or the cloud-admin mailing list than here, but it should happen and it should result in either reverting the change that I made or jointly creating updated guidelines for what is and is not acceptable in the shared Kubernetes containers while we await better methods of managing per-tool feature differences.
[0]: https://phabricator.wikimedia.org/T310435#8288848
[1]: https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/...
Bryan
I've created T320224 (https://phabricator.wikimedia.org/T320224) to discuss the issue of the php7.4 image and its fonts. I'm the one who was hassling to get the fonts back in 2019, and probably again this year; I'm sorry if it made it seem urgent! SVG Translate is an important tool, but it's not something that should dictate any overall systems on Toolforge. :-) I'd be happy to help with whatever work is required to sort this out.