I am happy to announce that a new and improved Kubernetes cluster is now available for use by beta testers on an opt-in basis. A page has been created on Wikitech [0] outlining the self-service migration process.
Timeline:

* 2020-01-09: 2020 Kubernetes cluster available for beta testers on an opt-in basis
* 2020-01-23: 2020 Kubernetes cluster general availability for migration on an opt-in basis
* 2020-02-10: Automatic migration of remaining workloads from 2016 cluster to 2020 cluster by Toolforge admins
This new cluster has been a work in progress for more than a year within the Wikimedia Cloud Services team, and a top priority project for the past six months. About 35 tools, including https://tools.wmflabs.org/admin/, are currently running on what we are calling the "2020 Kubernetes cluster". This new cluster is running Kubernetes v1.15.6 and Docker 19.03.4. It is also using a newer authentication and authorization method (RBAC), a new ingress routing service, and a different method of integrating with the Developer account LDAP service. We have built a new tool [1] which makes the state of the Kubernetes cluster more transparent and on par with the information that we already expose for the grid engine cluster [2].
With a significant number of tools managed by Toolforge administrators already migrated to the new cluster, we are fairly confident that the basic features used by most Kubernetes tools are covered. It is likely that a few outlying issues remain to be found as more tools move, but we are confident that we can address them quickly. This has led us to propose a fairly short period of voluntary beta testing, followed by a short general availability opt-in migration period, and finally a complete migration of all remaining tools, which will be done by the Toolforge administration team for anyone who has not migrated on their own.
Please help with beta testing if you have some time and are willing to ask for help on IRC, Phabricator, or the cloud@lists.wikimedia.org mailing list for any early-adopter issues you may encounter.
I want to publicly praise Brooke Storm and Arturo Borrero González for the hours that they have put into reading docs, building proof of concept clusters, and improving automation and processes to make the 2020 Kubernetes cluster possible. The Toolforge community can look forward to more frequent and less disruptive software upgrades in this cluster as a direct result of this work. We have some other feature improvements in planning now that I think you will all be excited to see and use later this year!
[0]: https://wikitech.wikimedia.org/wiki/News/2020_Kubernetes_cluster_migration
[1]: https://tools.wmflabs.org/k8s-status/
[2]: https://tools.wmflabs.org/sge-status/
Bryan (on behalf of the Toolforge admins and the Cloud Services team)
Amazing work folks. I'm really proud of you all.
Hi
I tried the migration path described here: https://wikitech.wikimedia.org/wiki/News/2020_Kubernetes_cluster_migration#Manually_migrate_a_webservice_to_the_new_cluster
That doesn't seem to be working for me (or at least not for my /dna/ tool).
Some problems:
1. `webservice status` on grid engine doesn't show the PHP version; it shows "Your webservice of type lighttpd is running".
2. When I do `webservice --backend=kubernetes php7.3 start`:
   1. nothing is shown in my error.log
   2. and the main page of the /dna/ tool returns 503.
3. I also tried with a default setup:
   1. `echo -e "[Default]\n--backend=kubernetes" > $HOME/.webservicerc`
   2. `webservice start` -> not working 😞 (starts, but dna returns 503)
4. Setting a default PHP version seems not to be allowed:
   1. This is not working: `echo -e "[Default]\n--backend=kubernetes php7.3" > $HOME/.webservicerc`
   2. `webservice start` shows errors (see the sketch after this list).
   3. It would be nice to be able to set the PHP version somewhere so that I can just do `webservice start/stop`.
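For what it's worth, the first `echo` above leaves `$HOME/.webservicerc` looking like this (just restating what the shell writes, nothing more):

    [Default]
    --backend=kubernetes

If each line of that file is handed to `webservice` as a single argument, then putting `--backend=kubernetes php7.3` on one line would produce one malformed flag instead of a flag plus the positional `php7.3` type, which would explain why the fourth attempt errors out. That parsing behavior is my assumption, not something I have confirmed in the docs.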
Also, I'm not sure what all that `kubectl config` stuff and the kubectl alias are supposed to do. I assume it is obvious for someone using kubectl, but I just don't know that tool and have never used this container system before. I guess I'm not the only one 😉 I did do that context switch and alias thing before starting the webservice like a nice user 🙂. It's just that I don't know if it is even required. I also don't know if the webservice needs to be stopped when doing this or not. I guess some more information would be useful to keep migration time shorter for all tool owners.
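For reference, the context-switch part I ran was along these lines; the context names "default" (2016 cluster) and "toolforge" (2020 cluster) match what Brooke mentions later in this thread, but treat the exact invocation as my best reconstruction:

    # point kubectl (and webservice) at the new 2020 cluster
    kubectl config use-context toolforge

    # or back at the old 2016 cluster
    kubectl config use-context default

The alias presumably just makes sure the `kubectl` binary you run matches the new cluster's config; the exact path it points at varies, so I haven't sketched it.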
Oh, I probably should mention that my service started on Toolserver, so I was on Tool Labs from the start. I might have some leftover config which I guess might cause some problems. I only found a very basic lighttpd config though. The PHP code is very old, but to my knowledge it runs fine on PHP 7.
Cheers, Nux
Hi Nux, I took a look, and I see you have DNA running on Grid Engine. Has it ever run ok on either Kubernetes backend (the old “default” or the new “toolforge”)?
Brooke Storm Senior SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm_
Brooke Storm (2020-01-12 20:53):
Hi Nux, I took a look, and I see you have DNA running on Grid Engine. Has it ever run ok on either Kubernetes backend (the old “default” or the new “toolforge”)?
Yes, IIRC Bryan did run it on Kubernetes for me (at my request on IRC). That was just before Xmas. It seemed to have been running fine for a while then. I switched back when I was doing some updates. Mostly layout changes, so they shouldn't be a problem.
Anyway, I think the most problematic part is that I don't see any errors in `error.log`. I'm guessing that is /not/ a problem specific to my tool, but rather some problem with the k8s configuration. The other things are just inconvenient. Without logs I'm blind.
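If the logs now live with the pod rather than in $HOME, something like the commands used later in this thread should show them (a sketch; <pod-name> is a placeholder):

    kubectl get pods          # list this tool's pods and their status
    kubectl logs <pod-name>   # print that container's stdout/stderr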
Between steps 1 and 2, did you insert “webservice stop”? If not, try that! :-)
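Spelled out with the commands already quoted in this thread, that would be roughly the following (a sketch, not the literal wiki steps; use your tool's own type in place of php7.3):

    webservice stop                                 # stop the running grid engine webservice first
    kubectl config use-context toolforge            # point at the 2020 cluster
    webservice --backend=kubernetes php7.3 start    # start the service on the new cluster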
Russell Blau (2020-01-12 21:40):
Between steps 1 and 2, did you insert “webservice stop”? If not, try that! :-)
Yes, the webservice was off. And I also did try to turn it off and on again ;-). A few times.
I also tried "php7.2" and that didn't work either ¯\_(ツ)_/¯
Hi!
I'm not having much luck with a webservice based on the python3.7 image. It runs fine on the legacy K8s cluster.

On the new cluster I got a segfault. After stopping the webservice and trying again, I just get an empty log, and the pod is now stuck in ContainerCreating.
A few minutes ago:

    tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl get pods
    NAME                                         READY   STATUS              RESTARTS   AGE
    flaggedrevspromotioncheck-7cbfff44fc-jnhmq   0/1     ContainerCreating   0          2m48s

...and just now:

    tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl get pods
    NAME                                         READY   STATUS              RESTARTS   AGE
    flaggedrevspromotioncheck-7cbfff44fc-q55gm   0/1     ContainerCreating   0          5m18s
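In case it helps whoever looks at this: the usual way to see why a pod is stuck in ContainerCreating is to ask for its events (pod name copied from the output above):

    kubectl describe pod flaggedrevspromotioncheck-7cbfff44fc-q55gm
    # the Events section at the bottom usually names what is blocking container creation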
Best regards,
Count Count
Hi Count Count,
I'm afraid you seem to have no pods on the new cluster to look at:
    # kubectl get -n tool-flaggedrevspromotioncheck pod
    No resources found.
Alex
Yes, I switched back to the old cluster. This is a new tool, but it is used in production, even if only rarely, and I can't leave it offline for hours.
I have created a test tool as a copy with which I can reproduce the issue:

    tools.countcounttest@tools-sgebastion-07:~$ kubectl get pods
    NAME                              READY   STATUS              RESTARTS   AGE
    countcounttest-6b58f5c547-mf4jx   0/1     ContainerCreating   0          77s
I will leave that running. If the container gets created I might also be able to reproduce the segfault.
Best regards,
Count Count
Thanks Count Count. I have identified a new issue with the new k8s cluster and am looking into it.
Hi Count Count, I believe I may have sorted out an issue that prevented some pods (depending partly on luck) from creating containers. Your pod started a container and it crashed; I see a uwsgi.log file with a Python module problem and a uwsgi segfault.
Your pod started a container and it crashed; I see a uwsgi.log file with a Python module problem and a uwsgi segfault.
Yes. It was working fine on the legacy cluster. The service is started via `webservice --backend=kubernetes python3.7 start`.
Apparently it cannot load the uwsgi shared library when deployed on the new cluster?

    tools.countcounttest@tools-sgebastion-07:~$ kubectl logs countcounttest-6b58f5c547-785mr
    open("/usr/lib/uwsgi/plugins/python_plugin.so"): No such file or directory [core/utils.c line 3724]
    !!! UNABLE to load uWSGI plugin: /usr/lib/uwsgi/plugins/python_plugin.so: cannot open shared object file: No such file or directory !!!
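A way to confirm what the image actually ships would be to open a shell in the same container image and look; I'm assuming `webservice shell` behaves the same against the new backend:

    webservice --backend=kubernetes python3.7 shell   # interactive shell inside the tool's container
    ls /usr/lib/uwsgi/plugins/                        # check which uWSGI plugin .so files are present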
On Sun, Jan 12, 2020 at 11:42 PM Alex Monk krenair@gmail.com wrote:
Hi Count Count, I believe I may have sorted out an issue that prevented some pods (depending partially on luck) from creating containers. Your pod started and container and it crashed, I see a uwsgi.log file with a python module problem and a uwsgi segfault.
On Sun, 12 Jan 2020 at 22:12, Alex Monk krenair@gmail.com wrote:
Thanks Count Count. I have identified a new issue with the new k8s cluster and am looking into it.
On Sun, 12 Jan 2020 at 21:43, Count Count countvoncount123456@gmail.com wrote:
Yes, I switched back to the old cluster. This is a new tool that was used in production even if only rarely. I can't leave it offline for hours.
I have created a test tool as a copy with which I can reproduce the issue: tools.countcounttest@tools-sgebastion-07:~$ kubectl get pods NAME READY STATUS RESTARTS AGE countcounttest-6b58f5c547-mf4jx 0/1 ContainerCreating 0 77s
I will leave that running. If the container gets created I might also be able to reproduce the segfault.
Best regards,
Count Count
On Sun, Jan 12, 2020 at 10:20 PM Alex Monk krenair@gmail.com wrote:
Hi Count Count,
I'm afraid you seem to have no pods on the new cluster to look at:
# kubectl get -n tool-flaggedrevspromotioncheck pod No resources found.
Alex
On Sun, 12 Jan 2020 at 21:07, Count Count < countvoncount123456@gmail.com> wrote:
Hi!
I don't have much luck with a webservice based on the python3.7 image. It is running fine on the legacy K8s cluster.
On the new cluster I got a segfault. After stopping the webservice and trying again, I get an empty log and the pod is now stuck in ContainerCreating.
A few minutes ago:

tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl get pods
NAME                                         READY   STATUS              RESTARTS   AGE
flaggedrevspromotioncheck-7cbfff44fc-jnhmq   0/1     ContainerCreating   0          2m48s

...and just now:

tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl get pods
NAME                                         READY   STATUS              RESTARTS   AGE
flaggedrevspromotioncheck-7cbfff44fc-q55gm   0/1     ContainerCreating   0          5m18s
Best regards,
Count Count
On Thu, Jan 9, 2020 at 10:58 PM Bryan Davis bd808@wikimedia.org wrote:
I think I've seen that particular error that you see in stdout/stderr (via kubectl logs) before - on pods that in fact were working.
Meanwhile, uwsgi.log says:
Python version: 3.7.3 (default, Apr 3 2019, 05:39:12) [GCC 8.3.0]
Set PythonHome to /data/project/countcounttest/www/python/venv
Fatal Python error: initfsencoding: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00007fe50490e780 (most recent call first):
!!! uWSGI process 1 got Segmentation Fault !!!
followed by a backtrace. This suggests the problem is related to something inside the image/application code rather than the cluster itself. I notice the pod on the new cluster seems to be using the sssd variant of the toolforge-python37-web image, which pods in the old cluster are not using. I doubt that is the source of the problem, as uwsgi shouldn't be segfaulting over some problem talking to LDAP... This needs further investigation by someone during the week, I think.
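One quick way to confirm which image variant a pod is actually running is a jsonpath query; the image name printed below is a hypothetical illustration of the pattern, not verified output:

$ kubectl get pod countcounttest-6b58f5c547-785mr -o jsonpath='{.spec.containers[*].image}'
docker-registry.tools.wmflabs.org/toolforge-python37-sssd-web:latest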
On Sun, 12 Jan 2020 at 23:00, Count Count countvoncount123456@gmail.com wrote:
Maybe a venv created in a different python version?
Chico Venancio
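If the venv/interpreter mismatch theory is right, it should show up in the venv's own metadata: `python3 -m venv` writes a pyvenv.cfg recording the interpreter it was created from. A sketch, with hypothetical values:

$ cat ~/www/python/venv/pyvenv.cfg
home = /usr/bin                          # hypothetical: where the creating interpreter lived
include-system-site-packages = false
version = 3.7.6                          # the Python that built this venv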
On Sun, 12 Jan 2020 at 20:14, Alex Monk krenair@gmail.com wrote:
> Maybe a venv created in a different python version?
Hmm, I am using a venv with Python 3.7.6. I can try with 3.7.3 tomorrow, which is used in the image.
BTW: No version of Python 3.7 is installed on the dev/bastion hosts afaics. Might be a good idea to sync the version to the one used in the images?
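One way to compare the two interpreters side by side, sketched under the assumption that `webservice shell` drops you into the same image the webservice runs with:

$ python3 --version                                  # on the bastion
$ webservice --backend=kubernetes python3.7 shell    # open a shell inside the python3.7 image
$ python3 --version                                  # inside the container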
On Mon, Jan 13, 2020 at 12:19 AM Chico Venancio chicocvenancio@gmail.com wrote:
Interesting, uwsgi had Python 3.7.3 but `./www/python/venv/bin/python --version` says 3.7.6. Is that a big enough difference to cause problems?
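The mismatch can also be read straight off the venv itself, since the venv's python is a symlink back to the interpreter that created it. A sketch; the resolved path shown is hypothetical:

$ ./www/python/venv/bin/python --version
Python 3.7.6
$ readlink -f ./www/python/venv/bin/python    # resolves to whichever interpreter built the venv
/usr/bin/python3.7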
On Sun, 12 Jan 2020 at 23:19, Chico Venancio chicocvenancio@gmail.com wrote:
You have to create the venv in a container, using `webservice shell` with the right runtime. We support Python versions from Debian Jessie, Stretch, and Buster by building in containers, so we cannot sync more than one of those versions to the bastion. We have moved a lot of Python tools back and forth without issue, but we have rebuilt the container images, which can introduce issues. Tomorrow I will try a few things to be sure, but a venv can easily cause problems.
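A sketch of that rebuild, assuming the standard ~/www/python layout seen in the logs above and a requirements.txt listing the tool's dependencies (the file name is an assumption):

$ webservice --backend=kubernetes python3.7 shell    # open a shell in the python3.7 image
$ python3 -m venv --clear ~/www/python/venv          # recreate the venv with the image's interpreter
$ ~/www/python/venv/bin/pip install -r ~/www/python/src/requirements.txt
$ exit
$ webservice --backend=kubernetes python3.7 start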
On Sun, Jan 12, 2020, 16:30 Alex Monk krenair@gmail.com wrote:
Thanks, creating the venv in a container did the trick. May I suggest adding virtualenv to the Python 3.7 image? python -m venv works, but virtualenv is more commonly used.
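For anyone else hitting this, the stdlib form that works in the image today, next to the virtualenv form that would only work once the package is added (the -p flag is standard virtualenv usage):

$ python3 -m venv ~/www/python/venv            # works now: venv ships with Python 3
$ virtualenv -p python3.7 ~/www/python/venv    # would require virtualenv in the image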
On Mon, Jan 13, 2020 at 2:09 AM Brooke Storm bstorm@wikimedia.org wrote:
You have to create the venv in a container using 'webservice shell of the right runtime'. We support Python versions from Debian Jessie, Stretch and Buster by building in containers, so we cannot sync more than one of those to the bastion. We have moved a lot of Python tools back and forth without issue, but we have rebuilt the container image, which can introduce issues. Tomorrow, I will try a few things to be sure, but a venv can easily cause problems.
On Sun, Jan 12, 2020, 16:30 Alex Monk krenair@gmail.com wrote:
Interesting, uwsgi had Python 3.7.3 but `./www/python/venv/bin/python --version` says 3.7.6. Is that a big enough difference to cause problems?
On Sun, 12 Jan 2020 at 23:19, Chico Venancio chicocvenancio@gmail.com wrote:
Maybe a venv created in a different python version?
Chico Venancio
Em dom, 12 de jan de 2020 20:14, Alex Monk krenair@gmail.com escreveu:
I think I've seen that particular error that you see in stdout/stderr (via kubectl logs) before - on pods that in fact were working.
Meanwhile, uwsgi.log says:
Python version: 3.7.3 (default, Apr 3 2019, 05:39:12) [GCC 8.3.0] Set PythonHome to /data/project/countcounttest/www/python/venv Fatal Python error: initfsencoding: Unable to get the locale encoding ModuleNotFoundError: No module named 'encodings'
Current thread 0x00007fe50490e780 (most recent call first): !!! uWSGI process 1 got Segmentation Fault !!!
followed by a backtrace. Suggests the problem is related to something inside the image/application code rather than the cluster itself anyway. I notice the pod on the new cluster seems to be using the sssd variant of the toolforge-python37-web image, which pods in the old cluster are not using. I doubt it's the source problem as uwsgi shouldn't be segfaulting over some problem talking to LDAP... Needs further investigation by someone during the week I think.
On Sun, 12 Jan 2020 at 23:00, Count Count < countvoncount123456@gmail.com> wrote:
Your pod started and container and it crashed, I see a uwsgi.log file
with a python module problem and a uwsgi segfault.
Yes. It was working fine with the legacy cluster. The service ist started via webservice --backend=kubernetes python3.7 start
Apparently it cannot load the uwsgi shared library if deployed on the new cluster? tools.countcounttest@tools-sgebastion-07:~$ kubectl logs countcounttest-6b58f5c547-785mr open("/usr/lib/uwsgi/plugins/python_plugin.so"): No such file or directory [core/utils.c line 3724] !!! UNABLE to load uWSGI plugin: /usr/lib/uwsgi/plugins/python_plugin.so: cannot open shared object file: No such file or directory !!!
On Sun, Jan 12, 2020 at 11:42 PM Alex Monk krenair@gmail.com wrote:
Hi Count Count, I believe I may have sorted out an issue that prevented some pods (depending partially on luck) from creating containers. Your pod started and container and it crashed, I see a uwsgi.log file with a python module problem and a uwsgi segfault.
On Sun, 12 Jan 2020 at 22:12, Alex Monk krenair@gmail.com wrote:
> Thanks Count Count. I have identified a new issue with the new k8s > cluster and am looking into it. > > On Sun, 12 Jan 2020 at 21:43, Count Count < > countvoncount123456@gmail.com> wrote: > >> Yes, I switched back to the old cluster. This is a new tool that >> was used in production even if only rarely. I can't leave it offline for >> hours. >> >> I have created a test tool as a copy with which I can reproduce the >> issue: >> tools.countcounttest@tools-sgebastion-07:~$ kubectl get pods >> NAME READY STATUS >> RESTARTS AGE >> countcounttest-6b58f5c547-mf4jx 0/1 ContainerCreating 0 >> 77s >> >> I will leave that running. If the container gets created I might >> also be able to reproduce the segfault. >> >> Best regards, >> >> Count Count >> >> On Sun, Jan 12, 2020 at 10:20 PM Alex Monk krenair@gmail.com >> wrote: >> >>> Hi Count Count, >>> >>> I'm afraid you seem to have no pods on the new cluster to look at: >>> >>> # kubectl get -n tool-flaggedrevspromotioncheck pod >>> No resources found. >>> >>> Alex >>> >>> On Sun, 12 Jan 2020 at 21:07, Count Count < >>> countvoncount123456@gmail.com> wrote: >>> >>>> Hi! >>>> >>>> I don't have much luck with a webservice based on the python3.7 >>>> image. It is running fine on the legacy K8s cluster. >>>> >>>> On the new cluster I got a segfault. After stopping the >>>> webservice and trying again to get an empty log the pod is now stuck in >>>> ContainerCreating. >>>> >>>> A few minutes ago: >>>> tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl >>>> get pods >>>> NAME READY STATUS >>>> RESTARTS AGE >>>> flaggedrevspromotioncheck-7cbfff44fc-jnhmq 0/1 >>>> ContainerCreating 0 2m48s >>>> >>>> ...and just now: >>>> tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl >>>> get pods >>>> NAME READY STATUS >>>> RESTARTS AGE >>>> flaggedrevspromotioncheck-7cbfff44fc-q55gm 0/1 >>>> ContainerCreating 0 5m18s >>>> >>>> Best regards, >>>> >>>> Count Count >>>> >>>> On Thu, Jan 9, 2020 at 10:58 PM Bryan Davis bd808@wikimedia.org >>>> wrote: >>>> >>>>> I am happy to announce that a new and improved Kubernetes >>>>> cluster is >>>>> now available for use by beta testers on an opt-in basis. A page >>>>> has >>>>> been created on Wikitech [0] outlining the self-service migration >>>>> process. >>>>> >>>>> Timeline: >>>>> * 2020-01-09: 2020 Kubernetes cluster available for beta testers >>>>> on an >>>>> opt-in basis >>>>> * 2020-01-23: 2020 Kubernetes cluster general availability for >>>>> migration on an opt-in basis >>>>> * 2020-02-10: Automatic migration of remaining workloads from >>>>> 2016 >>>>> cluster to 2020 cluster by Toolforge admins >>>>> >>>>> This new cluster has been a work in progress for more than a year >>>>> within the Wikimedia Cloud Services team, and a top priority >>>>> project >>>>> for the past six months. About 35 tools, including >>>>> https://tools.wmflabs.org/admin/, are currently running on what >>>>> we are >>>>> calling the "2020 Kubernetes cluster". This new cluster is >>>>> running >>>>> Kubernetes v1.15.6 and Docker 19.03.4. It is also using a newer >>>>> authentication and authorization method (RBAC), a new ingress >>>>> routing >>>>> service, and a different method of integrating with the Developer >>>>> account LDAP service. We have built a new tool [1] which makes >>>>> the >>>>> state of the Kubernetes cluster more transparent and on par with >>>>> the >>>>> information that we already expose for the grid engine cluster >>>>> [2]. 
>>>>> >>>>> With a significant number of tools managed by Toolforge >>>>> administrators >>>>> already migrated to the new cluster, we are fairly confident >>>>> that the >>>>> basic features used by most Kubernetes tools are covered. It is >>>>> likely >>>>> that a few outlying issues remain to be found as more tools >>>>> move, but >>>>> we have confidence that we can address them quickly. This has >>>>> led us >>>>> to propose a fairly short period of voluntary beta testing, >>>>> followed >>>>> by a short general availability opt-in migration period, and >>>>> finally a >>>>> complete migration of all remaining tools which will be done by >>>>> the >>>>> Toolforge administration team for anyone who has not migrated >>>>> their >>>>> self. >>>>> >>>>> Please help with beta testing if you have some time and are >>>>> willing to >>>>> get help on irc, Phabricator, and the cloud@lists.wikimedia.org >>>>> mailing list for early adopter issues you may encounter. >>>>> >>>>> I want to publicly praise Brooke Storm and Arturo Borrero >>>>> González for >>>>> the hours that they have put into reading docs, building proof of >>>>> concept clusters, and improving automation and processes to make >>>>> the >>>>> 2020 Kubernetes cluster possible. The Toolforge community can >>>>> look >>>>> forward to more frequent and less disruptive software upgrades >>>>> in this >>>>> cluster as a direct result of this work. We have some other >>>>> feature >>>>> improvements in planning now that I think you will all be >>>>> excited to >>>>> see and use later this year! >>>>> >>>>> [0]: >>>>> https://wikitech.wikimedia.org/wiki/News/2020_Kubernetes_cluster_migration >>>>> [1]: https://tools.wmflabs.org/k8s-status/ >>>>> [2]: https://tools.wmflabs.org/sge-status/ >>>>> >>>>> Bryan (on behalf of the Toolforge admins and the Cloud Services >>>>> team) >>>>> -- >>>>> Bryan Davis Technical Engagement Wikimedia >>>>> Foundation >>>>> Principal Software Engineer Boise, >>>>> ID USA >>>>> [[m:User:BDavis_(WMF)]] >>>>> irc: bd808 >>>>> >>>>> _______________________________________________ >>>>> Wikimedia Cloud Services announce mailing list >>>>> Cloud-announce@lists.wikimedia.org (formerly >>>>> labs-announce@lists.wikimedia.org) >>>>> https://lists.wikimedia.org/mailman/listinfo/cloud-announce >>>>> >>>> _______________________________________________ >>>> Wikimedia Cloud Services mailing list >>>> Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) >>>> https://lists.wikimedia.org/mailman/listinfo/cloud >>> >>> _______________________________________________ >>> Wikimedia Cloud Services mailing list >>> Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) >>> https://lists.wikimedia.org/mailman/listinfo/cloud >> >> _______________________________________________ >> Wikimedia Cloud Services mailing list >> Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) >> https://lists.wikimedia.org/mailman/listinfo/cloud > > _______________________________________________ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Hi.
While I know what 'kubernetes' is, I have no idea whether any of the tools I maintain depends on this k8s migration, and if so, why. I simply use `jsub` to submit jobs, then sit back and expect them to work (and they do). I have no memory of ever touching any kubectl-related commands. Should I worry about anything if I ONLY use `jsub`, `qstat`, and `qdel`? Or shall I go to bed satisfied that I won't need to mess with a working system?
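A quick self-check for this situation (a hedged sketch; it assumes the per-tool kubectl setup that the Toolforge bastions provide):

$ qstat             # jobs submitted with jsub live on the grid engine
$ kubectl get pods  # "No resources found." means nothing of yours runs on Kubernetes

Tools that only use jsub, qstat, and qdel run on the grid engine, which is not part of this Kubernetes migration.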
Sent from my iPhone
At 06:58, Bryan Davis bd808@wikimedia.org wrote:
I switched a few "big ones" successfully, but ran into one that doesn't work:
glamtools (https://tools.wmflabs.org/glamtools/) returns a 503, but `webservice status` says "Your webservice of type php7.3 is running".
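One generic way to dig into a 503 like this is to compare what Kubernetes itself reports with what the webservice wrapper says (a sketch; the pod name is a placeholder to copy from the `kubectl get pods` output):

tools.glamtools@tools-sgebastion-07:~$ kubectl get pods
tools.glamtools@tools-sgebastion-07:~$ kubectl logs glamtools-xxxxxxxxxx-xxxxx          # container stdout/stderr
tools.glamtools@tools-sgebastion-07:~$ kubectl describe pod glamtools-xxxxxxxxxx-xxxxx  # restarts and events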
On Thu, Jan 9, 2020 at 9:58 PM Bryan Davis bd808@wikimedia.org wrote:
And now I can't switch back:
...switch back commands...
tools.glamtools@tools-sgebastion-07:~$ unalias kubectl
tools.glamtools@tools-sgebastion-07:~$ webservice --backend=kubernetes php7.3 start
Your job is already running
tools.glamtools@tools-sgebastion-07:~$ webservice stop
Your webservice is not running
tools.glamtools@tools-sgebastion-07:~$ webservice --backend=kubernetes php7.3 start
Your job is already running
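A hedged recovery sketch for this kind of state mismatch. It rests on two assumptions worth verifying before deleting anything: that webservice caches its state in $HOME/service.manifest, and that the Kubernetes deployment carries the tool's name:

tools.glamtools@tools-sgebastion-07:~$ kubectl get deployments              # does Kubernetes think anything is running?
tools.glamtools@tools-sgebastion-07:~$ kubectl delete deployment glamtools  # remove the stale deployment, if one exists
tools.glamtools@tools-sgebastion-07:~$ rm -f $HOME/service.manifest         # clear webservice's cached state (assumption: this is where it lives)
tools.glamtools@tools-sgebastion-07:~$ webservice --backend=kubernetes php7.3 start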
On Mon, Jan 13, 2020 at 3:47 PM Magnus Manske magnusmanske@googlemail.com wrote:
I see a problem with at least one container image (which has nothing to do with the new cluster; I can see it on the old cluster as well). I'm going to try to fix that now. (Magnus, this is probably what you are seeing as well.)
Brooke Storm Senior SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm_
On Jan 13, 2020, at 8:47 AM, Magnus Manske via Cloud cloud@lists.wikimedia.org wrote:
I suspect this affects containers built on Debian Buster packages; so far I've seen it with php7.3 and python3.7. I'd suggest not restarting web services on those runtimes until we have this fixed. For anyone who has already done so, we are working on it; any logs you can share would be helpful.
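To check which image a given webservice pod is actually running, the pod spec records it (generic kubectl, no Toolforge-specific flags assumed):

$ kubectl get pods -o jsonpath='{.items[*].spec.containers[*].image}'
# image names mentioning php7.3/php73 or python3.7/python37 would suggest a Buster-based runtime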
Brooke Storm Senior SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm_
On Jan 13, 2020, at 9:06 AM, Brooke Storm bstorm@wikimedia.org wrote:
I’ve created Phabricator task T242632 to track this. Anyone with information or time to help, please coordinate there.
Brooke Storm Senior SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm_
On Jan 13, 2020, at 9:10 AM, Brooke Storm bstorm@wikimedia.org wrote:
Per that ticket, I no longer think there is any issue with the images, at least for python. There was an issue with some nodes, which is now mostly fixed. I'll take a look at glamtools.
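For affected tools, checking which node a pod was scheduled onto is one line (standard kubectl; -o wide adds a NODE column to the listing):

$ kubectl get pods -o wide   # the NODE column shows where each pod landed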
Brooke Storm Senior SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm_
On Jan 13, 2020, at 9:14 AM, Brooke Storm bstorm@wikimedia.org wrote: