Hi Count Count,
I believe I may have sorted out an issue that prevented some pods (depending partially on luck) from creating containers. Your pod started a container and it crashed; I see a uwsgi.log file with a Python module problem and a uwsgi segfault.
On Sun, 12 Jan 2020 at 22:12, Alex Monk <krenair@gmail.com> wrote:
Thanks Count Count. I have identified a new issue with the new k8s cluster and am looking into it.
On Sun, 12 Jan 2020 at 21:43, Count Count <countvoncount123456@gmail.com> wrote:
Yes, I switched back to the old cluster. This is a new tool that was used in production even if only rarely. I can't leave it offline for hours.
I have created a test tool as a copy with which I can reproduce the issue:
tools.countcounttest@tools-sgebastion-07:~$ kubectl get pods
NAME                              READY   STATUS              RESTARTS   AGE
countcounttest-6b58f5c547-mf4jx   0/1     ContainerCreating   0          77s
I will leave that running. If the container gets created I might also be able to reproduce the segfault.
Best regards,
Count Count
On Sun, Jan 12, 2020 at 10:20 PM Alex Monk <krenair@gmail.com> wrote:
Hi Count Count,
I'm afraid you seem to have no pods on the new cluster to look at:
# kubectl get -n tool-flaggedrevspromotioncheck pod
No resources found.
Alex
On Sun, 12 Jan 2020 at 21:07, Count Count <countvoncount123456@gmail.com> wrote:
Hi!
I am not having much luck with a webservice based on the python3.7 image. It is running fine on the legacy K8s cluster.
On the new cluster I got a segfault. After I stopped the webservice and tried again to get an empty log, the pod is now stuck in ContainerCreating.
A few minutes ago:
tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl get pods
NAME                                         READY   STATUS              RESTARTS   AGE
flaggedrevspromotioncheck-7cbfff44fc-jnhmq   0/1     ContainerCreating   0          2m48s
...and just now:
tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl get pods
NAME                                         READY   STATUS              RESTARTS   AGE
flaggedrevspromotioncheck-7cbfff44fc-q55gm   0/1     ContainerCreating   0          5m18s
Best regards,
Count Count
On Thu, Jan 9, 2020 at 10:58 PM Bryan Davis <bd808@wikimedia.org> wrote:
I am happy to announce that a new and improved Kubernetes cluster is now available for use by beta testers on an opt-in basis. A page has been created on Wikitech [0] outlining the self-service migration process.
Timeline:
- 2020-01-09: 2020 Kubernetes cluster available for beta testers on an opt-in basis
- 2020-01-23: 2020 Kubernetes cluster general availability for migration on an opt-in basis
- 2020-02-10: Automatic migration of remaining workloads from 2016 cluster to 2020 cluster by Toolforge admins
This new cluster has been a work in progress for more than a year within the Wikimedia Cloud Services team, and a top priority project for the past six months. About 35 tools, including https://tools.wmflabs.org/admin/, are currently running on what we are calling the "2020 Kubernetes cluster". This new cluster is running Kubernetes v1.15.6 and Docker 19.03.4. It is also using a newer authentication and authorization method (RBAC), a new ingress routing service, and a different method of integrating with the Developer account LDAP service. We have built a new tool [1] which makes the state of the Kubernetes cluster more transparent and on par with the information that we already expose for the grid engine cluster [2].
With a significant number of tools managed by Toolforge administrators already migrated to the new cluster, we are fairly confident that the basic features used by most Kubernetes tools are covered. It is likely that a few outlying issues remain to be found as more tools move, but we have confidence that we can address them quickly. This has led us to propose a fairly short period of voluntary beta testing, followed by a short general availability opt-in migration period, and finally a complete migration of all remaining tools, which will be done by the Toolforge administration team for anyone who has not migrated themselves.
Please help with beta testing if you have some time and are willing to get help on irc, Phabricator, and the cloud@lists.wikimedia.org mailing list for early adopter issues you may encounter.
I want to publicly praise Brooke Storm and Arturo Borrero González for the hours that they have put into reading docs, building proof of concept clusters, and improving automation and processes to make the 2020 Kubernetes cluster possible. The Toolforge community can look forward to more frequent and less disruptive software upgrades in this cluster as a direct result of this work. We have some other feature improvements in planning now that I think you will all be excited to see and use later this year!
Bryan (on behalf of the Toolforge admins and the Cloud Services team)
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808
Wikimedia Cloud Services announce mailing list Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud