I think I've seen that particular error that you see in stdout/stderr
(via kubectl logs) before, on pods that were in fact working.
Meanwhile, uwsgi.log says:
Python version: 3.7.3 (default, Apr 3 2019, 05:39:12) [GCC 8.3.0]
Set PythonHome to /data/project/countcounttest/www/python/venv
Fatal Python error: initfsencoding: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'
Current thread 0x00007fe50490e780 (most recent call first):
!!! uWSGI process 1 got Segmentation Fault !!!
followed by a backtrace. This suggests the problem is related to something
inside the image/application code rather than the cluster itself.
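
In case it helps whoever picks this up: the 'encodings' error means the interpreter could not find its standard library under the home it was given. Below is a rough sketch of a check that could be run against the directory from the "Set PythonHome" line. The function name inspect_python_home is made up for illustration, and the lib/pythonX.Y layout it looks for is the usual Unix one, which I am assuming applies inside the image. It only reports what is on disk; it does not prove what the cause is.

```python
import sys
from pathlib import Path


def inspect_python_home(home, version="3.7"):
    """Collect basic facts about a directory used as a Python home.

    `version` is the major.minor string of the interpreter that will use
    the directory (3.7 here, taken from the uwsgi.log banner).
    """
    home = Path(home)
    stdlib = home / "lib" / f"python{version}"
    return {
        # Does the directory exist at all from where this script runs?
        "home_exists": home.is_dir(),
        # A PEP 405 venv carries a pyvenv.cfg at its top level.
        "has_pyvenv_cfg": (home / "pyvenv.cfg").is_file(),
        # Is there a lib/pythonX.Y directory for this interpreter version?
        "stdlib_dir_exists": stdlib.is_dir(),
        # Is the 'encodings' package directly available under it?
        "has_encodings": (stdlib / "encodings" / "__init__.py").is_file(),
    }


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else sys.prefix
    version = sys.argv[2] if len(sys.argv) > 2 else "3.7"
    for key, value in inspect_python_home(target, version).items():
        print(f"{key}: {value}")
```

Running it with /data/project/countcounttest/www/python/venv as the first argument, from inside the container if possible, would show whether that path has what a 3.7 interpreter expects.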
I notice the pod on the new cluster seems to be using the sssd variant
of the toolforge-python37-web image, which pods in the old cluster are not
using. I doubt it's the source of the problem, as uwsgi shouldn't be
segfaulting over some problem talking to LDAP...
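
To compare which image a pod is actually running on each cluster, something like the following could be fed the output of "kubectl get pod NAME -o json". This is only a sketch: the helper names are invented, and matching on the substring "sssd" is my assumption about how the image variants are named.

```python
import json
import sys


def container_images(pod):
    """Return {container name: image} for a pod object parsed from JSON."""
    containers = pod.get("spec", {}).get("containers", [])
    return {c["name"]: c["image"] for c in containers}


def uses_sssd_variant(pod):
    """True if any container image name contains 'sssd'.

    The substring match is an assumption about image naming, not a
    documented convention.
    """
    return any("sssd" in image for image in container_images(pod).values())


if __name__ == "__main__":
    # Expects a single pod object on stdin, e.g. from `kubectl get pod NAME -o json`.
    pod = json.load(sys.stdin)
    for name, image in container_images(pod).items():
        print(f"{name}: {image}")
    print("sssd variant:", uses_sssd_variant(pod))
```

Running it once against a pod on the old cluster and once against one on the new cluster would confirm whether the image really differs.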
Needs further investigation by someone during the week I think.
On Sun, 12 Jan 2020 at 23:00, Count Count <countvoncount123456(a)gmail.com>
wrote:
> Your pod started a container and it crashed; I see a uwsgi.log file
> with a python module problem and a uwsgi segfault.
>
Yes. It was working fine with the legacy cluster.
The service is started via webservice --backend=kubernetes python3.7 start
Apparently it cannot load the uwsgi shared library if deployed on the
new cluster?
tools.countcounttest@tools-sgebastion-07:~$ kubectl logs countcounttest-6b58f5c547-785mr
open("/usr/lib/uwsgi/plugins/python_plugin.so"): No such file or directory [core/utils.c line 3724]
!!! UNABLE to load uWSGI plugin: /usr/lib/uwsgi/plugins/python_plugin.so: cannot open shared object file: No such file or directory !!!
On Sun, Jan 12, 2020 at 11:42 PM Alex Monk <krenair(a)gmail.com> wrote:
> Hi Count Count, I believe I may have sorted out an issue that
> prevented some pods (depending partially on luck) from creating containers.
> Your pod started a container and it crashed; I see a uwsgi.log file with
> a python module problem and a uwsgi segfault.
>
> On Sun, 12 Jan 2020 at 22:12, Alex Monk <krenair(a)gmail.com> wrote:
>
>> Thanks Count Count. I have identified a new issue with the new k8s
>> cluster and am looking into it.
>>
>> On Sun, 12 Jan 2020 at 21:43, Count Count <
>> countvoncount123456(a)gmail.com> wrote:
>>
>>> Yes, I switched back to the old cluster. This is a new tool that was
>>> used in production even if only rarely. I can't leave it offline for
>>> hours.
>>>
>>> I have created a test tool as a copy with which I can reproduce the
>>> issue:
>>> tools.countcounttest@tools-sgebastion-07:~$ kubectl get pods
>>> NAME                              READY   STATUS              RESTARTS   AGE
>>> countcounttest-6b58f5c547-mf4jx   0/1     ContainerCreating   0          77s
>>>
>>> I will leave that running. If the container gets created I might
>>> also be able to reproduce the segfault.
>>>
>>> Best regards,
>>>
>>> Count Count
>>>
>>> On Sun, Jan 12, 2020 at 10:20 PM Alex Monk <krenair(a)gmail.com>
>>> wrote:
>>>
>>>> Hi Count Count,
>>>>
>>>> I'm afraid you seem to have no pods on the new cluster to look at:
>>>>
>>>> # kubectl get -n tool-flaggedrevspromotioncheck pod
>>>> No resources found.
>>>>
>>>> Alex
>>>>
>>>> On Sun, 12 Jan 2020 at 21:07, Count Count <
>>>> countvoncount123456(a)gmail.com> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I don't have much luck with a webservice based on the python3.7
>>>>> image. It is running fine on the legacy K8s cluster.
>>>>>
>>>>> On the new cluster I got a segfault. After stopping the webservice
>>>>> and trying again to get an empty log the pod is now stuck in
>>>>> ContainerCreating.
>>>>>
>>>>> A few minutes ago:
>>>>> tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl get pods
>>>>> NAME                                         READY   STATUS              RESTARTS   AGE
>>>>> flaggedrevspromotioncheck-7cbfff44fc-jnhmq   0/1     ContainerCreating   0          2m48s
>>>>>
>>>>> ...and just now:
>>>>> tools.flaggedrevspromotioncheck@tools-sgebastion-08:~$ kubectl get pods
>>>>> NAME                                         READY   STATUS              RESTARTS   AGE
>>>>> flaggedrevspromotioncheck-7cbfff44fc-q55gm   0/1     ContainerCreating   0          5m18s
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Count Count
>>>>>
>>>>> On Thu, Jan 9, 2020 at 10:58 PM Bryan Davis <bd808(a)wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> I am happy to announce that a new and improved Kubernetes cluster is
>>>>>> now available for use by beta testers on an opt-in basis. A page has
>>>>>> been created on Wikitech [0] outlining the self-service migration
>>>>>> process.
>>>>>>
>>>>>> Timeline:
>>>>>> * 2020-01-09: 2020 Kubernetes cluster available for beta testers on an
>>>>>>   opt-in basis
>>>>>> * 2020-01-23: 2020 Kubernetes cluster general availability for
>>>>>>   migration on an opt-in basis
>>>>>> * 2020-02-10: Automatic migration of remaining workloads from 2016
>>>>>>   cluster to 2020 cluster by Toolforge admins
>>>>>>
>>>>>> This new cluster has been a work in progress for more than a year
>>>>>> within the Wikimedia Cloud Services team, and a top priority project
>>>>>> for the past six months. About 35 tools, including
>>>>>> https://tools.wmflabs.org/admin/, are currently running on what we are
>>>>>> calling the "2020 Kubernetes cluster". This new cluster is running
>>>>>> Kubernetes v1.15.6 and Docker 19.03.4. It is also using a newer
>>>>>> authentication and authorization method (RBAC), a new ingress routing
>>>>>> service, and a different method of integrating with the Developer
>>>>>> account LDAP service. We have built a new tool [1] which makes the
>>>>>> state of the Kubernetes cluster more transparent and on par with the
>>>>>> information that we already expose for the grid engine cluster [2].
>>>>>>
>>>>>> With a significant number of tools managed by Toolforge administrators
>>>>>> already migrated to the new cluster, we are fairly confident that the
>>>>>> basic features used by most Kubernetes tools are covered. It is likely
>>>>>> that a few outlying issues remain to be found as more tools move, but
>>>>>> we have confidence that we can address them quickly. This has led us
>>>>>> to propose a fairly short period of voluntary beta testing, followed
>>>>>> by a short general availability opt-in migration period, and finally a
>>>>>> complete migration of all remaining tools which will be done by the
>>>>>> Toolforge administration team for anyone who has not migrated
>>>>>> themselves.
>>>>>>
>>>>>> Please help with beta testing if you have some time and are willing to
>>>>>> get help on irc, Phabricator, and the cloud(a)lists.wikimedia.org
>>>>>> mailing list for early adopter issues you may encounter.
>>>>>>
>>>>>> I want to publicly praise Brooke Storm and Arturo Borrero González for
>>>>>> the hours that they have put into reading docs, building proof of
>>>>>> concept clusters, and improving automation and processes to make the
>>>>>> 2020 Kubernetes cluster possible. The Toolforge community can look
>>>>>> forward to more frequent and less disruptive software upgrades in this
>>>>>> cluster as a direct result of this work. We have some other feature
>>>>>> improvements in planning now that I think you will all be excited to
>>>>>> see and use later this year!
>>>>>>
>>>>>> [0]: https://wikitech.wikimedia.org/wiki/News/2020_Kubernetes_cluster_migration
>>>>>> [1]: https://tools.wmflabs.org/k8s-status/
>>>>>> [2]: https://tools.wmflabs.org/sge-status/
>>>>>>
>>>>>> Bryan (on behalf of the Toolforge admins and the Cloud Services team)
>>>>>> --
>>>>>> Bryan Davis              Technical Engagement      Wikimedia Foundation
>>>>>> Principal Software Engineer                               Boise, ID USA
>>>>>> [[m:User:BDavis_(WMF)]]                                      irc: bd808
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wikimedia Cloud Services announce mailing list
>>>>>> Cloud-announce(a)lists.wikimedia.org (formerly
>>>>>> labs-announce(a)lists.wikimedia.org)
>>>>>> https://lists.wikimedia.org/mailman/listinfo/cloud-announce
>>>>>>
>>>>> _______________________________________________
>>>>> Wikimedia Cloud Services mailing list
>>>>> Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
>>>>> https://lists.wikimedia.org/mailman/listinfo/cloud