Hello,
First of all, thank you to Chico Venancio for the explanation about "kubectl describe". However, it still did not start...
It seems to me that there is a problem with python3.7. When I install the venv for python3.7 (web and base, same problem) and run a "hello Kubernetes!" command, it crashes (deployment file here https://paste.toolforge.org/view/89fc0b13). If I replace 3.7 with 3.5, then it works and returns "hello Kubernetes!".
After switching everything back to Python 3.5, the Celery worker is running (at last!).
Psemdel
On Wed, Oct 7, 2020 at 9:15 AM maxime delzenne maxime.delzenne@gmail.com wrote:
Hello,
First of all, thank you to Chico Venancio for the explanation about "kubectl describe". However, it still did not start...
It seems to me that there is a problem with python3.7. When I install the venv for python3.7 (web and base, same problem) and run a "hello Kubernetes!" command, it crashes (deployment file here https://paste.toolforge.org/view/89fc0b13). If I replace 3.7 with 3.5, then it works and returns "hello Kubernetes!".
After switching everything back to Python 3.5, the Celery worker is running (at last!).
This sounds suspiciously like you made the virtual environment from a login shell on one of the Toolforge bastions (where python3 is actually python3.5). To make a Python 3.7 virtual environment it is necessary to first enter a python3.7 container running on the Kubernetes cluster. This can be done using the `webservice python3.7 shell` command. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Python#Virtual_Environments_and_Packages for related information.
Bryan
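One way to catch this kind of mismatch early is to have the tool report which interpreter it is actually running under. The following is a minimal sketch, not something posted in this thread: the function names `interpreter_matches` and `in_virtualenv` are made up for illustration, and the expected version (3, 7) assumes the tool is meant to run in the python3.7 container.

```python
import sys


def interpreter_matches(expected, actual=None):
    """Return True if the (major, minor) part of `actual` equals `expected`.

    `actual` defaults to the interpreter running this code.
    """
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected)


def in_virtualenv():
    """Return True when running inside a venv.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix points at the interpreter the venv was created from.
    """
    return sys.prefix != sys.base_prefix


# Assumption: the tool is expected to run on Python 3.7.
EXPECTED = (3, 7)

if not interpreter_matches(EXPECTED):
    print(
        "warning: running Python %d.%d, expected %d.%d"
        % (sys.version_info[0], sys.version_info[1], EXPECTED[0], EXPECTED[1])
    )
print("inside a virtual environment:", in_virtualenv())
```

Running this both on the bastion and inside a `webservice python3.7 shell` session should make it obvious which interpreter a given venv was built with.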
It would be really good if the bastions could be updated to something more modern, so we could have a uniform python infrastructure everywhere. I think I once opened a phab ticket for this, which was closed as some variation on infeasible.
I ran into exactly the issue described below myself. After trying a bunch of various workarounds, I ended up building my own python 3.7 from source. That's suboptimal in so many ways, but it was the only way I could find to get a consistent setup between my test/dev environment on the bastion and my production environment on kubernetes.
At some point, this is going to become a more acute problem, since 3.5 is officially end-of-life.
On Oct 7, 2020, at 11:41 AM, Bryan Davis bd808@wikimedia.org wrote:
This sounds suspiciously like you made the virtual environment from a login shell on one of the Toolforge bastions (where python3 is actually python3.5). To make a Python 3.7 virtual environment it is necessary to first enter a python3.7 container running on the Kubernetes cluster. This can be done using the `webservice python3.7 shell` command. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Python#Virtual_Environments_and_Packages for related information.
Bryan
Bryan Davis Technical Engagement Wikimedia Foundation Principal Software Engineer Boise, ID USA [[m:User:BDavis_(WMF)]] irc: bd808
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
On Wed, Oct 7, 2020 at 11:54 AM Roy Smith roy@panix.com wrote:
It would be really good if the bastions could be updated to something more modern, so we could have a uniform python infrastructure everywhere. I think I once opened a phab ticket for this, which was closed as some variation on infeasible.
I ran into exactly the issue described below myself. After trying a bunch of various workarounds, I ended up building my own python 3.7 from source. That's suboptimal in so many ways, but it was the only way I could find to get a consistent setup between my test/dev environment on the bastion and my production environment on kubernetes.
Use a `webservice python3.7 shell` session as your dev/test environment and you will a) get the same python version as the "production" container, and b) move your dev/test workload off of the limited resources of the bastion server and onto the more scalable Kubernetes cluster.
At some point, this is going to become a more acute problem, since 3.5 is officially end-of-life.
The bastions in Toolforge need to be compatible with the Grid Engine cluster because they act as job submission hosts for the grid. Today, the grid engine cluster is running Debian Stretch as its base operating system [0]. Debian Stretch is a supported release through June 2022. There is currently no scheduled work to rebuild and replace the Debian Stretch instances in Toolforge, but rest assured that this work will happen before the end of life of Debian Stretch. I expect that the Toolforge admin team will start discussing the work needed to rebuild the Toolforge bastions and grid engine instances sometime after the Debian project releases their next stable version, Bullseye [1].
[0]: https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation
[1]: https://www.debian.org/releases/bullseye/
Bryan
On Oct 7, 2020, at 4:04 PM, Bryan Davis bd808@wikimedia.org wrote:
Use a `webservice python3.7 shell` session as your dev/test environment and you will a) get the same python version as the "production" container, and b) move your dev/test workload off of the limited resources of the bastion server and onto the more scalable Kubernetes cluster.
BTDT. It was unusable. People can search the list archives if they're interested in details. At this point, I can't even get in:
WARNING: No explict backend provided. Using default of 'kubernetes'
For help refer to https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web
runtime: failed to create new OS thread (have 3 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc
I understand that k8s is scalable, and the way of the future, and that upgrades need to be prioritized, etc. But the current setup is not an effective development environment.
On Wed, Oct 7, 2020 at 4:16 PM Roy Smith roy@panix.com wrote:
On Oct 7, 2020, at 4:04 PM, Bryan Davis bd808@wikimedia.org wrote:
Use a `webservice python3.7 shell` session as your dev/test environment and you will a) get the same python version as the "production" container, and b) move your dev/test workload off of the limited resources of the bastion server and onto the more scalable Kubernetes cluster.
BTDT. It was unusable. People can search the list archives if they're interested in details. At this point, I can't even get in:
WARNING: No explict backend provided. Using default of 'kubernetes'
For help refer to https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web
runtime: failed to create new OS thread (have 3 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc
This error indicates that you are running out of process quota on the bastion server due to the number of processes that you are attempting to run in parallel. We have fairly strict limits placed on each user to help prevent overwhelming the bastions. The quota is typically enough to run 2-3 small applications.
I can see that you are currently logged into the dev.toolforge.org bastion (tools-sgebastion-08.tools.eqiad.wmflabs) where you have a tmux session running with four bash shells and a python process that appears to be some sort of TCP log event sync. This last process is almost certainly pushing your thread quota to the limit.
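For anyone who wants to see how close they are to such a limit, the per-user process limit can be inspected from Python. This is only a sketch under an assumption: it supposes the quota is enforced as the classic `ulimit -u` limit (RLIMIT_NPROC), which the error message hints at but which is not confirmed here; a limit enforced some other way (for example through cgroups) would not show up in these numbers. The helper names `nproc_limits` and `describe` are hypothetical.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) per-user process/thread limit (ulimit -u)."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to a readable word."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = nproc_limits()
print("max user processes: soft=%s hard=%s" % (describe(soft), describe(hard)))
```

On Linux, threads count against this limit as well as processes, which is why a program that starts several OS threads can fail with errno=11 (EAGAIN) even when only a handful of processes are visible.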
I understand that k8s is scalable, and the way of the future, and that upgrades need to be prioritized, etc. But the current setup is not an effective development environment.
This may actually be the root of some of your frustrations. Toolforge is not meant to be a replacement for a development environment on your local laptop or other server. It is intended primarily to be a runtime environment for tools. Attempting to use the bastions as your primary development environment is going to be a struggle against the various quotas and limits that have been placed on the bastions in an attempt to keep them available and responsive for people who need to start/stop/restart their tools and check log output.
You might be interested in things like https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Connecting_to_the_database_replicas_from_your_own_computer to help you find a way to develop and test your software outside of Toolforge.
Bryan
On Oct 7, 2020, at 7:28 PM, Bryan Davis bd808@wikimedia.org wrote:
This may actually be the root of some of your frustrations. Toolforge is not meant to be a replacement for a development environment on your local laptop or other server.
Hmmm. That's not the impression I got from reading what's available on wikitech. It's called a "developer account". Pages like https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge say things like, "If you already know these basics, then you are ready to start developing tools!" So, it sure sounds like it's supposed to be a development environment.
And https://wikitech.wikimedia.org/wiki/Help:Toolforge/Developing_successful_tools says:
Pick the right development environment
If you will be doing heavy processing (e.g., compiles or tool test runs), please use the development environment (dev.toolforge.org) instead of the primary login host (login.toolforge.org) so as to help maintain the interactive performance of the primary login host.
which is at odds with your statement that it's not meant to be a development environment. Yes, I keep several tmux sessions nailed up. It makes life easier and I assumed it would be a very small impact on the system. As for:
a python process that appears to be some sort of TCP log event sync.
That's exactly what it is. Tailing an NFS log file is not practical. I opened a phab ticket on this (https://phabricator.wikimedia.org/T256426), which got triaged to a low priority. Fair enough, so I found a workaround. The process you're seeing is essentially a private syslog server I'm running. My k8s web server process logs to that, bypassing NFS. I don't need that running when I'm not actively debugging, so I'll shut it down now. It didn't seem like the kind of thing that would be imposing any significant load.
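To make the workaround concrete, here is a rough sketch of the general idea: a small TCP sink that collects newline-delimited log lines, plus a logging handler that sends to it. It is not the actual code discussed in this thread. The class and function names (`LineSink`, `TCPLineHandler`, `start_sink`) are invented for the example, the line-per-record framing is an assumption rather than real syslog format, and it binds to 127.0.0.1 on a free port purely for demonstration.

```python
import logging
import socket
import socketserver
import threading


class LineSink(socketserver.StreamRequestHandler):
    """Receive newline-delimited log lines and store them on the server."""

    def handle(self):
        # Iterating the buffered reader yields one line at a time until EOF.
        for raw in self.rfile:
            line = raw.decode("utf-8", "replace").rstrip("\n")
            self.server.received.append(line)


class TCPLineHandler(logging.Handler):
    """Logging handler that writes each formatted record as one TCP line."""

    def __init__(self, host, port):
        super().__init__()
        self.sock = socket.create_connection((host, port))

    def emit(self, record):
        try:
            self.sock.sendall((self.format(record) + "\n").encode("utf-8"))
        except OSError:
            self.handleError(record)

    def close(self):
        self.sock.close()
        super().close()


def start_sink(host="127.0.0.1", port=0):
    """Start the sink in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), LineSink)
    server.received = []  # a real sink would print or persist the lines
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In use, the web process would attach `TCPLineHandler(host, port)` to its logger, with host and port pointing at wherever the sink runs. Note that the sink uses one thread per connection plus the serving thread, so on a host with tight process limits it counts against the quota like any other long-running process.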
Bryan,
Let me take a step backwards. I accept that documentation may be out of date, wrong, misleading, etc. Maybe I've been barking up the wrong tree because the out-of-date documentation led me astray. These things happen as systems evolve.
So, what I want to do is develop and run web servers, written in Python/django, which can access the database mirrors (the same databases that Quarry runs against) and also make API calls. I need my users to be able to authenticate to my server using their en.wikipedia.org credentials (i.e. via OAuth).
I'd really like my development environment to be as similar to the production environment as possible. The more differences there are, the more complicated things become. In other words, keeping the development environment on my laptop would be sub-optimal.
If you were setting out to do that, how would you set up your development and production environments, given the current state of Toolforge?