[Labs-admin] Ways to go forward with Kubernetes

Bryan Davis bd808 at wikimedia.org
Sat Dec 3 00:34:49 UTC 2016


On Fri, Dec 2, 2016 at 11:09 AM, Yuvi Panda <yuvipanda at gmail.com> wrote:
>
> == 1. Finish the webservice migration to k8s completely ==
>
> We would start by defaulting new tools to k8s, then slowly flip things
> over one webservice type at a time.
>
> Pros:
>
> 1. Gridengine use lessens
> 2. We can get rid of all our custom webproxy related code, which
> should simplify everything - it would mean we're now running a
> 'normal' gridengine setup
>
> Cons:
> 1. Since we don't have an omnibus container, we need to figure out A
> Solution(tm) for some users (PHP script shelling out to Python to call
> a Java JAR etc), if/when we encounter them.

What would the main challenges be to allowing new images to be derived
from our base images by defining a method for a tool account to
specify a Dockerfile? We wouldn't need to go all the way to allowing
arbitrary Dockerfiles from anywhere on the net, but could instead
enforce that they had to derive from one of our base images. The tool
maintainers could then pick and choose packages to install via apt
from the repos that we already control. If we wanted to restrict it
to apt installs only, it could even be a "requirements.txt" style file
that was checked for and applied rather than a full Dockerfile.
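
To make the "requirements.txt" idea concrete, here is a minimal sketch
of how such a file could be validated and turned into a Dockerfile that
derives from an approved base image. Everything named here is an
assumption for illustration: the image names in ALLOWED_BASE_IMAGES, the
file format (one apt package per line, "#" comments), and the function
names. It is not an existing Tool Labs tool.

```python
import re

# Hypothetical allowlist of base images; these names are placeholders.
ALLOWED_BASE_IMAGES = {"toollabs-php-base", "toollabs-python-base"}

# Conservative pattern for Debian package names: lowercase letters,
# digits, '+', '-' and '.', starting with a letter or digit.
PACKAGE_RE = re.compile(r"^[a-z0-9][a-z0-9+.-]+$")


def parse_package_list(text):
    """Return the package names listed one per line in text.

    Blank lines and '#' comments are ignored. Anything that does not
    look like a plain package name is rejected, so the file cannot
    smuggle shell syntax into the generated RUN line.
    """
    packages = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        if not PACKAGE_RE.match(line):
            raise ValueError("invalid package name: %r" % line)
        packages.append(line)
    return packages


def render_dockerfile(base_image, packages):
    """Render a Dockerfile that only adds apt packages to a base image."""
    if base_image not in ALLOWED_BASE_IMAGES:
        raise ValueError("base image not allowed: %r" % base_image)
    lines = ["FROM %s" % base_image]
    if packages:
        lines.append(
            "RUN apt-get update"
            " && apt-get install -y --no-install-recommends "
            + " ".join(sorted(packages))
            + " && rm -rf /var/lib/apt/lists/*"
        )
    return "\n".join(lines) + "\n"
```

A tool maintainer would then only ship the package list, and the
platform would own the Dockerfile template and the image build.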

> 2. There are at least a couple of tools that call jsub from their
> webservices, and that won't work here if we don't have a kubernetes
> backend for jsub

Won't a jsub backend have the exact same mega-container requirement
problem? At the very least the webservice embedded jsub calls would
need to be changed to specify the container variant to spawn. It seems
likely that we'd also want them to change to specify the k8s backend
explicitly.
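
As a rough illustration of why the caller would have to name a
container variant, here is a sketch of how a jsub-style request could be
translated into a Kubernetes Job manifest. The batch/v1 Job fields are
standard Kubernetes; the variant names, the image names in
VARIANT_IMAGES, the one-namespace-per-tool layout, and the function
itself are assumptions made for the example, not the design of any
existing jsub backend.

```python
import json

# Hypothetical mapping from a container variant to an image name.
VARIANT_IMAGES = {
    "php": "example-registry/toollabs-php:latest",
    "python": "example-registry/toollabs-python:latest",
    "java": "example-registry/toollabs-java:latest",
}


def build_job_manifest(tool, job_name, command, variant):
    """Build a batch/v1 Job manifest for a single command.

    The variant must be chosen by the caller because, without an
    omnibus container, no single image can run every command.
    """
    if variant not in VARIANT_IMAGES:
        raise ValueError("unknown container variant: %r" % variant)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": job_name,
            # Assumption: each tool has a namespace named after it.
            "namespace": tool,
            "labels": {"tool": tool},
        },
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": job_name,
                            "image": VARIANT_IMAGES[variant],
                            "command": list(command),
                        }
                    ],
                    # Run the command once; do not restart on exit.
                    "restartPolicy": "Never",
                }
            }
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest(
        "mytool", "update-cache", ["python3", "update.py"], "python"
    )
    print(json.dumps(manifest, indent=2))
```

The JSON this prints could be fed to "kubectl create -f -". A webservice
that shells out to jsub today would need to pass the equivalent of the
variant argument, which is the change described above.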

> 3. This won't be truly 'no changes seen' from a UX POV - versions of
> software will likely change, since we don't have the omnibus
> container. This will cause change fatigue when we move to a PaaS.

This was my big worry when I asked Yuvi to kick off this discussion.
There were a few comments in the last survey that basically said
"every time I learn something it becomes deprecated" or "I only have
to touch my tools because you keep changing the platform underneath
me". If we are really going to try and move away from homegrown PaaS
hacks and adopt OpenShift/Deis/Whatever (and I think we should), I'm
not sure I see the need to first move everyone from OGE to our k8s
prototype and then immediately after tell them they need to move to
the PaaS.

> == 2. Add a kubernetes backend to jsub ==
>
> Add a --backend=kubernetes to jsub/jstart. This would let people use
> k8s to submit jobs / crons / tasks.
>
> Pros:
>
> 1. Gridengine use lessens
> 2. Easy way to move off gridengine for people 'stuck up'
> 3. Allows continuing function of webservice tools that call jsub
>
> Cons:
>
> 1. Stuck up people might continue to be stuck up
> 2. Same as cons (1) and (3) from webservice migration option
>
> == 3. Write up docs on how to use k8s directly for job submission ==
>
> kubectl is super nice and fairly friendly, and has good docs upstream.
> We can just write up really nice documentation on how people can just
> use that directly, and help some people switch.
>
> Pros:
>
> 1. Not much code work
> 2. Purely upstream supported solution
>
> Cons:
>
> 1. Requires a fair amount of change
> 2. The omnibus container problem exists / persists
> 3. change averse users will be change averse

My biggest concern here is that if we choose an opinionated PaaS such
as OpenShift, this direct interaction with kubectl may be blocked in
the future and we are back in the two-migrations-instead-of-one
scenario. It was almost trivially easy to move Stashbot to direct
kubectl thanks to all of the work that Yuvi did for webservice, but we
still have some kinks to work out with logging and restarts for things
that can't run parallel processes.

> == 4. Go full on on evaluating / setting up PaaS ==
>
> Tools is a PaaS, and we should be running a PaaS. There are several
> that are running on k8s, and they are all a bit more mature now than
> when we started (that was the reason we didn't just go with a PaaS in
> the beginning). We'll just set one up, and move people over slowly
>
> Pros:
> 1. Have a real upstream!
> 2. Brings tools to the early 2010s!
>
> Cons:
> 1. Lots of change!
> 2. Big project, and Yuvi will definitely not be around to do much in
> it. Might be a Pro.
> 3. Change averse people will be change averse.

Con #1 is both a pro and con in my opinion, but yes we will be moving
all of the cheese and will also have to invest in tooling,
documentation, and hands-on support to make the change as easy as we
can for the largest percentage of tools we can.

There is literally nothing we can do for con #3 short of freezing
everything. At some point we will have to move forward and there will
be some users and tools who choose not to invest in the changes needed.
I don't want to aggravate that problem with unnecessary intermediate
steps, but we can't hold back everything for the sake of a few.

> Whatever we end up doing will be a combination of these 4 I guess, and
> it's a question of prioritization. Thoughts?

The timelines and roadmaps for killing OGE come from before any of my
direct involvement. Is EOL of Ubuntu 14.04 the main driver, is it
adoption of Jessie due to Puppet changes that we will be holding
production back on, or is it something else entirely? Assuming we could
select a PaaS by the end of Q3 (March) and be ready to start doing
major evangelization at the May hackathon, how much time would we have
left before we had to shut down OGE?

Bryan
-- 
Bryan Davis              Wikimedia Foundation    <bd808 at wikimedia.org>
[[m:User:BDavis_(WMF)]]  Sr Software Engineer            Boise, ID USA
irc: bd808                                        v:415.839.6885 x6855


