*Keynotes*
* Developers at some point should never have to know or care what the backend is. https://github.com/brendandburns/metaparticle is "real infrastructure as code".
* Subjective: workflow and developer experience are the new frontier. All of this ecosystem is being built not as an end in itself, but rather to describe a platform that enables innovation.
"Platforms are about speed" "Kubectl is the new SSH." "You know you are a Sr Engineer when people like you." "We call them soft skills but they are hard to pull off." *---Kelsey Hightower*
*Container runtime and image format standards*
https://www.opencontainers.org/announcement/2017/07/19/open-container-initia...
i.e. OCI is 1.0
This took 2 years and is a further work in progress. The OCI spec covers what you should do, what you can do, and, interestingly, what you may not do. One of the presenters compared this experience to POSIX, which took a decade. Not much here that can't be gleaned from reading the spec and/or news on the initiative.
*Running mixed workloads on Kubernetes at IHME* https://kccncna17.sched.com/event/CU7z/running-mixed-workloads-on-kubernetes...
tldr; Univa owns the copyrights to SGE. They have a closed fork that improves on the last open SGE release (the one we run). They facilitate closed SGE on k8s as a managed service. Most of this was marketing and explanations of IHME.
takeaways; I thought at first that Univa had released their changes back into the world, and it seems their original stated intent back in the day was an open-core model. But alas, no, they are not playing nicely with the greater FLOSS world. Yuvi even asked about this and it was followed by some noncommittal answers about code already on GitHub. Ironically, what is on GitHub is their fork of the original https://github.com/gridengine/gridengine: 'Commits on Jun 1, 2012'. We have seen 'gridengine' packages pop up on Debian Stretch and that can give the illusion of project health. The debs in Stretch are "Son of Grid Engine", based on the 6.2 last open source release as seen at https://arc.liv.ac.uk/trac/SGE/, and this seems like a barely surviving variant:
* https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=gridengine-master;dist=uns...
* http://metadata.ftp-master.debian.org/changelogs/main/g/gridengine/gridengin...
* 2016-03-02: Version 8.1.9 available. Note that this changes the communication protocol due to the MUNGE support, and really should have been labelled 8.2 in hindsight — ensure you close down execds before upgrading.
Much love to these folks at University of Liverpool, but we should double down on the narrative that SGE and SoGE are dead projects, even if we stay on them into Debian Stretch out of convenience. This model of running a batch-style system on k8s is maybe interesting, but it's got to be something other than S[0]GE.
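For a sense of what native batch on k8s looks like without SGE in the picture, a bare Kubernetes Job is roughly this (a minimal sketch with a hypothetical name and workload, not from the talk):

```yaml
# Minimal native k8s batch Job (hypothetical name/image/command).
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-batch-example
spec:
  completions: 10      # total successful pod runs required
  parallelism: 2       # how many run at once
  backoffLimit: 4      # retries before the Job is marked failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pi
          image: perl
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```

No fair-share, no array-job or qsub-style submission semantics; whatever replaces S[0]GE on k8s has to layer that on top.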
*References*
* http://www.sdsc.edu/~hocks/FG/MSKCC.slurm.sge.html
* https://hpc.nih.gov/docs/pbs2slurm.html
* https://en.wikipedia.org/wiki/Comparison_of_cluster_software
* http://www.sdsc.edu/~hocks/FG/LL-PBS-SGE.html
* https://bugs.schedmd.com/show_bug.cgi?id=2208
*More usable k8s*
https://kccncna17.sched.com/event/CU8L/the-road-to-more-usable-kubernetes-jo...
Joe Beda is a super interesting thinker in this space IMO and I went to this mainly because of him.
tldr; Heptio has ksonnet (https://github.com/ksonnet), which is a way of thinking about composing infrastructure as code. Kinda-sorta a Helm alternative, though I think both sides would bristle at that description :) ksonnet seems deeply interesting, but a lot of the configuration avoidance reads to me as convention-as-configuration, which is lock-in as much as, or more so than, a Helm-based approach. They make what I imagine are totally valid criticisms of Helm "at scale", but I have only used Helm a bit for personal testing so I'm not entirely sure. This space is young and the urge for these folks to build a DSL is strong.
Described YAML as an "assembly level primitive" (in regard to Helm). Would like to try out ksonnet a bit more. Not convinced.
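For context on the "assembly level primitive" framing: this is the raw YAML both Helm and ksonnet are ultimately templating or generating, as I understand it. A generic, hypothetical Deployment (names and image are made up):

```yaml
# The "assembly": a plain Deployment manifest of the kind higher-level tools emit.
apiVersion: apps/v1beta2      # the apps API group in the k8s 1.8 era
kind: Deployment
metadata:
  name: hello-web             # hypothetical
  labels:
    app: hello-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
        - name: web
          image: nginx:1.13   # hypothetical
          ports:
            - containerPort: 80
```

ksonnet's bet is that you compose jsonnet that emits this; Helm's is that you template it; either way, something like this is what actually hits the API server.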
*Multi-Tenancy Support and Security Modeling with RBAC and Namespaces*
https://kccncna17.sched.com/event/CU7j/multi-tenancy-support-security-modeli...
tldr; a walk through RBAC personas. The models we want to replace our homebrew with mostly exist, in theory. Showed off the VMware UI on top of the k8s-native magic. I was hoping for more technical breakdown. Interesting descriptions of the ClusterRole vs namespaced Role split and namespace-based isolation. Fits nicely into our model of the world.
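A rough sketch of the namespaced side of that model, assuming a hypothetical "team-a" tenant namespace and group (not from the talk):

```yaml
# Namespaced Role: what a "team-a" tenant may do inside its own namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-a
  name: team-a-developer
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "configmaps", "deployments"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
# RoleBinding: attach the Role to the tenant's group within the namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: team-a
  name: team-a-developers
subjects:
  - kind: Group
    name: team-a
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-a-developer
  apiGroup: rbac.authorization.k8s.io
```

A ClusterRole is the same shape minus the namespace, which is why the cluster-wide vs per-tenant split maps so cleanly onto our model.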
*CNI, CRI, and OCI "Oh My"*
https://kccncna17.sched.com/event/CU6L/cni-cri-and-oci-oh-my-i-elsie-phillip... goo.gl/fK8kFS
tldr; standards and their origination. The slides are decent. Two community liaison type folks from CoreOS talking about AppC being abandoned. Some foundational thinking: "What is a container?" "Why do standards exist?" "How is Docker involved?"
I have found CNI confusing as far as scope goes (is it a standard, a spec, or an implementation?), so for me this was mainly an unwinding of acronym trivia.
*Local Ephemeral Resource Management* https://kccncna17.sched.com/event/CU7X/local-ephemeral-storage-resource-mana...
I really liked her style of presentation and clear breakdown of ideas. This was a more academic presentation from someone who is clearly in the trenches, but I went to get insight into one essential problem: disk I/O QoS and limiting. That was on the last slide, labeled "Future", and she said they were determining if it was a "problem worth solving". If we had unlimited money I would hire this person.
Mainly talking about quotas and quota setting levels for storage: pod, namespace. Most of this is k8s 1.8 or greater AFAICT.
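The mechanics, as I understood them, are the 1.8+ ephemeral-storage resource plus the usual ResourceQuota machinery. A sketch, with hypothetical names and sizes:

```yaml
# Per-container scratch-space accounting (k8s 1.8+ local ephemeral storage).
apiVersion: v1
kind: Pod
metadata:
  name: scratch-example        # hypothetical
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:
          ephemeral-storage: "1Gi"   # scheduler accounts for node-local scratch
        limits:
          ephemeral-storage: "2Gi"   # exceeding this gets the pod evicted
---
# Namespace-level cap on the same resource via ResourceQuota.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a            # hypothetical tenant namespace
spec:
  hard:
    requests.ephemeral-storage: 50Gi
    limits.ephemeral-storage: 100Gi
```

Note that none of this touches IOPS or bandwidth, which is exactly the gap flagged under "Future".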
rant: stateful reasoning and resourcing of tenants with sane isolation for storage is the huge elephant in the room in this cloud native world. I noted in the TOC public meeting that the storage SIG had become particularly vocal after a period of relative politeness over async channels. I continue to think that resource isolation of storage is the single least solved problem in cloud. Basically, you should be able to tie logical resources for compute and memory to physical resources that are dedicated islands for resourcing.
*Prometheus 2.0 "salon"*
I think 2.0 is the first release of Prometheus I have seen that looks prod ready. The list of half-punted issues was always too long for me: backups, alerts, rollups, performance, storage. 2.0 is not backwards compatible at all w/ 1.x. The ex-intern engineer giving the "what's new in 2.0" portion of the talk said to just move to 2.0 and leave old metrics behind. I think for our stuff we should actually do this. That slide deck is not published but most of it is here: https://coreos.com/blog/prometheus-2.0-released. The performance improvements are awesome. The storage usage is awesome. Rather than feeling like Prometheus is the best of bad options, I think it may actually be...cool as of 2.0. Nice talk about a lot of the nuts-and-bolts reasoning of Prometheus internals: https://schd.ws/hosted_files/kccncna17/c4/KubeCon%20P8s%20Salon%20-%20Kubern.... There were 3 presentations over about an hour and a half. A lot of wisdom on practical applications for tagging and collection: how not to explode cardinality with well-intentioned-but-chaotic tagging, and that kind of thing. 2.0 has //no downtime backups//. Rule groups are now defined with YAML (see the sketch after the links below). Worth looking through that presentation and the 2.0 announcement. We have a lot of things to figure out here, but it seems the propulsion (of k8s) and investment in Prometheus may have led to something usable...potentially :) I see only <2.0 in Debian atm.
https://prometheus.io/blog/2017/11/08/announcing-prometheus-2-0/ https://kccncna17.sched.com/event/Cs4d
'migration' https://prometheus.io/docs/prometheus/latest/migration/
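On the rule-group point above: a 2.0-style rules file looks roughly like this (metric and alert names are hypothetical):

```yaml
# Prometheus 2.0 rules file: recording and alerting rules now live in named YAML groups.
groups:
  - name: example.rules
    rules:
      - record: job:http_requests:rate5m          # hypothetical recording rule
        expr: sum(rate(http_requests_total[5m])) by (job)
      - alert: HighErrorRate                       # hypothetical alerting rule
        expr: job:http_errors:rate5m / job:http_requests:rate5m > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Error rate above 5% for {{ $labels.job }}"
```

The migration doc linked above covers converting 1.x-format rule files over.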
*Openstack and k8s SIG*
Background: k8s has the ability to integrate more tightly with an external cloud provider, i.e. maybe service IPs are actually managed in Neutron at the OpenStack layer, providing visibility and integration, or Cinder volumes are allocated in OpenStack to be used by k8s, etc. https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/provi...
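As a concrete (hedged) example of what that integration buys you: with the in-tree OpenStack provider enabled, a StorageClass can hand PVC provisioning off to Cinder. Parameter values here are placeholders:

```yaml
# StorageClass backed by the in-tree Cinder provisioner; PersistentVolumeClaims
# against it become Cinder volumes on the OpenStack side.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cinder-standard
provisioner: kubernetes.io/cinder
parameters:
  type: standard       # Cinder volume type (placeholder)
  availability: nova   # OpenStack availability zone (placeholder)
```

This only works when the cluster components run with the OpenStack cloud provider configured (e.g. --cloud-provider=openstack plus a cloud config), which is exactly the coupling the SIG is debating how to own.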
*My impressions and takeaways, even though it was hard to keep track and I may be wrong:*
Despite having been around for a while, this is in early stages, and IMO the future is unknown. Huawei apparently has been doing some work here, since they run a sizeable OpenStack cloud and are heavily invested in k8s. Who should own CI and integration testing? Where do the resources come from? Integration testing from Mitaka+, a need to certify that HEAD in k8s land does not break existing use cases, and possibly certifying certain OpenStack releases for certain k8s releases. It seems k8s upstream wants to decouple all provider code into external libraries to take it out of core and make the projects more independent. Who owns this?
"As we all know Neutron is not very self describing" -- Random Dev In This SIG
Lots of talk and hijacking on install best practices. It seems there is some consensus in k8s internal circles that kubeadm will be the future across all mediums for k8s deployment. Kubespray was mentioned several times. So that's k8s on OpenStack. What about OpenStack on k8s? :D Some are doing it, but no one has published significant blogs or use cases. Most OpenStack devs seem to be using https://github.com/openstack/openstack-ansible which seems like LXC w/o a k8s-like scheduler or orchestration layer. Kolla-Ansible seems to have momentum and be blessed, but no one there had much to say about it otherwise.
I'm really interested in this area of inquiry, but at present I think operating our entities as ships-in-the-night has a lot of benefit, as the tangle of integration runs deep and muddy.
cloud-admin@lists.wikimedia.org