Open Stack

Fri May 27 22:30:38 UTC 2016

I spent a bit of time exploring the idea of using Heat as an external 
orchestration layer on top of Kubernetes - specifically in the case of 
TripleO controller nodes but I think it could be more generally useful 
too - but eventually came to the conclusion it doesn't work yet, and 
probably won't for a while. Nevertheless, I think it's helpful to 
document a bit to help other people avoid going down the same path, and 
also to help us focus on working toward the point where it _is_ 
possible, since I think there are other contexts where it would be 
useful too.

We tend to refer to Kubernetes as a "Container Orchestration Engine" but 
it does not actually do any orchestration, unless you count just 
starting everything at roughly the same time as 'orchestration'. Which I 
wouldn't. You generally handle any orchestration requirements between 
services within the containers themselves, possibly using external 
services like etcd to co-ordinate. (The Kubernetes project refer to this 
as "choreography", and explicitly disclaim any attempt at orchestration.)

What Kubernetes *does* do is more like an actively-managed version of 
Heat's SoftwareDeploymentGroup (emphasis on the _Group_). Brief recap: 
SoftwareDeploymentGroup is a type of ResourceGroup; you give it a map of 
resource names to server UUIDs and it creates a SoftwareDeployment for 
each server. You have to generate the list of servers somehow to give it 
(the easiest way is to obtain it from the output of another 
ResourceGroup containing the servers). If e.g. a server goes down you 
have to detect that externally, and trigger a Heat update that removes 
it from the templates, redeploys a replacement server, and regenerates 
the server list before a replacement SoftwareDeployment is created. In 
contrast, Kubernetes is running on a cluster of servers, can use rules 
to determine where to run containers, and can very quickly redeploy 
without external intervention in response to a server or container 
falling over. (It also does rolling updates, which Heat can also do 
albeit in a somewhat hacky way when it comes to SoftwareDeployments - 
which we're planning to fix.)
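
To make the recap concrete, here is a minimal sketch in plain Python (not Heat code) of the bookkeeping described above. The function names, server names and UUID strings are all made up for illustration; the only thing taken from the text is the shape of the input, a map of resource names to server UUIDs that expands to one deployment per server.

```python
from typing import Dict


def expand_deployment_group(servers: Dict[str, str], config: str) -> Dict[str, dict]:
    """Return one deployment record per entry in the name -> server UUID map."""
    return {
        name: {"server": uuid, "config": config}
        for name, uuid in servers.items()
    }


def replace_failed_server(servers: Dict[str, str], failed_name: str,
                          new_uuid: str) -> Dict[str, str]:
    """Model the externally triggered update: rebuild the map with a replacement."""
    updated = dict(servers)
    updated[failed_name] = new_uuid
    return updated


# Hypothetical server map, e.g. as obtained from another group's output.
servers = {"0": "uuid-a", "1": "uuid-b"}
deployments = expand_deployment_group(servers, "install-api")

# Server "1" dies. Nothing here notices on its own: something external has
# to regenerate the map and re-run the expansion (the stack update).
servers = replace_failed_server(servers, "1", "uuid-c")
deployments = expand_deployment_group(servers, "install-api")
```

The point of the sketch is the last three lines: the detect-and-regenerate step lives outside the group, whereas Kubernetes does the equivalent internally.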

So this seems like an opportunity: if the dependencies between services 
could be encoded in Heat templates rather than baked into the containers 
then we could use Heat as the orchestration layer following the 
dependency-based style I outlined in [1]. (TripleO is already moving in 
this direction with the way that composable-roles uses 
SoftwareDeploymentGroups.) One caveat is that fully using this style 
likely rules out for all practical purposes the current Pacemaker-based 
HA solution. We'd need to move to a lighter-weight HA solution, but I 
know that TripleO is considering that anyway.
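
As a rough illustration of what encoding the dependencies outside the containers could look like, here is a small Python sketch that turns a dependency map into start-up batches. This is only one reading of "dependency-based", not what Heat does internally, and the service names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical services; each maps to the set of services it depends on.
depends_on = {
    "database": set(),
    "message_queue": set(),
    "keystone": {"database"},
    "nova_api": {"database", "message_queue", "keystone"},
}


def start_batches(graph):
    """Group services into batches; each batch depends only on earlier ones."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = start_batches(depends_on)
```

Everything within one batch can be started in parallel, which is the part Kubernetes already does; the ordering between batches is the part an external orchestrator would have to supply.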

What's more though, assuming this could be made to work for a Kubernetes 
cluster, a couple of remappings in the Heat environment file should get 
you an otherwise-equivalent single-node non-HA deployment basically for 
free. That's particularly exciting to me because there are definitely 
deployments of TripleO that need HA clustering and deployments that 
don't and which wouldn't want to pay the complexity cost of running 
Kubernetes when they don't make any real use of it.

So you'd have a Heat resource type for the controller cluster that maps 
to either an OS::Nova::Server or (the equivalent of) an OS::Magnum::Bay, 
and a bunch of software deployments that map to either an 
OS::Heat::SoftwareDeployment that calls (I assume) docker-compose 
directly or a Kubernetes Pod resource to be named later.
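
The remapping idea can be sketched in Python as two lookup tables standing in for the resource_registry sections of two Heat environment files. The "Example::" type names are invented for this sketch, and "Hypothetical::Kubernetes::Pod" is a placeholder for the Pod resource type that does not exist yet; only the OS:: names come from the text above.

```python
# Stand-ins for the resource_registry of two environment files.
HA_REGISTRY = {
    "Example::ControllerCluster": "OS::Magnum::Bay",
    "Example::ServiceDeployment": "Hypothetical::Kubernetes::Pod",
}
SINGLE_NODE_REGISTRY = {
    "Example::ControllerCluster": "OS::Nova::Server",
    "Example::ServiceDeployment": "OS::Heat::SoftwareDeployment",
}

# The template itself only ever names the abstract types.
TEMPLATE_RESOURCES = {
    "controllers": "Example::ControllerCluster",
    "keystone": "Example::ServiceDeployment",
}


def resolve(resources, registry):
    """Map each resource to its concrete type; unmapped types pass through."""
    return {name: registry.get(rtype, rtype) for name, rtype in resources.items()}


ha = resolve(TEMPLATE_RESOURCES, HA_REGISTRY)
single = resolve(TEMPLATE_RESOURCES, SINGLE_NODE_REGISTRY)
```

The same template resolves to either deployment just by swapping the registry, which is the "basically for free" part.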

The first obstacle is that we'd need that Kubernetes Pod resource in 
Heat. Currently there is no such resource type, and the OpenStack API 
that would be expected to provide it (Magnum's /containers 
endpoint) is being deprecated, so that's not a long-term solution.[2] 
Some folks from the Magnum community may or may not be working on a 
separate project (which may or may not be called Higgins) to do that. 
It'd be some time away though.

An alternative, though not a good one, would be to create a Kubernetes 
resource type in Heat that has the credentials passed in somehow. I'm 
very against that though. Heat is just not good at handling credentials 
other than Keystone ones. We haven't ever created a resource type like 
this before, except for the Docker one in /contrib that serves as a 
prime example of what *not* to do. And if it doesn't make sense to wrap 
an OpenStack API around this then IMO it isn't going to make any more 
sense to wrap a Heat resource around it.

A third option might be a SoftwareDeployment, possibly on one of the 
controller nodes themselves, that calls the k8s client. (We could create 
a software deployment hook to make this easy.) That would suffer from 
all of the same issues that TripleO currently has about having to choose 
a server on which to deploy though.

The second obstacle is networking. TripleO has some pretty 
complicated networking requirements (specifically network isolation for 
the various services) that for now can't be supported when deploying a 
cluster with Magnum. The Kuryr project is working on improved networking 
for Magnum, but I don't know whether this is a use-case that would be 
covered.

There's also the issue that IIUC Magnum operates its Neutron L3 agents 
in such a way that connectivity to the user nodes is guaranteed only if 
Magnum itself is running in an HA cloud. This is a problematic 
assumption in general, but it's particularly problematic in the case of 
the TripleO *undercloud*, which is not HA and which we very much do not 
want to be in the networking path for the overcloud controller nodes. 
Again, I don't know if this will be resolved by Kuryr or when.

Magnum does offer the option to pass a custom template, and I assume 
that would allow us to set up the networking the way we want it. 
However, TripleO uses all kinds of tricks with the environment and 
parameters, so there'd quite likely need to be some enhancements to both 
Heat (in order to access the current environment from within a template) 
and Magnum (to pass an environment along with the template) to support that.

At that point it's a legitimate question to ask what exactly Magnum is 
buying us if TripleO has to maintain its own Kubernetes deployment 
templates anyway. I can think of only two things: an easier transition 
later if we do believe that the networking stuff will be resolved, and 
the /containers API. And the /containers API is being deprecated.

In that sense, the Magnum/Higgins split could be a good thing for the 
Heat+Kubernetes use case in the long term - if we had a 
Keystone-authenticated API that can allow Heat to make use of any k8s 
cluster, not just those deployed via Magnum, then Magnum could be cut 
out of the loop in those cases where networking issues preclude its use.

In the short term, though, there seem to be a number of obstacles. 
Perhaps some of the folks involved in the relevant projects could 
comment on when/if those are likely to be resolved.

cheers,
Zane.

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-March/090055.html
[2] https://etherpad.openstack.org/p/newton-magnum-unified-abstraction

[openstack-dev] [TripleO][Kolla][Heat][Higgins][Magnum][Kuryr] Gap analysis: Heat as a k8s orchestrator