[openstack-dev] [TripleO][Heat][Kolla][Magnum] The zen of Heat, containers, and the future of TripleO

Dan Prince dprince at redhat.com
Mon Mar 28 19:02:43 UTC 2016


On Mon, 2016-03-21 at 16:14 -0400, Zane Bitter wrote:
> tl;dr Containers represent a massive, and also mandatory,
> opportunity 
> for TripleO. Let's start thinking about ways that we can take maximum 
> advantage to achieve the goals of the project.
> 
> Now that you have the tl;dr I'm going to start from the beginning,
> so 
> settle in and grab yourself a cup of coffee or other poison of your
> choice.
> 
> After working on developing Heat from the very beginning of the
> project 
> in early 2012 and debugging a bunch of TripleO deployments in the
> field, 
> it is my considered opinion that Heat is a poor fit for the
> workloads 
> that TripleO is currently asking of it. To illustrate why, I need to 
> explain what it is that Heat is really designed to do.
> 
> Here's a theoretical example of how I've always imagined Heat
> software 
> deployments would make Heat users' lives better. For simplicity, I'm 
> just going to model two software components, a user-facing service
> that 
> connects to some back-end service:
> 
>    resources:
>      backend_component:
>        type: OS::Heat::SoftwareComponent
>        properties:
>          configs:
>            - tool: script
>              actions:
>                - CREATE
>                - UPDATE
>              config: |
>                PORT=$(get_backend_port || random_port)
>                stop_backend
>                start_backend $DEPLOY_VERSION $PORT $CONFIG
>                addr="$(hostname):$(get_backend_port)"
>                printf '%s' "$addr" >${heat_outputs_path}.host_and_port
>            - tool: script
>              actions:
>                - DELETE
>              config: |
>                stop_backend
>          inputs:
>            - name: DEPLOY_VERSION
>            - name: CONFIG
>          outputs:
>            - name: host_and_port
> 
>      frontend_component:
>        type: OS::Heat::SoftwareComponent
>        properties:
>          configs:
>            - tool: script
>              actions:
>                - CREATE
>                - UPDATE
>              config: |
>                stop_frontend
>                start_frontend $DEPLOY_VERSION $BACKEND_ADDR $CONFIG
>            - tool: script
>              actions:
>                - DELETE
>              config: |
>                stop_frontend
>          inputs:
>            - name: DEPLOY_VERSION
>            - name: BACKEND_ADDR
>            - name: CONFIG
> 
>      backend:
>        type: OS::Heat::SoftwareDeployment
>        properties:
>          server: {get_resource: backend_server}
>          name: {get_param: backend_version} # Forces upgrade replacement
>          actions: [CREATE, UPDATE, DELETE]
>          config: {get_resource: backend_component}
>          input_values:
>            DEPLOY_VERSION: {get_param: backend_version}
>            CONFIG: {get_param: backend_config}
> 
>      frontend:
>        type: OS::Heat::SoftwareDeployment
>        properties:
>          server: {get_resource: frontend_server}
>          name: {get_param: frontend_version} # Forces upgrade replacement
>          actions: [CREATE, UPDATE, DELETE]
>          config: {get_resource: frontend_component}
>          input_values:
>            DEPLOY_VERSION: {get_param: frontend_version}
>            BACKEND_ADDR: {get_attr: [backend, host_and_port]}
>            CONFIG: {get_param: frontend_config}
> 
> 
> This is actually quite a beautiful system, if I may say so:
> 
> - Whenever a version changes, Heat knows to update that component,
> and 
> the components can be updated independently.
> - If the backend in this example restarts on a different port, the 
> frontend is updated to point to the new port.
> - Everything is completely agnostic as to which server it is running
> on. 
> They could be running on the same server or different servers.
> - Everything is integrated with the infrastructure (not only the
> servers 
> you're deploying on and the networks and volumes connected to them,
> but 
> also things like load balancers), so everything is created at the
> right 
> time, in parallel where possible, and any errors are reported all in
> one 
> place.
> - If something requires e.g. a restart after changing another
> component, 
> we can encode that. And if it doesn't, we can encode that too.
> - There's next to no downtime required: if e.g. we upgrade the
> backend, 
> we first deploy a new one listening on a new port, then update the 
> frontend to listen on the new port, then finally shut down the old 
> backend. Again, we can choose when we want this and when we just want
> to 
> update in place and reload.
> - The application doesn't even need to worry about versioning the 
> protocol that its two constituent parts communicate over: as long as
> the 
> backend_version and frontend_version that we pass are always
> compatible, 
> only compatible versions of the two services ever talk to each other.
> - If anything at all fails at any point before, during or after this 
> part of the template, Heat can automatically roll everything back
> into 
> the exact same state as it was in before, without any outside 
> intervention. You can insert test deployments that check everything
> is 
> working and have them automatically roll back if it's not, all with
> no 
> downtime for users.
> 
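
To make the ordering in the "next to no downtime" bullet concrete, here
is a minimal Python sketch. It is only an in-memory model, not Heat
code: Backend, Frontend, upgrade_backend and the "host:port" strings are
all made-up names for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Backend:
    version: str
    port: int
    running: bool = True


@dataclass
class Frontend:
    backend_addr: str
    log: List[str] = field(default_factory=list)

    def repoint(self, addr: str) -> None:
        # Stands in for re-running the frontend deployment with a new BACKEND_ADDR.
        self.log.append(f"repoint {self.backend_addr} -> {addr}")
        self.backend_addr = addr


def upgrade_backend(old: Backend, frontend: Frontend,
                    new_version: str, new_port: int) -> Backend:
    """Replace the backend so the frontend always has a running target."""
    new = Backend(version=new_version, port=new_port)  # 1. start the replacement
    frontend.repoint(f"host:{new.port}")               # 2. move the frontend over
    old.running = False                                # 3. only now stop the old one
    return new


old = Backend(version="1.0", port=8000)
fe = Frontend(backend_addr="host:8000")
new = upgrade_backend(old, fe, new_version="1.1", new_port=8001)
print(fe.backend_addr, old.running, new.running)
```

The point of the ordering is that step 3 never happens before step 2,
so there is no moment where the frontend points at a stopped backend.
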
> So you can use this to do something like a fancier version of
> blue-green deployment,[1] where you're actually rolling out the (virtualised) 
> hardware and infrastructure in a blue-green fashion along with the 
> software. Not only that, you can choose to replace your whole stack
> or 
> only parts of it. (Note: the way I had to encode this in the example 
> above, by changing the deployment name so that it forces a resource 
> replacement, is a hack. We really need a feature to specify in a 
> software config resource which inputs should result in a replacement
> on 
> change.)
> 
> It's worth noting that in practice you really, really want
> everything 
> deployed in containers to make this process work consistently, even 
> though *in theory* you could make this work (briefly) without them.
> In 
> particular, rollback without containers is a dicey proposition. When
> we 
> first started talking about implementing software deployments in Heat
> I half-seriously suggested that maybe we should make containers the
> only 
> allowed type of software deployment, and I kind of wonder now if I 
> shouldn't have pressed harder on that point.
> 
> 
> In any event, unfortunately as everyone involved in TripleO knows,
> the 
> way TripleO uses Heat looks nothing like this. It actually looks
> more 
> like this:
> 
>    resources:
>      install_all_the_things_on_one_server_config:
>        type: OS::Heat::SoftwareConfig
>        properties:
>          actions: [CREATE]
>          config: {get_file: install_all_the_things_on_one_server.sh}
> 
>      update_all_the_things_on_one_server_config:
>        type: OS::Heat::SoftwareConfig
>        properties:
>          actions: [UPDATE]
>          config: {get_file: update_all_the_things_on_one_server.sh}
>          inputs:
>            - name: update_count
> 
>      ...
> 
> (Filling in the rest is left as an exercise to the reader. You're
> welcome.)
> 
> Not illustrated are the multiple sources of truth that we have:
> puppet 
> modules (packaged on the server), puppet manifests and hieradata 
> (delivered via Heat), external package repositories. Heat is a
> dataflow 
> language but much of the data it should be operating on is actually 
> hidden from it. That's going about as well as you might expect.
> 
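
A rough Python sketch of what "dataflow" buys you, under the assumption
that each resource can be reduced to the set of resources it references.
The `references` dict is a hand-written simplification of the first
example template, and `backend_server` / `frontend_server` are taken
from its get_resource calls; none of this is Heat's actual engine code.

```python
from graphlib import TopologicalSorter

# Each resource maps to the resources whose outputs it consumes
# (its get_resource / get_attr references in the template).
references = {
    "backend_server": set(),
    "frontend_server": set(),
    "backend": {"backend_server"},
    "frontend": {"frontend_server", "backend"},  # BACKEND_ADDR comes from backend
}

# An engine that can see every reference can derive a safe order by itself.
order = list(TopologicalSorter(references).static_order())
print(order)

# Dependencies that live only inside a shell script, hieradata or a package
# repository never appear in `references`, so nothing here can order around them.
```

Anything that is not expressed as a reference is invisible to this kind
of ordering, which is the problem with the hidden sources of truth.
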
> Due to the impossibility of ever rolling back a deployment like one
> of 
> those, we just disable rollback for the overcloud templates, so if 
> there's a failure we end up stuck in whatever intermediate state we
> were 
> in when the script died. That can leave things in a state where 
> recovery is not automatic when 'earlier' deployments (like the
> package 
> update) end up depending on state set up by 'later' deployments
> (like 
> the post- scripts, which manipulate Pacemaker's state in
> Pacemaker-based deployments). Even worse, many of the current scripts
> leave the
> machine 
> in a state that requires manual recovery should they fail part-way
> through.
> 
> Indeed, this has literally none of the benefits of the ideal Heat 
> deployment enumerated above save one: it may be entirely the wrong
> tool 
> in every way for the job it's being asked to do, but at least it is 
> still well-integrated with the rest of the infrastructure.
> 
> Now, at the Mitaka summit we discussed the idea of a 'split stack', 
> where we have one stack for the infrastructure and a separate one
> for 
> the software deployments, so that there is no longer any tight 
> integration between infrastructure and software. Although it makes me
> a bit sad in some ways, I can certainly appreciate the merits of the
> idea 
> as well. However, from the argument above we can deduce that if this
> is 
> the *only* thing we do then we will end up in the very worst of all 
> possible worlds: the wrong tool for the job, poorly integrated.
> Every 
> single advantage of using Heat to deploy software will have
> evaporated, 
> leaving only disadvantages.
> 
> So what would be a good alternative? And how would we evaluate the
> options?
> 
> 
> To my mind, the purpose of the TripleO project is this: to ensure
> that 
> there is an OpenStack community collaborating around each part of
> the 
> OpenStack installation/management story. We don't care about TripleO 
> "owning" that part (all things being equal, we'd prefer not to),
> just 
> that nobody should have to go outside the OpenStack community and/or 
> roll their own thing to install OpenStack unless they want to. So I 
> think the ability to sustain a community around whatever solution we 
> choose ought to be a primary consideration.
> 
> The use of Ironic has been something of a success story here.
> There's 
> only one place to add hardware support to enable both installing 
> OpenStack itself on bare-metal via TripleO and the 'regular' 
> bare-metal-to-tenant use case of Ironic. This is a clear win/win.
> 
> Beyond getting the bare-metal machines marshalled, the other part of
> the 
> solution is configuration management and orchestration of the
> various 
> software services. When TripleO started there was nowhere in
> OpenStack 
> that was defining the relationships between services needed to 
> orchestrate them. To a large extent there still isn't. I think that
> one 
> of the reasons we adopted Puppet in TripleO was that it was supposed
> to 
> provide this, at least within a limited scope (i.e. on one machine -
> the 
> puppet-deploying community is largely using Ansible to orchestrate 
> across boxes, and we are using Heat). However, what we've discovered
> in 
> the past few months is that Puppet is actually not able to fulfil
> this 
> role as long as we support Pacemaker-based deployments as an option, 
> because in that case Pacemaker actually has control of starting and 
> stopping all of the services. As a result we are back to defining it
> all 
> ourselves in the Pacemaker config plus various hacky shell scripts, 
> instead of relying on (and contributing to!) a larger community.
> Even 
> ignoring that, Puppet doesn't solve the problem of orchestrating
> across 
> multiple machines.

I agree that the Pacemaker vs. Puppet relationships didn't play out as
nicely as they could have. One possible light at the end of this tunnel
is that there is interest in eventually using a "Pacemaker light"
approach to HA. I think this largely means using Pacemaker to manage
the active/passive services (Galera and perhaps Cinder) and perhaps
just using Keepalived for the rest.

That said, regardless of what we eventually do with Pacemaker or Puppet,
it should be feasible for them both to co-exist. I think the missing
pieces here are around workflow, which could be significantly improved if
we defined some pluggable upgrade steps in Mistral workflows (API driven)
that could allow you to call into Ansible playbooks, shell, or whatever
workflow'y stuff you'd like in a per-service model. What we've got is a
monolithic script that works only for TripleO. What I would rather see
us end up with is an architecture that supports plugging into the best
tooling for the job to implement the upgrade parts we need for our
services... and that perhaps is something you could build a community
around.
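
As a very rough illustration of the per-service, pluggable shape I have
in mind, here is a Python sketch. Everything in it is hypothetical: the
registry, the `upgrade_step` decorator and the step names are invented,
and each step is a plain callable standing in for whatever a Mistral
workflow would actually invoke (an Ansible playbook, a shell script...).

```python
from typing import Callable, Dict, List

# Hypothetical registry: service name -> ordered list of upgrade steps.
UpgradeStep = Callable[[str], str]
_registry: Dict[str, List[UpgradeStep]] = {}


def upgrade_step(service: str) -> Callable[[UpgradeStep], UpgradeStep]:
    """Decorator that plugs a step into the named service's upgrade."""
    def register(func: UpgradeStep) -> UpgradeStep:
        _registry.setdefault(service, []).append(func)
        return func
    return register


@upgrade_step("galera")
def stop_cluster(service: str) -> str:
    return f"{service}: stop cluster (stand-in for a shell script)"


@upgrade_step("galera")
def run_playbook(service: str) -> str:
    return f"{service}: run upgrade (stand-in for an Ansible playbook)"


def run_upgrade(service: str) -> List[str]:
    """Run every registered step for one service, in registration order."""
    return [step(service) for step in _registry.get(service, [])]


for line in run_upgrade("galera"):
    print(line)
```

The driver only knows about the registry, so each service can bring its
own steps and tooling without touching one monolithic script.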

> 
> 
> Clearly one option would be to encode everything in Heat along the
> lines 
> of the first example above. I think once we have containers this
> could 
> actually work really well for compute nodes and other types of
> scale-out nodes (e.g. Swift nodes). The scale-out model of Heat scaling groups 
> works really well for this use case, and between the improvements we 
> have put in place (like batched updates and user hooks) and those
> still 
> on the agenda (like notifications + automatic Mistral workflow 
> triggering on hooks) Heat could provide a really good way of
> capturing 
> things like migrating user workloads on scale down and rolling
> updates 
> in the templates, so that they can be managed completely
> automatically 
> by the undercloud with no client involvement (and when the
> undercloud 
> becomes HA, they'll get HA for free). I'd be pretty excited to see
> this 
> tried. The potential downside is that the orchestration definitions
> are 
> still trapped inside the TripleO templates, so they're not being
> shared 
> outside of the TripleO community. This is probably justified though 
> owing to its close ties to the underlying infrastructure.

Having Mistral exec a shared set of Ansible playbooks that help us
drive the upgrades process would be cool. Perhaps even ideal in that we
could make it into something of a pluggable upgrades architecture. I'd
much prefer that TripleO provide an "upgrades architecture" or at least
iterate towards defining what could be called that.

> 
> An alternative out of left field: as far as I can gather the
> "completely 
> new way of orchestrating activities" used by the new Puppet
> Application 
> Orchestration thing[2] uses substantially the same model as I
> described 
> for Heat above. If we added Puppet Application Orchestration data to 
> openstack-puppet-modules then it may be possible to write a tool to 
> generate Heat templates from that data. However in talking with
> Emilien 
> it sounds like o-p-m is quite some time away from tackling PAO. So I 
> don't think this is really feasible.
> 
> In any event, it's when we get to the controller nodes that the 
> downsides become more pronounced. We're no longer talking about one 
> deployment per service like I sketched above; each service is
> actually 
> multiple deployments forming an active-active cluster with virtual
> IPs 
> and failover and all that jazz. It may be that everything would just 
> work the same way, but we would be in uncharted territory and there 
> would likely be unanticipated subtleties. It's particularly unclear
> how 
> we would handle stop-the-world database migrations in this model, 
> although we do have the option of hoping that stop-the-world
> database 
> migrations will have been completely phased out by then.
> 
> To make it even more complicated, we ultimately want the services to 
> be heterogeneously spread among controller nodes in a configurable way.
> I believe that Dan's work on composable roles has already gone some
> way 
> toward this without even using containers, but it's likely to become 
> increasingly difficult to model in Heat without some sort of
> template 
> generation. (I personally think that template generation would be a
> Good 
> Thing, but we've chosen not to go down that path so far.) Quite
> possibly 
> even just having composable roles could make it untenable to
> continue 
> maintaining separate Pacemaker and non-Pacemaker deployment modes.
> It'd 
> be really nice to have the flexibility to do things like scale out 
> different services at different rates. What's more, we are going to
> need 
> some way of redistributing services when a machine in the cluster
> fails, 
> and ultimately we would like that process to be automated, which
> would 
> *require* a template generation service.
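
A toy Python sketch of the redistribution step, assuming the simplest
possible policy (round-robin over whatever nodes are still alive). The
function name, service names and node names are all invented, and a real
template generation service would need far more than this (constraints,
HA rules, data locality).

```python
from typing import Dict, List


def place_services(services: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Spread services over the available nodes round-robin (a naive policy)."""
    if not nodes:
        raise ValueError("no controller nodes left to place services on")
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    for index, service in enumerate(sorted(services)):
        placement[nodes[index % len(nodes)]].append(service)
    return placement


# Service and node names are invented for illustration.
services = ["glance", "keystone", "neutron-server", "nova-api"]
before = place_services(services, ["ctrl0", "ctrl1", "ctrl2"])
after = place_services(services, ["ctrl0", "ctrl2"])  # ctrl1 has failed
print(before)
print(after)
```

The new placement is the input a template generator would need in order
to produce an updated set of templates for the surviving machines.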

FWIW the composable services work is coming along quite nicely. It does
split out the pacemaker vs. non-pacemaker bits:

https://review.openstack.org/#/c/295588/

The "interface" for each service currently consists of just hiera
settings and a list of puppet manifests. The next step in defining the
service interfaces would be to add options for each service to control
its:
 - custom upgrade steps
 - container configuration: if containers are enabled this would drive
   (today) docker-compose to set up containers on each role. This could
   morph into an interface that dovetails into some other container uber
   management tooling... perhaps still driven by Heat and/or Mistral. The
   nice thing being that because our tooling is OpenStack based, these
   APIs can interact with each other very nicely.
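
To sketch the per-service interface described above, here is a small
Python model. It is not how the templates in the review are structured:
the class, field names and example values are hypothetical, and only the
hiera / manifest split and the two proposed additions come from the
description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ServiceInterface:
    """Hypothetical per-service interface; field names are illustrative."""
    name: str
    hiera: Dict[str, Any] = field(default_factory=dict)        # part of today's interface
    puppet_manifests: List[str] = field(default_factory=list)  # part of today's interface
    upgrade_steps: List[str] = field(default_factory=list)     # proposed addition
    container: Optional[Dict[str, Any]] = None                 # proposed addition


def compose_role(services: List[ServiceInterface]) -> Dict[str, Any]:
    """Merge the services assigned to one role into a single role definition."""
    role: Dict[str, Any] = {"hiera": {}, "manifests": [],
                            "upgrade_steps": [], "containers": {}}
    for svc in services:
        role["hiera"].update(svc.hiera)
        role["manifests"].extend(svc.puppet_manifests)
        role["upgrade_steps"].extend(f"{svc.name}: {step}" for step in svc.upgrade_steps)
        if svc.container is not None:
            role["containers"][svc.name] = svc.container
    return role


# Example values below are invented for illustration.
keystone = ServiceInterface(
    name="keystone",
    hiera={"keystone::debug": False},
    puppet_manifests=["keystone.pp"],
    upgrade_steps=["sync database"],
    container={"image": "example/keystone"},
)
glance = ServiceInterface(name="glance", puppet_manifests=["glance.pp"])

controller = compose_role([keystone, glance])
print(controller["manifests"])
```

A role is then just a list of services, which is what would let the same
service definition be reused across differently composed roles.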

> 
> We certainly *could* build all of that. But we definitely shouldn't 
> because this is the kind of thing that services like Kubernetes and 
> Apache Mesos are designed to do already. And that raises another 
> possibility: Angus & friends are working on capturing the
> orchestration 
> relationships for Mesos+Marathon within the Kolla project
> (specifically, 
> in the kolla-mesos repository). This represents a tremendous
> opportunity 
> for the TripleO project to further its mission of having the same 
> deployment tools available to everyone as an official part of the 
> OpenStack project without having to maintain them separately.
> 
> As of the Liberty release, Magnum now supports provisioning Mesos 
> clusters, so TripleO wouldn't have to maintain the installer for
> that 
> either. (The choice of Mesos is somewhat unfortunate in our case, 
> because Magnum's Kubernetes support is much more mature than its
> Mesos 
> support, and because the reasons for the decision are about to be or 
> have already been overtaken by events - I've heard reports that the 
> features that Kubernetes was missing to allow it to be used for 
> controller nodes, and maybe even compute nodes, are now available. 
> Nonetheless, I expect the level of Magnum support for Mesos is
> likely 
> workable.) This is where the TripleO strategy of using OpenStack to 
> deploy OpenStack can really pay dividends: because we use Ironic all
> of 
> our servers are accessible through the Nova API, so in theory we can 
> just run Magnum out of the box.
> 
> 
> The chances of me personally having time to prototype this are 
> slim-to-zero, but I think this is a path worth investigating.
> 
> cheers,
> Zane.
> 
> 
> [1] http://martinfowler.com/bliki/BlueGreenDeployment.html
> [2] https://puppetlabs.com/introducing-puppet-application-orchestration
> 