[openstack-dev] [Heat] Short term scaling strategies for large Heat stacks

Clint Byrum clint at fewbar.com
Thu May 29 23:52:07 UTC 2014


Hello!

I am writing to get some brainstorming started on how we might mitigate
some of the issues we've seen while deploying large stacks on Heat. I am
sending this to the dev list because it may involve landing fixes rather
than just using different strategies. The problems outlined here are
well known and reported as bugs or feature requests, but there may be
more that we can do.

First off, we've pushed Heat quite a bit further than it ever was able
to go before. This is due to the fantastic work done across the Heat
development community to respond to issues we've reported. We are really
excited to get started on the effort to move Heat toward a convergence
model, but it is quite clearly a medium-term strategy. What we are
looking for is a short-term bridge to get us through the near-term
problems while code lands to fix things in the mid-term.

We have a desire to deploy a fairly wide stack of servers. OpenStack's
simplest architecture has a few "controllers" which keep state and need
to be HA, and then lots of compute nodes.

What we've seen is that while deploying a cluster with a single controller
and "n" compute nodes, the probability of stack failure goes up as n
goes up. So we want to mitigate the impact and enable a deployer to
manage a cluster like this with Heat.
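
To put a rough number on that, here is a back-of-the-envelope sketch. It
assumes each node build fails independently with the same probability p,
which is my simplification and not something we have measured; the 1%
figure is made up purely for illustration.

```python
def stack_failure_probability(n, p):
    """Chance that at least one of n independent node builds fails.

    Assumes every node fails independently with the same probability p
    (an illustrative assumption, not a measured property of Heat).
    """
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # p = 0.01 is a hypothetical per-node failure rate.
    for n in (10, 100, 500):
        print(n, round(stack_failure_probability(n, 0.01), 3))
```

Under those assumptions a stack that is quite safe at 10 nodes is almost
certain to hit at least one failure at 500.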

We have also seen that the single thread that must manage an action will
take quite a lot of CPU power to process a large stack, which makes
operations on a large stack take a long time and thus increases the
impact of any changes that must be made.

Strategies:

Abandon + Adopt
===============

In this strategy, we respond to a failure by abandoning the stack in
Heat, leaving the successful resources in place. The resulting abandon
serialization is then edited to match reality, and the stack is adopted.
This suffers from a bug where in-instance users created inside the
stack, while still valid, will not be given access to the metadata. We
would need to fix the bugs in abandon/adopt to make sure this works.

Pros: * Exists today

Cons: * Bugs must be fixed
      * Manual process is undefined and requires engineering effort to
        recover.

Multiple Stacks
===============

We could split the stack into a controller stack and a compute stack.
The controller stack will be less likely to fail because it will
probably be only 3 nodes for a reasonably sized cloud. The compute nodes
would then live in their
own stack of (n) nodes. We could further break that up into chunks of
compute nodes, which would further mitigate failure. If a small chunk of
compute nodes fails, we can just migrate off of them. One challenge here
is that compute nodes need to know about all of the other compute nodes
to support live migration. We would have to do a second stack update after
creation to share data between all of these stacks to make this work.

Pros: * Exists today

Cons: * Complicates host awareness
      * Still vulnerable to stack failure (just reduces probability and
        impact).
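
A minimal sketch of the chunking idea, in plain Python with no Heat calls.
The stack naming scheme, the chunk size, and the "all_compute_hosts" key
are all hypothetical; the point is only that each chunk stack gets its own
members plus the full host list that the second update would have to push
out.

```python
def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_stacks(compute_hosts, chunk_size):
    """Map a hypothetical stack name to the data that stack would need.

    Every chunk stack gets its own members plus the complete host list,
    since compute nodes must know about each other for live migration.
    """
    plans = {}
    for index, members in enumerate(chunk(compute_hosts, chunk_size)):
        plans["compute-%02d" % index] = {
            "members": members,
            "all_compute_hosts": list(compute_hosts),
        }
    return plans


if __name__ == "__main__":
    hosts = ["c%d" % i for i in range(7)]
    for name, plan in sorted(plan_stacks(hosts, 3).items()):
        print(name, plan["members"])
```

The full host list is only known once every chunk exists, which is why
the second stack update after creation is needed.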

Manual State Manipulation
=========================

We could create tools for administrators to go into the Heat database
and "fix" the stack. This is basically the same approach as
abandon/adopt, but it is lighter weight and works around the issue of
losing track of in-instance users.

Pros: * Lightweight
      * Possible today

Cons: * Violates API layers
      * Requires out of band access to Heat data store.
      * Will not survive database schema changes

update-failure-recovery
=======================

This is a blueprint I believe Zane is working on to land in Juno. It will
allow us to retry a failed create or update action. Combined with the
separate controller/compute node strategy, this may be our best option,
but it is unclear whether that code will be available soon or not. The
chunking is definitely required, because with 500 compute nodes, if
node #250 fails, the remaining 249 nodes that are IN_PROGRESS will be
cancelled, which makes the impact of a transient failure quite extreme.
Also without chunking, we'll suffer from some of the performance
problems we've seen where a single engine process will have to do all of
the work to bring up a stack.

Pros: * Uses blessed strategy

Cons: * Implementation is not complete
      * Still suffers from heavy impact of failure
      * Requires chunking to be feasible
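
To illustrate how retry and chunking would combine, here is a rough
sketch. deploy_chunk is a hypothetical callable standing in for "create
or update one chunk stack and report success"; it is not a Heat API, and
the blueprint may well end up working differently.

```python
def deploy_with_retry(chunk_names, deploy_chunk, max_attempts=3):
    """Deploy each chunk on its own, retrying only the chunk that failed.

    deploy_chunk(name) is a hypothetical callable returning True on
    success. Returns {name: attempts_used}, or None for a chunk that
    never succeeded within max_attempts.
    """
    results = {}
    for name in chunk_names:
        results[name] = None
        for attempt in range(1, max_attempts + 1):
            if deploy_chunk(name):
                results[name] = attempt
                break
    return results


if __name__ == "__main__":
    calls = {}

    def flaky(name):
        # Fake deployer: "compute-01" fails once, then succeeds.
        calls[name] = calls.get(name, 0) + 1
        return name != "compute-01" or calls[name] >= 2

    print(deploy_with_retry(["compute-00", "compute-01"], flaky))
```

A transient failure then costs a retry of one chunk rather than a
cancelled action across the whole stack.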


Anyway, these are the strategies I have available today. Does anyone
else have some ideas to help us make use of the current Heat to deploy
large stacks? Thanks!


