[openstack-dev] [Heat] Multi region support for Heat

Clint Byrum clint at fewbar.com
Mon Jul 29 18:42:57 UTC 2013

Excerpts from Zane Bitter's message of 2013-07-29 10:21:29 -0700:
> On 29/07/13 17:40, Clint Byrum wrote:
> > Excerpts from Zane Bitter's message of 2013-07-29 00:51:38 -0700:
> > > On 29/07/13 02:04, Angus Salkeld wrote:
> > > > On 26/07/13 09:43 -0700, Clint Byrum wrote:
> > > > >
> > > > > These are all predictable limitations and can be handled at the parsing
> > > > > level.  You will know as soon as you have template + params whether or
> > > > > not that cinder volume in region A can be attached to the nova server
> > > > > in region B.
> > >
> > > That's true, but IMO it's even better if it's obvious at the time you
> > > are writing the template. e.g. if (as is currently the case) there is no
> > > mechanism within a template to select a region for each resource, then
> > > it's obvious you have to write separate templates for each region (and
> > > combine them somehow).
> > >
> > The language doesn't need to be training wheels for template writers. The
> Sure it does ;)
> But seriously, the language should reflect the underlying model.

And therein lies the debate: what does Heat model? If part of it is
"resources", then the language needs to be genuinely open to new resources
that may do things not conceived of today.

> > language parser should just make it obvious when you've done something
> > patently impossible. Because cinderclient will only work with a region
> > at a time, we can make that presumption at that resource level, and flag
> > the problem during creation before any resources have been allocated.
> >
> > But it would be quite presumptuous of Heat, at a language layer, to
> > say all things are single region.
> Well, CloudFormation has made that presumption. And while I would 
> *never* suggest that we limit ourselves to features that CloudFormation 
> supports, it behooves us to pause and consider why that might be. (I'm 
> thinking here of a great many things in Heat, not just this particular 
> one.) Because if we do just charge ahead with every idea, the complexity 
> of the resulting system will be baroque.

Indeed, I am actually most worried about the complexity of the solutions
that will be built around Heat if we do not define how it models regions,
or if we do not include in that model the inherent isolation a region
implies.

> > There's an entirely good use case
> > for cross-region volumes and if it is ever implemented that way, and
> > cinderclient got some multi-region features, we wouldn't want users
> > to be faced with a big rewrite of their templates just to make use of
> > them. We should just be able to do the cross region thing now, because
> > we have that capability.
> This is interesting, because I think of Availability Zones within a 
> Region as "things that are separated somewhat, but still make sense to 
> be used together sometimes" and Regions as "things that don't make sense 
> to be used together". So we have this three-tier system already (local, 
> different AZ, different Region) that cloud providers can use to specify 
> the capabilities of resources, yet we propose to make two of those tiers 
> effectively equivalent? That doesn't sound like we are accurately
> modelling the problem domain.
> If we accept that distinction between AZs and Regions, then your example 
> is, by definition, not going to happen. Presumably you don't accept that 
> definition though, so I'd be curious to hear what everybody else thinks 
> it means when a cloud provider creates a separate Region.

I see AZs as groups of resources that share _some_ infrastructure (such
as a keystone catalog). Regions are completely isolated from one another
at the infrastructure level, though they may have eventually-consistent
shared data (like a username/password db).
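To make the three tiers in this exchange concrete (same AZ, different AZ in the same region, different region), here is a minimal Python sketch. `Placement`, `Locality` and `locality()` are hypothetical names for illustration, not Heat APIs; the sketch only encodes the definitions given above.

```python
from dataclasses import dataclass
from enum import Enum


class Locality(Enum):
    """The three tiers from the discussion, closest first (hypothetical names)."""
    SAME_AZ = 1       # same region and same availability zone
    CROSS_AZ = 2      # same region (shared keystone catalog), different AZ
    CROSS_REGION = 3  # isolated infrastructure; at most eventually-consistent shared data


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where a resource lives; not a Heat data type."""
    region: str
    availability_zone: str


def locality(a: Placement, b: Placement) -> Locality:
    """Classify how far apart two placements are."""
    # Compare regions first: AZ names are scoped to a region, so "az1"
    # in two different regions refers to unrelated zones.
    if a.region != b.region:
        return Locality.CROSS_REGION
    if a.availability_zone != b.availability_zone:
        return Locality.CROSS_AZ
    return Locality.SAME_AZ


if __name__ == "__main__":
    server = Placement("RegionOne", "az1")
    volume = Placement("RegionTwo", "az1")
    print(locality(server, volume).name)  # CROSS_REGION
```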

> > I dislike the language defining everything and putting users in too
> > tight of a box. How many times have you been using a DSL and shaken your
> > fists in the air, "get out of my way!"? I would suggest that cross region
> > support is handled resource by resource.
> Probably only because I haven't used a lot of DSLs but, honestly, I 
> genuinely can't ever recall that happening. However, I wish I had a 
> dollar for every time I've tried desperately to get some combination of 
> things that worked individually to work together and ended up having to 
> read the source code to figure out that it wasn't supported.

Right, and good early errors and documentation are something I think we
are not so good at right now.
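As an illustration of the kind of early error meant here, a minimal Python sketch, assuming a hypothetical per-resource `region` key (as noted above, templates currently have no such selector) and placeholder resource types `Server`, `Volume` and `VolumeAttachment`. It resolves regions from template + params alone and rejects a cross-region attachment before anything is allocated, on the presumption that the volume client works with one region at a time.

```python
from typing import Any, Dict, List


def resolve(value: Any, params: Dict[str, str]) -> Any:
    """Resolve a literal or a {"get_param": name} reference (a simplification)."""
    if isinstance(value, dict) and "get_param" in value:
        return params[value["get_param"]]
    return value


def region_of(resources: Dict[str, dict], name: str,
              params: Dict[str, str], default_region: str) -> str:
    """Region of a named resource; resources without one use the stack's region."""
    return resolve(resources[name].get("region", default_region), params)


def validate_attachments(template: Dict[str, Any], params: Dict[str, str],
                         default_region: str) -> List[str]:
    """Return one error per attachment whose server and volume are in
    different regions. Only template + params are inspected, so nothing
    has been created when an error is reported."""
    resources = template["resources"]
    errors = []
    for name, res in resources.items():
        if res.get("type") != "VolumeAttachment":
            continue
        server = res["properties"]["server"]
        volume = res["properties"]["volume"]
        server_region = region_of(resources, server, params, default_region)
        volume_region = region_of(resources, volume, params, default_region)
        if server_region != volume_region:
            errors.append(
                f"{name}: volume {volume!r} is in {volume_region}, "
                f"but server {server!r} is in {server_region}"
            )
    return errors


if __name__ == "__main__":
    template = {
        "resources": {
            "db_server": {"type": "Server", "region": {"get_param": "server_region"}},
            "db_volume": {"type": "Volume", "region": "RegionB"},
            "attach": {
                "type": "VolumeAttachment",
                "properties": {"server": "db_server", "volume": "db_volume"},
            },
        }
    }
    for error in validate_attachments(template, {"server_region": "RegionA"}, "RegionA"):
        print(error)
```

With `server_region` set to `RegionB`, the same template validates cleanly, which is the point: the error depends only on template + params, never on live resources.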

> > > > > I'm still convinced that none of this matters if you rely on a single
> > > > > Heat in one of the regions. The whole point of multi region is to eliminate
> > > > > a SPOF.
> > >
> > > So the idea here would be that you spin up a master template in one
> > > region, and this would contain OS::Heat::Stack resources that use
> > > python-heatclient to connect to Heat APIs in other regions to spin up
> > > the constituent stacks in each region. If a region goes down, even if it
> > > is the one with your master template, that's no problem because you can
> > > still interact with the constituent stacks directly in whatever
> > > region(s) you _can_ reach.
> > >
> > Interacting with a nested stack directly would be a very uncomfortable
> > proposition. How would this affect changes from the master? Would the
> > master just overwrite any changes I make or refuse to operate?
> Heat resources (with the exception of a few hacks that are implemented 
> internally *cough*AutoScaling*cough*) are stateless. We only really 
> store the uuid of the target resource.
> So if you updated the template for the nested stack directly, the Heat 
> engine with the master stack wouldn't notice or care. If you deleted it 
> then you should probably delete it from the master stack before you try 
> to do an update that modifies it, but that should go smoothly because 
> Heat ignores NotFound errors when trying to delete resources. We can and 
> probably should improve on that by checking that resources still exist 
> during an update, and recreating them if not.

Oh yes, please. I actually started on a patch to do exactly that, as it
would make Heat immensely more robust. This idea alone probably deserves
a blueprint.
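A minimal Python sketch of that idea, assuming a hypothetical backend with `create`, `get` and `delete` calls that raise a `NotFound` exception (stand-ins for whatever client a resource wraps, not real OpenStack client methods). As described above, only the uuid of each physical resource is stored and deletes ignore `NotFound`; the proposed extra step is checking during an update that each stored uuid still exists, and recreating the resource if it was deleted out of band.

```python
import uuid
from typing import Dict


class NotFound(Exception):
    """Raised by the hypothetical backend when a physical resource is gone."""


class FakeBackend:
    """In-memory stand-in for a service API (Nova, Cinder, a remote Heat...)."""

    def __init__(self) -> None:
        self.resources: Dict[str, dict] = {}

    def create(self, definition: dict) -> str:
        rid = str(uuid.uuid4())
        self.resources[rid] = dict(definition)
        return rid

    def get(self, rid: str) -> dict:
        if rid not in self.resources:
            raise NotFound(rid)
        return self.resources[rid]

    def delete(self, rid: str) -> None:
        if rid not in self.resources:
            raise NotFound(rid)
        del self.resources[rid]


def delete_resource(backend: FakeBackend, rid: str) -> None:
    try:
        backend.delete(rid)
    except NotFound:
        pass  # already gone, e.g. deleted out of band: treat as success


def update_stack(backend: FakeBackend, stored_ids: Dict[str, str],
                 definitions: Dict[str, dict]) -> Dict[str, str]:
    """Reconcile stored name -> uuid mappings with the desired definitions.

    Returns the new mapping. Property updates of surviving resources are
    omitted; the point is the existence check before reusing a uuid.
    """
    new_ids: Dict[str, str] = {}
    for name, definition in definitions.items():
        rid = stored_ids.get(name)
        if rid is not None:
            try:
                backend.get(rid)
                new_ids[name] = rid  # still exists: keep it
                continue
            except NotFound:
                pass  # deleted out of band: fall through and recreate
        new_ids[name] = backend.create(definition)
    # Resources dropped from the template are deleted, ignoring NotFound.
    for name, rid in stored_ids.items():
        if name not in definitions:
            delete_resource(backend, rid)
    return new_ids
```

For example, after `ids = update_stack(backend, {}, defs)`, deleting `ids["web"]` directly through the backend and calling `update_stack(backend, ids, defs)` again yields a fresh uuid for `"web"`, while every other entry keeps its existing uuid.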

> Note that this is completely possible now, for all resources that have 
> their own API + nested stacks as well. (Updates after out-of-band 
> changes are actually a much bigger problem for virtually everything 
> _but_ nested stacks, which is one reason I suggested limiting 
> foreign-region resources to only nested stacks, rather than the 
> competing proposal from others in this thread to allow it for all 
> resource types.) This proposal doesn't change anything at all in this 
> regard, aside from increasing the chance that you might want to make use 
> of it.
> > > So it solves the non-obviousness problem and the single-point-of-failure
> > > problem in one fell swoop. The question for me is whether there are
> > > legitimate use cases that this would shut out.
> > >
> > With this solution to those problems, you have a new non-obvious problem
> > which is the split brain I describe above. If master goes down, you have
> > to play the master's role, and when the master returns, does it just
> > give up because you've tainted its stacks? Do you develop new API commands
> > to help resolve this? Use merge techniques?
> Other than the "replace a deleted resource on update" thing mentioned 
> above, nothing should be necessary afaict.
> > I believe that those problems are no less complex than the problem of
> > how to make Heat engines simply act as peers that need to share data.
> Looking back over your proposal, you seem to be suggesting more or less 
> the same thing happening under the hood. The only difference I can 
> detect is that I want the structure of the templates to explicitly model 
> how it works, whereas you want to infer from one multi-region template 
> which resources go to each region.

Agreed, we're not far off.

However, I don't want there to be a SPOF. A "master" template that just
contains nested stacks with regions set on them is a SPOF, because it
mediates the information sharing between the stacks in the various
regions and is thus critical to rolling out changes when there is a
failure somewhere.

If the regional Heat engines are peers, and the templates are synchronized
to whichever regions the template+params dictate, then when one region
goes down, the others still have everything they need to adapt to the
problem. The admin can update the original template with a new region
for all of the unavailable resources, delete them, or create new ones
in the surviving region. When the failed region comes back online, it
will be in conflict with the surviving region's template, and so unable
to share data with that region until the admin resolves the conflict,
perhaps by manually pushing the new template to the previously failed
region.
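The conflict handling described here can be sketched in a few lines of Python. Everything in it is hypothetical: `RegionEngine`, `reconnect()` and `admin_push()` are illustrative names, and comparing SHA-256 digests of the canonical JSON template is one assumed way of detecting that two peers have diverged. A peer whose template differs on reconnection is marked as conflicting, and no data is shared with it until the admin pushes a template.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field
from typing import Set


def digest(template: dict) -> str:
    """Hash of the canonical JSON form, so key order does not matter."""
    canonical = json.dumps(template, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


@dataclass
class RegionEngine:
    """Hypothetical peer Heat engine holding its copy of the template."""
    region: str
    template: dict
    conflicts: Set[str] = field(default_factory=set)

    def can_share_with(self, peer: "RegionEngine") -> bool:
        return peer.region not in self.conflicts


def reconnect(a: RegionEngine, b: RegionEngine) -> None:
    """When a failed region comes back, flag a conflict if the templates diverged."""
    if digest(a.template) != digest(b.template):
        a.conflicts.add(b.region)
        b.conflicts.add(a.region)


def admin_push(source: RegionEngine, target: RegionEngine) -> None:
    """Admin resolution: copy the surviving template over and clear the conflict."""
    target.template = copy.deepcopy(source.template)
    source.conflicts.discard(target.region)
    target.conflicts.discard(source.region)


if __name__ == "__main__":
    original = {"resources": {"web": {"region": "RegionB"}}}
    a = RegionEngine("RegionA", copy.deepcopy(original))
    b = RegionEngine("RegionB", copy.deepcopy(original))
    # RegionB is down; the admin moves "web" to the surviving region.
    a.template["resources"]["web"]["region"] = "RegionA"
    reconnect(a, b)             # RegionB comes back with the old template
    print(a.can_share_with(b))  # False
    admin_push(a, b)
    print(a.can_share_with(b))  # True
```

Refusing to share data until a human resolves the divergence, rather than merging automatically, matches the resolution suggested above.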

I don't find this to be complex; it is fairly standard high-availability
split-brain handling. However, I need to once again step back from the
conversation, as I don't believe I will be working on an implementation
any time soon.
