[openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

Jay Pipes jaypipes at gmail.com
Fri Mar 28 19:49:31 UTC 2014

On Fri, 2014-03-28 at 19:38 +0000, CARVER, PAUL wrote:
> Jay Pipes wrote: 
> >I'm proposing getting rid of the host aggregate hack (or maybe evolving
> >it?) as well as the availability zone concept and replacing them with a
> >more flexible generic container object that may be hierarchical in
> >nature.
> Is the thing you're proposing to replace them with something that already
> exists or a brand new thing you're proposing should be created?

Either an evolution of the host aggregate concept (possibly renamed) or
a brand new concept.

> We need some sort of construct that allows the tenant to be confident that
> they aren't going to lose multiple VMs simultaneously due to a failure of
> underlying hardware.

Tenants currently assume this is the case if they are using multiple
availability zones, but nothing in Nova actually prevents multiple
availability zones from sharing hardware.

Frankly, this is an SLA thing, and should not be part of the API, IMO.
If a deployer wishes to advertise an SLA that says "this container of
compute resources is a failure domain", then they should be free to make
that SLA and even include it in a description of said generic container
of compute resource, but there should be no *implicit* SLAs.
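
To make that concrete, here is a rough sketch of what such a generic
container might look like. This is not Nova code: the class name
ResourceContainer, its fields, and the host names are all hypothetical,
made up purely for illustration. The only points it shows are that the
container can nest, and that any failure-domain promise lives in an
explicit, optional description rather than being implied by the type.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class ResourceContainer:
    """Hypothetical generic, possibly hierarchical, container of compute hosts."""
    name: str
    # Free-form text where a deployer MAY state an explicit SLA.
    # None means no promise of any kind is being made.
    description: Optional[str] = None
    hosts: List[str] = field(default_factory=list)
    children: List["ResourceContainer"] = field(default_factory=list)

    def all_hosts(self) -> Iterator[str]:
        """Yield this container's hosts, then those of every nested container."""
        yield from self.hosts
        for child in self.children:
            yield from child.all_hosts()


# Illustrative data only.
rack_a = ResourceContainer(
    name="rack-a",
    description="Deployer SLA: this rack is a failure domain",
    hosts=["node1", "node2"],
)
rack_b = ResourceContainer(name="rack-b", hosts=["node3"])
site = ResourceContainer(name="site-1", children=[rack_a, rack_b])
```

Here rack-a carries an SLA because the deployer chose to write one
down; rack-b and site-1 promise nothing.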

>  The semantics of it need to be easily comprehensible
> to the tenant, otherwise you'll get people thinking they're protected because
> they built a redundant pair of VMs but sheer bad luck results in them losing
> them both at the same time.

Umm, that's possible today. There is an implicit trust right now in the
API that availability zones are independent failure domains. What I am
telling you is that no such constraint exists in the implementation of
Nova availability zones (which are exposed via host aggregates).

> We're using availability zone for that currently and it seems to serve the
> purpose in a way that's easy to explain to a tenant.

It may be easy to explain to a tenant -- simply because of its use in
AWS. But that doesn't mean it's something that is real in practice.
You're furthering a false trust if you explain to tenants that an
availability zone is an independent failure domain. It can easily NOT
be one, because availability zones are exposed through host aggregates,
which may themselves overlap hardware and therefore spoil the promise
of independent failure domains.
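
A quick way to see the problem is to check whether any two zones share
hardware. The sketch below is illustrative only and does not use any
Nova API: the function name, the zone names, and the hardware names are
all assumptions, and the input is just a plain mapping from zone name to
the set of hardware units behind it.

```python
from itertools import combinations


def overlapping_zones(hardware_by_zone):
    """Return {(zone_a, zone_b): shared_hardware} for each pair that overlaps.

    hardware_by_zone is a plain dict of zone name -> set of hardware
    identifiers (hypothetical input, not a Nova data structure).
    """
    overlaps = {}
    for zone_a, zone_b in combinations(sorted(hardware_by_zone), 2):
        shared = set(hardware_by_zone[zone_a]) & set(hardware_by_zone[zone_b])
        if shared:
            overlaps[(zone_a, zone_b)] = shared
    return overlaps


# Illustrative data only: az-1 and az-2 both sit on rack1-node2.
zones = {
    "az-1": {"rack1-node1", "rack1-node2"},
    "az-2": {"rack1-node2", "rack2-node1"},
    "az-3": {"rack3-node1"},
}

result = overlapping_zones(zones)
```

Any non-empty result means a tenant who spread VMs across those two
zones is not actually protected against a single hardware failure.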

Thus, we need a different concept than availability zone to expose to
users. Hence my proposal.

