[openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

Day, Phil philip.day at hp.com
Fri Mar 28 11:01:09 UTC 2014

>> Personally, I feel it is a mistake to continue to use the Amazon concept
>> of an availability zone in OpenStack, as it brings with it the
>> connotation from AWS EC2 that each zone is an independent failure
>> domain. This characteristic of EC2 availability zones is not enforced in
>> OpenStack Nova or Cinder, and therefore creates a false expectation for
>> Nova users.

>I think this is backwards training, personally. I think AZs as separate failure
>domains were done like that for a reason by Amazon, and make good sense.
>What we've done is overload that with cells, aggregates etc which should
>have a better interface and are a different concept. Redefining well understood
>terms because they don't suit your current implementation is a slippery slope,
>and overloading terms that already have a meaning in the industry is just annoying.

I don't think there is anything wrong with identifying new use cases and working out how to cope with them:

- First we generalized Aggregates
- Then we mapped AZs onto aggregates as a special mutually exclusive group
- Now we're recognizing that maybe we need to make the changes that support AZs more generic, so we can create additional groups of mutually exclusive aggregates
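
To make the third step concrete, here is a rough sketch of what "groups of mutually exclusive aggregates" could look like. This is not Nova's data model or API: the class, the method names and the `exclusive_group` label are all made up for illustration. The only rule it encodes is that a host may belong to at most one aggregate within any given exclusive group, with AZs being just one such group.

```python
from collections import defaultdict


class ExclusiveGroupError(ValueError):
    """A host would end up in two aggregates of the same exclusive group."""


class AggregateRegistry:
    """Hypothetical in-memory model; not how Nova stores aggregates."""

    def __init__(self):
        # aggregate name -> exclusive group name (None for a plain aggregate)
        self.group_of = {}
        # aggregate name -> set of host names
        self.hosts_in = defaultdict(set)

    def create_aggregate(self, name, exclusive_group=None):
        self.group_of[name] = exclusive_group

    def add_host(self, aggregate, host):
        group = self.group_of[aggregate]
        if group is not None:
            # Reject the host if it already sits in a sibling aggregate
            # of the same exclusive group.
            for other, other_group in self.group_of.items():
                if (other != aggregate and other_group == group
                        and host in self.hosts_in[other]):
                    raise ExclusiveGroupError(
                        f"{host} is already in {other} (group {group})")
        self.hosts_in[aggregate].add(host)


reg = AggregateRegistry()
# AZs become one exclusive group among possibly several.
reg.create_aggregate("az1", exclusive_group="availability_zone")
reg.create_aggregate("az2", exclusive_group="availability_zone")
# A plain aggregate carries no exclusivity constraint.
reg.create_aggregate("ssd")

reg.add_host("az1", "host1")
reg.add_host("ssd", "host1")      # fine: "ssd" is not in any exclusive group
try:
    reg.add_host("az2", "host1")  # rejected: host1 is already in az1
except ExclusiveGroupError as exc:
    print(exc)
```

Adding a second exclusive group (say, a hypothetical "power_domain") would need no new code in this sketch, which is the sense in which the AZ-specific changes become generic.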

That all feels like good evolution.

But I don't see why that means we have to fit that in under the existing concept of AZs. Why can't we keep AZs as they are and have a better thing called Zones that is just an OSAPI concept and is better than AZs? Arguments around not wanting to add new options to create server seem a bit weak to me. For sure we don't want to add them in an uncontrolled way, but if we have a new, richer concept we should be able to express that separately.

I'm still not personally convinced by the use cases of racks having orthogonal power failure domains and switch failure domains. From a practical perspective, it seems to me that it becomes really hard to work out where to separate VMs so that they don't share a failure mode. Every physical DC design I've been involved with tries to get the different failure domains to align. However, if the use case makes sense to someone then I'm not against extending aggregates to support multiple mutually exclusive groups.
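
As a toy illustration of the placement problem, the sketch below lists the host pairs that share no failure domain at all, first for an orthogonal layout and then for an aligned one. The host names, domain labels and helper functions are invented for the example and do not come from any real deployment or from Nova.

```python
from itertools import combinations


def shared_domains(a, b):
    """Return the domain kinds (e.g. 'power') in which two hosts coincide."""
    return {kind for kind in a if kind in b and a[kind] == b[kind]}


def independent_pairs(hosts):
    """Return sorted host pairs that share no failure domain of any kind."""
    return sorted(
        (x, y) for x, y in combinations(sorted(hosts), 2)
        if not shared_domains(hosts[x], hosts[y])
    )


# Orthogonal layout: power and switch domains cut across each other.
orthogonal = {
    "h1": {"power": "p1", "switch": "s1"},
    "h2": {"power": "p1", "switch": "s2"},
    "h3": {"power": "p2", "switch": "s1"},
    "h4": {"power": "p2", "switch": "s2"},
}

# Aligned layout: each power domain coincides with one switch domain.
aligned = {
    "h1": {"power": "p1", "switch": "s1"},
    "h2": {"power": "p1", "switch": "s1"},
    "h3": {"power": "p2", "switch": "s2"},
    "h4": {"power": "p2", "switch": "s2"},
}

print(independent_pairs(orthogonal))  # [('h1', 'h4'), ('h2', 'h3')]
print(independent_pairs(aligned))
# [('h1', 'h3'), ('h1', 'h4'), ('h2', 'h3'), ('h2', 'h4')]
```

In the orthogonal layout only two of the six pairs are fully independent, and finding them means checking every domain kind. In the aligned layout a single zone label per host is enough to pick a safe pair.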

I think I see a Design Summit session emerging here.

