[openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..
philip.day at hp.com
Thu Mar 27 18:28:55 UTC 2014
> -----Original Message-----
> From: Vishvananda Ishaya [mailto:vishvananda at gmail.com]
> Sent: 26 March 2014 20:33
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][scheduler] Availability Zones and Host
> On Mar 26, 2014, at 11:40 AM, Jay Pipes <jaypipes at gmail.com> wrote:
> > On Wed, 2014-03-26 at 09:47 -0700, Vishvananda Ishaya wrote:
> >> Personally I view this as a bug. There is no reason why we shouldn't
> >> support arbitrary grouping of zones. I know there is at least one
> >> problem with zones that overlap regarding displaying them properly:
> >> https://bugs.launchpad.net/nova/+bug/1277230
> >> There is probably a related issue that is causing the error you see
> >> below. IMO both of these should be fixed. I also think adding a
> >> compute node to two different aggregates with azs should be allowed.
> >> It also might be nice to support specifying multiple zones in the
> >> launch command in these models. This would allow you to limit booting
> >> to an intersection of two overlapping zones.
> >> A few examples where these ideas would be useful:
> >> 1. You have 3 racks of servers and half of the nodes from each rack
> >> plugged into a different switch. You want to be able to specify to
> >> spread across racks or switches via an AZ. In this model you could
> >> have a zone for each switch and a zone for each rack.
> >> 2. A single cloud has 5 racks in one room in the datacenter and 5
> >> racks in a second room. You'd like to give control to the user to
> >> choose the room or choose the rack. In this model you would have one
> >> zone for each room, and smaller zones for each rack.
> >> 3. You have a small 3 rack cloud and would like to ensure that your
> >> production workloads don't run on the same machines as your dev
> >> workloads, but you also want to use zones spread workloads across the
> >> three racks. Similarly to 1., you could split your racks in half via
> >> dev and prod zones. Each one of these zones would overlap with a rack
> >> zone.
> >> You can achieve similar results in these situations by making small
> >> zones (switch1-rack1 switch1-rack2 switch1-rack3 switch2-rack1
> >> switch2-rack2 switch2-rack3) but that removes the ability to decide
> >> to launch something with less granularity. I.e. you can't just
> >> specify 'switch1' or 'rack1' or 'anywhere'
> >> I'd like to see all of the following work:
> >> nova boot ... (boot anywhere)
> >> nova boot --availability-zone switch1 ... (boot in switch1 zone)
> >> nova boot --availability-zone rack1 ... (boot in rack1 zone)
> >> nova boot --availability-zone switch1,rack1 ... (boot in the intersection of switch1 and rack1)
> > Personally, I feel it is a mistake to continue to use the Amazon
> > concept of an availability zone in OpenStack, as it brings with it the
> > connotation from AWS EC2 that each zone is an independent failure
> > domain. This characteristic of EC2 availability zones is not enforced
> > in OpenStack Nova or Cinder, and therefore creates a false expectation
> > for Nova users.
> > In addition to the above problem with incongruent expectations, the
> > other problem with Nova's use of the EC2 availability zone concept is
> > that availability zones are not hierarchical -- due to the fact that
> > EC2 AZs are independent failure domains. Not having the possibility of
> > structuring AZs hierarchically limits the ways in which Nova may be
> > deployed -- just see the cells API for the manifestation of this
> > problem.
> > I would love it if the next version of the Nova and Cinder APIs would
> > drop the concept of an EC2 availability zone and introduce the concept
> > of a generic region structure that can be infinitely hierarchical in
> > nature. This would enable all of Vish's nova boot commands above in an
> > even simpler fashion. For example:
> > Assume a simple region hierarchy like so:
> >
> >         regionA
> >         /     \
> >   regionB     regionC
> >
> > # User wants to boot in region B
> > nova boot --region regionB
> >
> > # User wants to boot in either region B or region C
> > nova boot --region regionA
> I think the overlapping zones allows for this and also enables additional use
> cases as mentioned in my earlier email. Hierarchical doesn't work for the
> rack/switch model. I'm definitely +1 on breaking from the amazon usage of
> availability zones but I'm a bit leery to add another parameter to the create
> request. It is also unfortunate that region already has a meaning in the
> amazon world which will add confusion.
Ok, I got far enough back down my stack to understand the drive here, and I kind of understand the use case, but I think what's missing is that currently we only allow for one group of availability zones.
I can see why you would want them to overlap in a certain way - i.e. a "rack based" zone could overlap with a "switch based" zone - but I still don't want any overlap within the set of "switch based" zones, or any overlap within the set of "rack based" zones.
Maybe the issue is that when we converted / mapped AZs onto aggregates we only ever considered that there would be one such set of mutually exclusive aggregates.
If instead we could give each aggregate a group name, and specify if that is an exclusive or non-exclusive aggregate (maybe just having a group name would be enough to make it exclusive), then you could define:
Aggregate name=rack_1 group="rack" servers="a,b,c,d"
Aggregate name=rack_2 group="rack" servers="e,f,g,h"
Aggregate name=sw_1 group="switch" servers="a,b,g,h"
Aggregate name=sw_2 group="switch" servers="e,f,c,d"
That way we still provide all the good protection that stops an admin from accidentally adding a server into two aggregates in the same group, but allow them to set up overlaps between groups.
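The grouping rule described above could be sketched roughly like this. This is purely illustrative, not actual Nova code; the class and method names are invented for the sketch:

```python
# Hypothetical sketch: a host may belong to only one aggregate within a
# named group, but may appear in aggregates of different groups.

class AggregateGroups:
    def __init__(self):
        # group name -> {aggregate name -> set of hosts}
        self.groups = {}

    def add_host(self, group, aggregate, host):
        aggs = self.groups.setdefault(group, {})
        # Enforce mutual exclusion within the group: reject a host that
        # is already in a *different* aggregate of the same group.
        for name, hosts in aggs.items():
            if host in hosts and name != aggregate:
                raise ValueError(
                    "host %s is already in aggregate %s of group %s"
                    % (host, name, group))
        aggs.setdefault(aggregate, set()).add(host)


groups = AggregateGroups()
for h in "abcd":
    groups.add_host("rack", "rack_1", h)
for h in "efgh":
    groups.add_host("rack", "rack_2", h)
# Overlap across groups is fine: host "a" is in both rack_1 and sw_1.
for h in "abgh":
    groups.add_host("switch", "sw_1", h)
```

Trying `groups.add_host("rack", "rack_2", "a")` would raise, since "a" is already in rack_1 of the "rack" group - exactly the admin mistake the exclusivity check is meant to catch.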
If we treat the group which represents AZs as the default group, then in effect we would keep all of the current semantics that people know and hate about "AZs", but allow options such as
nova boot --availability-zone az1 --availability-zone rack:rack_1 --availability-zone switch:sw_1
while still maintaining backwards compatibility.
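Resolving such a request would amount to intersecting the host sets of the named zones. A minimal sketch, with illustrative zone data matching the aggregates above:

```python
# Hypothetical sketch: map a request like
#   --availability-zone rack:rack_1 --availability-zone switch:sw_1
# to the hosts in the intersection of the named group:zone pairs.

zones = {
    ("rack", "rack_1"): {"a", "b", "c", "d"},
    ("rack", "rack_2"): {"e", "f", "g", "h"},
    ("switch", "sw_1"): {"a", "b", "g", "h"},
    ("switch", "sw_2"): {"e", "f", "c", "d"},
}

def candidate_hosts(requested):
    """Intersect the host sets of every requested group:zone spec."""
    result = None
    for spec in requested:
        group, _, zone = spec.partition(":")
        hosts = zones[(group, zone)]
        result = hosts if result is None else result & hosts
    return result or set()

print(candidate_hosts(["rack:rack_1", "switch:sw_1"]))  # hosts "a" and "b"
```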
Personally I'm a bit worried about users having too fine a granularity over where they place a server. AZs are generally few and big, so you can afford to allow this and not have capacity issues, but if I had to expose 40 different rack based zones it would be pretty hard to stop everyone piling into the first or last - when what they really want to say is "not the same as" or "the same as" - which makes me wonder if this is really the right way to go. It feels more like what we really want is some form of affinity and anti-affinity rules rather than an explicit choice of a particular group.
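The anti-affinity alternative could be sketched as a filter: instead of exposing 40 rack zones, exclude hosts whose rack already holds a member of the requesting server group. Host-to-rack data and names below are invented for illustration:

```python
# Hypothetical sketch of rack-level anti-affinity: keep only hosts in
# racks not already used by the group's existing instances.

host_rack = {"a": "rack_1", "b": "rack_1", "e": "rack_2", "f": "rack_2"}

def anti_affinity_hosts(all_hosts, group_member_hosts):
    """Filter out hosts in racks occupied by the server group."""
    used_racks = {host_rack[h] for h in group_member_hosts}
    return {h for h in all_hosts if host_rack[h] not in used_racks}

# A group already has an instance on host "a" (rack_1), so only the
# rack_2 hosts remain as candidates.
print(sorted(anti_affinity_hosts(set(host_rack), {"a"})))  # ['e', 'f']
```

The user never names a rack; the scheduler just honors "not the same rack as my other instances", which avoids the capacity pile-up problem of explicitly chosen zones.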