[openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

Tim Bell Tim.Bell at cern.ch
Thu Sep 24 16:13:23 UTC 2015


> -----Original Message-----
> From: Matt Riedemann [mailto:mriedem at linux.vnet.ibm.com]
> Sent: 24 September 2015 16:59
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?
> 
> 
> 
> On 9/24/2015 9:06 AM, Matt Riedemann wrote:
> >
> >
> > On 9/24/2015 3:19 AM, Sylvain Bauza wrote:
> >>
> >>
> >> Le 24/09/2015 09:04, Duncan Thomas a écrit :
> >>> Hi
> >>>
> >>> I thought I was late on this thread, but looking at the time stamps,
> >>> it is just something that escalated very quickly. I am honestly
> >>> surprised a cross-project interaction option went from 'we don't
> >>> seem to understand this' to 'deprecation merged' in 4 hours, with
> >>> only a 12-hour discussion on the mailing list, right at the end of a
> >>> cycle when we're supposed to be stabilising features.
> >>>
> >>
> >> So, I agree it was maybe a bit too quick, hence the revert. That said,
> >> Nova master is now Mitaka, which means that the deprecation change
> >> was provided for the next cycle, not the one currently stabilising.
> >>
> >> Anyway, I'm really all for discussing why Cinder needs to know
> >> the Nova AZs.
> >>
> >>> I proposed a session at the Tokyo summit for a discussion of Cinder
> >>> AZs, since there was clear confusion about what they are intended
> >>> for and how they should be configured.
> >>
> >> Cool, count me in from the Nova standpoint.
> >>
> >>> Since then I've reached out to, and gotten good feedback from, a
> >>> number of operators. There are two distinct configurations for AZ
> >>> behaviour in cinder, and both sort-of worked until very recently.
> >>>
> >>> 1) No AZs in cinder
> >>> This is the config where there is a single 'blob' of storage (most of the
> >>> operators who responded so far are using Ceph, though that isn't
> >>> required). The storage takes care of availability concerns, and any
> >>> AZ info from nova should just be ignored.
> >>>
> >>> 2) Cinder AZs map to Nova AZs
> >>> In this case, some combination of storage / networking / etc. couples
> >>> storage to nova AZs. It may be that an AZ is used as a unit of
> >>> scaling, or it could be a real storage failure domain. Either way,
> >>> there are a number of operators who have this configuration and want
> >>> to keep it. Storage can certainly have a failure domain, and
> >>> limiting the scalability problem of storage to a single compute AZ
> >>> can have definite advantages in failure scenarios. These people do
> >>> not want cross-az attach (a rough sketch of both configurations
> >>> follows below).
> >>>
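> >>>
> >>> As a rough illustration of the two setups (the nova-side
> >>> cross_az_attach option is discussed further down the thread; cinder's
> >>> storage_availability_zone option and the AZ name are just assumed
> >>> here for illustration):
> >>>
> >>>   # nova.conf -- configuration 1 (no AZs in cinder): any volume can
> >>>   # be attached to any instance, whatever AZ nova reports
> >>>   [cinder]
> >>>   cross_az_attach = True
> >>>
> >>>   # nova.conf -- configuration 2 (cinder AZs map to nova AZs):
> >>>   # refuse attaches across AZs
> >>>   [cinder]
> >>>   cross_az_attach = False
> >>>
> >>>   # cinder.conf for each cinder-volume service in configuration 2:
> >>>   # name the storage AZ after the matching nova AZ
> >>>   [DEFAULT]
> >>>   storage_availability_zone = az-east-1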
> >>
> >> Ahem, Nova AZs are not failure domains - I mean in the current
> >> implementation, in the sense in which most people understand a failure
> >> domain, i.e. a physical unit of machines (a bay, a room, a
> >> floor, a datacenter).
> >> All the AZs in Nova share the same control plane with the same message
> >> queue and database, which means that one failure can propagate to
> >> the other AZs.
> >>
> >> To be honest, there is one very specific use case where AZs *are*
> >> failure domains: when cells exactly match AZs (i.e. one AZ
> >> grouping all the hosts behind one cell). That's the very specific
> >> use case that Sam is mentioning in his email, and I certainly understand
> >> we need to keep that.
> >>
> >> What AZs are in Nova is pretty well explained in a fairly old blog post:
> >> http://blog.russellbryant.net/2013/05/21/availability-zones-and-host-aggregates-in-openstack-compute-nova/
> >>
> >>
> >> We also added a few comments in our developer doc here:
> >> http://docs.openstack.org/developer/nova/aggregates.html#availability-zones-azs
> >>
> >>
> >> tl;dr: AZs are aggregate metadata that make those aggregates of
> >> compute nodes visible to the users. Nothing more than that, no magic sauce.
> >> It's just a logical abstraction that can map to your physical
> >> deployment, but, like I said, it still shares the same bus and DB.
> >> Of course, you could still provide distinct networks between AZs, but
> >> that just gives you L2 isolation, not a real failure domain in
> >> the business-continuity-plan sense.
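> >>
> >> As a side note, here is a minimal sketch of what "AZs are aggregate
> >> metadata" means in practice, using python-novaclient (the exact client
> >> calls and names are from memory and only illustrative):
> >>
> >>   from novaclient import client
> >>
> >>   # Legacy-style client constructor; credentials are placeholders.
> >>   nova = client.Client('2', USERNAME, PASSWORD, PROJECT_NAME, AUTH_URL)
> >>
> >>   # An AZ is just a host aggregate created with an availability_zone;
> >>   # adding hosts makes them show up under that AZ for users.
> >>   agg = nova.aggregates.create('rack-a-aggregate', 'az-rack-a')
> >>   nova.aggregates.add_host(agg, 'compute-01')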
> >>
> >> What puzzles me is how Cinder is managing a datacenter-level of
> >> isolation given there is no cells concept AFAIK. I assume that
> >> cinder-volumes are belonging to a specific datacenter but how is
> >> managed the controlplane of it ? I can certainly understand the need
> >> of affinity placement between physical units, but I'm missing that
> >> piece, and consequently I wonder why Nova need to provide AZs to
> >> Cinder on a general case.
> >>
> >>
> >>
> >>> My hope for the summit session is to agree on these two configurations,
> >>> discuss any scenarios not covered by them, and
> >>> nail down the changes we need to get these to work properly. There's
> >>> definitely been interest and activity in the operator community in
> >>> making nova and cinder AZs interact, and every desired interaction
> >>> I've gotten details about so far matches one of the above models.
> >>>
> >>
> >> I'm all with you about providing a way for users to get volume
> >> affinity for Nova. That's a long story I'm trying to consider, and we
> >> are constantly trying to improve the nova scheduler interfaces so
> >> that other projects could provide resources to the nova scheduler for
> >> decision making. I just want to consider whether AZs are the best
> >> concept for that or whether we should do it some other way (again,
> >> because AZs are not what people expect).
> >>
> >> Again, count me in for the Cinder session, and just lemme know when
> >> the session is planned so I could attend it.
> >>
> >> -Sylvain
> >>
> >>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> > I plan on reverting the deprecation change (which was a Mitaka change,
> > not a Liberty change, as Sylvain pointed out).
> >
> > However, the fact that so many nova and cinder cores were talking about
> > this yesterday and thought it was the right thing to do speaks to how
> > poorly understood (and undocumented) this use case is.
> > So as part of reverting the deprecation I also want to see improved
> > docs for the cross_az_attach option itself and probably a nova devref
> > change explaining the use cases and issues with this.
> >
> > I think the volume attach case is pretty straightforward.  You create
> > a nova instance in some nova AZ x and create a cinder volume in some
> > cinder AZ y and try to attach the volume to the server instance.  If
> > cinder.cross_az_attach=True this is OK, else it fails.
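> >
> > Roughly, as a simplified sketch (not the actual nova code), the
> > attach-time check boils down to something like this:
> >
> >   # Illustrative only: with cross_az_attach=False the instance AZ and
> >   # the volume AZ have to match, otherwise the attach is rejected.
> >   def check_attach_az(instance_az, volume_az, cross_az_attach):
> >       if cross_az_attach:
> >           return  # any combination of AZs is allowed
> >       if instance_az != volume_az:
> >           raise ValueError('instance AZ %s does not match volume AZ %s'
> >                            % (instance_az, volume_az))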
> >
> > The problem I have is with the boot-from-volume case where
> > source=(blank/image/snapshot).  In those cases nova is creating the
> > volume and passing the server instance AZ to the volume create API.
> > How are people who are using cinder.cross_az_attach=False handling
> > the BFV case?
> >
> > Per bug 1496235 that started this, the user is booting a nova instance
> > in a nova AZ with bdm source=image and when nova tries to create the
> > volume it fails because that AZ doesn't exist in cinder.  This fails
> > in the compute manager when building the instance, so this results in
> > a NoValidHost error for the user - which we all know and love as a
> > super useful error.  So how do we handle this case?  If
> > cinder.cross_az_attach=True in nova we could just not pass the
> > instance AZ to the volume create, or only pass it if cinder has that AZ available.
> >
> > But if cinder.cross_az_attach=False when creating the volume, what do
> > we do?  I guess we can just leave the code as-is and if the AZ isn't
> > in cinder (or your admin hasn't set
> > allow_availability_zone_fallback=True
> > in cinder.conf), then it fails and you open a support ticket.  That
> > seems gross to me.  I'd like to at least see some of this validated in
> > the nova API layer before it gets to the scheduler and compute so we
> > can avoid NoValidHost.  My thinking is, in the BFV case where source
> > != volume, if cinder.cross_az_attach is False and instance.az is not
> > None, then we check the list of AZs from the volume API.  If the
> > instance.az is not in that list, we fail fast (400 response to the
> > user).  However, if allow_availability_zone_fallback=True in
> > cinder.conf, we'd be rejecting the request even though the actual
> > volume create would succeed.  These are just details that we don't
> > have in the nova API since it's all policy-driven gorp using config
> > options that the user doesn't know about, which makes it really hard
> > to write applications against this - and was part of the reason I moved to
> > deprecate that option.
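> >
> > To make that concrete, here is a rough sketch of the fail-fast check
> > described above, assuming python-cinderclient's availability zone
> > listing (the function, its arguments, and the exception are
> > illustrative, not actual nova code):
> >
> >   def validate_bfv_az(instance_az, cross_az_attach, cinderclient):
> >       # Reject the boot request early at the API layer instead of
> >       # failing later in the compute manager with NoValidHost.
> >       if cross_az_attach or instance_az is None:
> >           return
> >       cinder_azs = [az.zoneName
> >                     for az in cinderclient.availability_zones.list()]
> >       if instance_az not in cinder_azs:
> >           # Would map to a 400 response to the user.
> >           raise ValueError('availability zone %s is not known to cinder'
> >                            % instance_az)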
> >
> > Am I off in the weeds?  It sounds like Duncan is going to try and get
> > a plan together in Tokyo about how to handle this and decouple nova
> > and cinder in this case, which is the right long-term goal.
> >
> 
> Revert is approved: https://review.openstack.org/#/c/227340/
> 

Matt, 

Thanks for reverting the change.

Is there a process description for deprecating features? It would be good
to include:

- notification of operators (on the operators' list) and an agreed time to reply
- documentation of a workaround for those who are using a deprecated feature
  in production

Thanks
Tim

> --
> 
> Thanks,
> 
> Matt Riedemann
> 
> 