[openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

Matt Riedemann mriedem at linux.vnet.ibm.com
Thu Sep 24 14:59:07 UTC 2015



On 9/24/2015 9:06 AM, Matt Riedemann wrote:
>
>
> On 9/24/2015 3:19 AM, Sylvain Bauza wrote:
>>
>>
>> Le 24/09/2015 09:04, Duncan Thomas a écrit :
>>> Hi
>>>
>>> I thought I was late on this thread, but looking at the time stamps,
>>> it is just something that escalated very quickly. I am honestly
>>> surprised a cross-project interaction option went from 'we don't seem
>>> to understand this' to 'deprecation merged' in 4 hours, with only a
>>> 12-hour discussion on the mailing list, right at the end of a cycle when
>>> we're supposed to be stabilising features.
>>>
>>
>> So, I agree it was maybe a bit too quick, hence the revert. That said,
>> Nova master is now Mitaka, which means the deprecation change was
>> made for the next cycle, not the one currently stabilising.
>>
>> Anyway, I'm really up for discussing why Cinder needs to know about the
>> Nova AZs.
>>
>>> I proposed a session at the Tokyo summit for a discussion of Cinder
>>> AZs, since there was clear confusion about what they are intended for
>>> and how they should be configured.
>>
>> Cool, count me in from the Nova standpoint.
>>
>>> Since then I've reached out to, and gotten good feedback from, a number
>>> of operators. There are two distinct configurations for AZ behaviour
>>> in cinder, and both sort-of worked until very recently.
>>>
>>> 1) No AZs in cinder
>>> This is the config where there is a single 'blob' of storage (most of
>>> the operators who responded so far are using Ceph, though that isn't
>>> required). The storage takes care of availability concerns, and any AZ
>>> info from nova should just be ignored.
>>>
>>> 2) Cinder AZs map to Nova AZs
>>> In this case, some combination of storage / networking / etc couples
>>> storage to nova AZs. It may be that an AZ is used as a unit of
>>> scaling, or it could be a real storage failure domain. Either way,
>>> there are a number of operators who have this configuration and want
>>> to keep it. Storage can certainly have a failure domain, and limiting
>>> the scalability problem of storage to a single compute AZ can have
>>> definite advantages in failure scenarios. These people do not want
>>> cross-az attach.
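>>>
>>> (To make the two concrete, here is a rough sketch of the relevant
>>> options, assuming the current Liberty/Mitaka option names and an
>>> example AZ name "az-1" - not a tested config:
>>>
>>>   # Config 1: one storage blob, ignore nova AZs
>>>   # nova.conf
>>>   [cinder]
>>>   cross_az_attach = True    # the default
>>>
>>>   # Config 2: cinder AZs map 1:1 to nova AZs
>>>   # nova.conf on computes in AZ "az-1"
>>>   [cinder]
>>>   cross_az_attach = False
>>>
>>>   # cinder.conf on the cinder-volume service backing "az-1"
>>>   [DEFAULT]
>>>   storage_availability_zone = az-1
>>> )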
>>>
>>
>> Ahem, Nova AZs are not failure domains - I mean in the current
>> implementation, in the sense that many people understand a failure
>> domain, i.e. a physical unit of machines (a bay, a room, a floor, a
>> datacenter).
>> All the AZs in Nova share the same control plane, with the same message
>> queue and database, which means that one failure can propagate to
>> the other AZs.
>>
>> To be honest, there is one very specific use case where AZs *are* failure
>> domains: when cells exactly match AZs (i.e. one AZ grouping all the
>> hosts behind one cell). That's the very specific use case that Sam is
>> mentioning in his email, and I certainly understand we need to keep that.
>>
>> What AZs are in Nova is pretty well explained in a fairly old blog post:
>> http://blog.russellbryant.net/2013/05/21/availability-zones-and-host-aggregates-in-openstack-compute-nova/
>>
>>
>> We also added a few comments in our developer doc here:
>> http://docs.openstack.org/developer/nova/aggregates.html#availability-zones-azs
>>
>>
>> tl;dr: AZs are aggregate metadata that makes those aggregates of compute
>> nodes visible to the users. Nothing more than that, no magic sauce.
>> It's just a logical abstraction that can map to your physical
>> deployment, but like I said, one which still shares the same bus and DB.
>> Of course, you could still provide distinct networks between AZs, but
>> that just gives you L2 isolation, not a real failure domain in a
>> Business Continuity Plan sense.
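>>
>> Just to illustrate how thin that abstraction is, an AZ is created by
>> nothing more than an aggregate call like (CLI just as an example):
>>
>>   nova aggregate-create rack-A az-1
>>   nova aggregate-add-host rack-A compute-01
>>
>> which simply sets the availability_zone=az-1 metadata key on the
>> aggregate; there is no extra isolation behind it.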
>>
>> What puzzles me is how Cinder manages datacenter-level isolation,
>> given there is no cells concept AFAIK. I assume that cinder-volumes
>> belong to a specific datacenter, but how is its control plane managed?
>> I can certainly understand the need for affinity placement between
>> physical units, but I'm missing that piece, and consequently I wonder
>> why Nova needs to provide AZs to Cinder in the general case.
>>
>>
>>
>>> My hope for the summit session was to agree on these two configurations,
>>> discuss any scenarios not covered by them, and nail
>>> down the changes we need to get these to work properly. There's
>>> definitely been interest and activity in the operator community in
>>> making nova and cinder AZs interact, and every desired interaction
>>> I've gotten details about so far matches one of the above models.
>>>
>>
>> I'm all with you about providing a way for users to get volume affinity
>> in Nova. That's a long story I've been thinking about, and we are
>> constantly trying to improve the nova scheduler interfaces so that other
>> projects can provide resources to the nova scheduler for decision
>> making. I just want to consider whether AZs are the best concept for
>> that, or whether we should do it some other way (again, because AZs are
>> not what people expect).
>>
>> Again, count me in for the Cinder session, and just lemme know when the
>> session is planned so I can attend it.
>>
>> -Sylvain
>>
>>
>
> I plan on reverting the deprecation change (which was a mitaka change,
> not a liberty change, as Sylvain pointed out).
>
> However, the fact that so many nova and cinder cores were talking about
> this yesterday and thought it was the right thing to do speaks to how
> poorly understood (and documented) this use case is.  So
> as part of reverting the deprecation I also want to see improved docs
> for the cross_az_attach option itself and probably a nova devref change
> explaining the use cases and issues with this.
>
> I think the volume attach case is pretty straightforward.  You create a
> nova instance in some nova AZ x and create a cinder volume in some
> cinder AZ y and try to attach the volume to the server instance.  If
> cinder.cross_az_attach=True this is OK, else it fails.
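>
> (Concretely, that's the cross_az_attach option in the [cinder] section
> of nova.conf - something like this, as I understand the current code:
>
>   [cinder]
>   # default is True; False makes nova reject attaching a volume whose
>   # cinder AZ doesn't match the instance's nova AZ
>   cross_az_attach = False
> )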
>
> The problem I have is with the boot from volume case where
> source=(blank/image/snapshot).  In those cases nova is creating the
> volume and passing the server instance AZ to the volume create API.  How
> are people who are using cinder.cross_az_attach=False handling the BFV
> case?
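>
> (Roughly, what nova ends up doing is the equivalent of this
> cinderclient call - a simplified sketch, not the actual compute code:
>
>   volume = cinder.volumes.create(
>       bdm.volume_size,
>       imageRef=image_id,
>       availability_zone=instance.availability_zone)
>
> so if that AZ name doesn't exist in cinder, the create blows up.)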
>
> Per bug 1496235 that started this, the user is booting a nova instance
> in a nova AZ with bdm source=image and when nova tries to create the
> volume it fails because that AZ doesn't exist in cinder.  This fails in
> the compute manager when building the instance, so this results in a
> NoValidHost error for the user - which we all know and love as a super
> useful error.  So how do we handle this case?  If
> cinder.cross_az_attach=True in nova, we could just not pass the instance
> AZ to the volume create, or only pass it if cinder has that AZ available.
>
> But if cinder.cross_az_attach=False when creating the volume, what do we
> do?  I guess we can just leave the code as-is and if the AZ isn't in
> cinder (or your admin hasn't set allow_availability_zone_fallback=True
> in cinder.conf), then it fails and you open a support ticket.  That
> seems gross to me.  I'd like to at least see some of this validated in
> the nova API layer before it gets to the scheduler and compute so we can
> avoid NoValidHost.  My thinking is, in the BFV case where source !=
> volume, if cinder.cross_az_attach is False and instance.az is not None,
> then we check the list of AZs from the volume API.  If the instance.az
> is not in that list, we fail fast (400 response to the user).  However,
> if allow_availability_zone_fallback=True in cinder.conf, we'd be
> rejecting the request even though the actual volume create would
> succeed.  These are just details that we don't have in the nova API
> since it's all policy-driven gorp using config options that the user
> doesn't know about, which makes it really hard to write applications
> against this - and was part of the reason I moved to deprecate that option.
>
> Am I off in the weeds?  It sounds like Duncan is going to try to get a
> plan together in Tokyo about how to handle this and decouple nova and
> cinder in this case, which is the right long-term goal.
>

Revert is approved: https://review.openstack.org/#/c/227340/

-- 

Thanks,

Matt Riedemann



