[openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

D'Angelo, Scott scott.dangelo at hpe.com
Tue Mar 1 13:21:05 UTC 2016


Matt, changing Nova to store the connector info at volume attach time does help. The gap that remains is after a Nova evacuation or live migration, when that info needs to be updated in Cinder. We need to change the Cinder API to provide a mechanism for this.
We'd also like Cinder to store the appropriate info to allow a force-detach in the cases where Nova cannot make the call to Cinder.
Ongoing work for this and related issues is tracked and discussed here:
https://etherpad.openstack.org/p/cinder-nova-api-changes
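
To make the gap concrete, here is a toy in-memory model of the idea. It is only a sketch, not Cinder code: the class AttachmentStore and its methods attach, update_connector and force_detach are hypothetical names made up for illustration, and update_connector stands in for the "update an existing attachment" call that does not exist in the Cinder API today.

```python
class AttachmentStore:
    """Toy stand-in for Cinder's volume_attachment records (not real Cinder code)."""

    def __init__(self):
        # volume_id -> connector dict saved at attach time
        self._connectors = {}

    def attach(self, volume_id, connector):
        # Save a copy of the initiator-side connector alongside the attachment.
        self._connectors[volume_id] = dict(connector)

    def update_connector(self, volume_id, connector):
        # Hypothetical call: what Nova would need after evacuation or live
        # migration so the saved connector matches the new compute host.
        if volume_id not in self._connectors:
            raise KeyError(volume_id)
        self._connectors[volume_id] = dict(connector)

    def force_detach(self, volume_id):
        # Return the saved connector, i.e. what would be handed to the
        # driver's terminate_connection when Nova cannot supply one.
        return self._connectors.pop(volume_id)


store = AttachmentStore()
store.attach("vol-1", {"host": "compute-1"})
# Instance live-migrates to compute-2. Without this update, a later
# force-detach would use the stale compute-1 connector.
store.update_connector("vol-1", {"host": "compute-2"})
used = store.force_detach("vol-1")
```

If the update_connector step is skipped, force_detach returns the compute-1 connector, which is exactly the out-of-date case described in this thread.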

Scott D'Angelo (scottda)
________________________________________
From: Matt Riedemann [mriedem at linux.vnet.ibm.com]
Sent: Monday, February 29, 2016 7:48 AM
To: openstack-dev at lists.openstack.org
Subject: Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

On 2/22/2016 4:08 PM, Walter A. Boring IV wrote:
> On 02/22/2016 11:24 AM, John Garbutt wrote:
>> Hi,
>>
>> Just came up on IRC, when nova-compute gets killed half way through a
>> volume attach (i.e. no graceful shutdown), things get stuck in a bad
>> state, like volumes stuck in the attaching state.
>>
>> This looks like a new addition to this conversation:
>> http://lists.openstack.org/pipermail/openstack-dev/2015-December/082683.html
>>
>> And brings us back to this discussion:
>> https://blueprints.launchpad.net/nova/+spec/add-force-detach-to-nova
>>
>> What if we move our attention towards automatically recovering from
>> the above issue? I am wondering if we can look at making our usual
>> recovery code deal with the above situation:
>> https://github.com/openstack/nova/blob/834b5a9e3a4f8c6ee2e3387845fc24c79f4bf615/nova/compute/manager.py#L934
>>
>>
>> Did we get the Cinder APIs in place that enable the force-detach? I
>> think we did and it was this one?
>> https://blueprints.launchpad.net/python-cinderclient/+spec/nova-force-detach-needs-cinderclient-api
>>
>>
>> I think diablo_rojo might be able to help dig for any bugs we have
>> related to this. I just wanted to get this idea out there before I
>> head out.
>>
>> Thanks,
>> John
>>
>> __________________________________________________________________________
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> .
>>
> The problem is a little more complicated.
>
> In order for cinder backends to be able to do a force detach correctly,
> the Cinder driver needs to have the correct 'connector' dictionary
> passed in to terminate_connection.  That connector dictionary is the
> collection of initiator side information which is gleaned here:
> https://github.com/openstack/os-brick/blob/master/os_brick/initiator/connector.py#L99-L144
>
>
> The plan was to save that connector information in the Cinder
> volume_attachment table.  When a force detach is called, Cinder has the
> existing connector saved if Nova doesn't have it.  The problem was live
> migration.  When you migrate to the destination n-cpu host, the
> connector that Cinder had is now out of date.  There is no API in Cinder
> today to allow updating an existing attachment.
>
> So, the plan at the Mitaka summit was to add this new API, but it
> required microversions to land, which we still don't have in Cinder's
> API today.
>
>
> Walt
>

Regarding storing off the initial connector information from the attach,
does this [1] help bridge the gap? That adds the connector dict to the
connection_info dict that is serialized and stored in the nova
block_device_mappings table, and then in that patch is used to pass it
to terminate_connection in the case that the host has changed.

[1] https://review.openstack.org/#/c/266095/
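
Roughly, the idea can be sketched as below. This is a simplified illustration and not the code from the patch: the helper names save_connection_info and connector_for_detach, and the use of a "host" key to detect that the host has changed, are assumptions made for the example.

```python
import json


def save_connection_info(connection_info, connector):
    """Return the JSON string to store for the BDM, with the connector stashed inside."""
    info = dict(connection_info)
    info["connector"] = dict(connector)
    return json.dumps(info)


def connector_for_detach(stored_connection_info, current_connector):
    """Pick the connector to pass to terminate_connection.

    If a connector was stashed at attach time and it came from a different
    host than the one doing the detach, prefer the stashed one; otherwise
    use the connector built on the current host.
    """
    info = json.loads(stored_connection_info)
    stashed = info.get("connector")
    if stashed and stashed.get("host") != current_connector.get("host"):
        return stashed
    return current_connector
```

For example, if the attach happened on compute-1 and the detach is issued from compute-2, connector_for_detach returns the stashed compute-1 connector. Older records with no stashed connector fall back to the current host's connector.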

--

Thanks,

Matt Riedemann

