[openstack-dev] [nova][cinder] what are the key errors with volume detach

Andrea Rosa andrea.rosa at hpe.com
Mon Dec 14 17:24:13 UTC 2015



On 10/12/15 15:29, Matt Riedemann wrote:

>> In a simplified view of a volume detach we can say that the nova
>> code does:
>> 1 detach the volume from the instance
>> 2 inform cinder about the detach and call terminate_connection on
>> the cinder API
>> 3 delete the BDM record in the nova DB
> 
> We actually:
> 
> 1. terminate the connection in cinder:
> 
> https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2312
> 
> 
> 2. detach the volume
> 
> https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2315
> 
> 
> 3. delete the volume (if marked for delete_on_termination):
> 
> https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2348
> 
> 
> 4. delete the bdm in the nova db:
> 
> https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L908
> 
> 

I am confused here: why are you referring to the _shutdown_instance
code? That is the instance delete path, while my question is about the
volume detach path.
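
For clarity, the path I had in mind is the volume detach API path,
which as I read the compute manager code goes roughly like this (a
simplified, self-contained sketch, not the literal nova code):

    # Simplified sketch of nova's volume-detach path (names follow the
    # manager/volume-API split, details elided).
    def detach_volume_path(driver, volume_api, context, instance, bdm,
                           connector):
        # 1. detach from the hypervisor first
        driver.detach_volume(bdm.connection_info, instance,
                             bdm.device_name)
        # 2. tell cinder about it
        volume_api.terminate_connection(context, bdm.volume_id,
                                        connector)
        volume_api.detach(context, bdm.volume_id)
        # 3. only now drop the nova-side record
        bdm.destroy()

A failure in step 2 therefore leaves both the BDM row and the cinder
status behind, which is exactly the stuck state described below.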


> So if terminate_connection fails, we shouldn't get to detach. And if
> detach fails, we shouldn't get to delete.
> 
>>
>> If step 2 fails the volume gets stuck in the 'detaching' status and
>> any further attempt to delete or detach the volume will fail:
>> "Delete for volume <volume_id> failed: Volume <volume_id> is still
>> attached, detach volume first. (HTTP 400)"
>>
>> And if you try to detach:
>> "EROR (BadRequest): Invalid input received: Invalid volume: Unable to
>> detach volume. Volume status must be 'in-use' and attach_status must
>> be 'attached' to detach. Currently: status: 'detaching',
>> attach_status: 'attached.' (HTTP 400)"
>>
>> At the moment the only way to clean up the situation is to hack the
>> nova DB to delete the BDM record and apply a similar hack on the
>> cinder side as well.
>> We wanted a way to clean up the situation without that manual hack
>> to the nova DB.
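
(For reference, the cinder-side part of that hack is the admin-only
reset-state action; a minimal sketch with python-cinderclient, assuming
admin credentials and placeholder values:)

    from cinderclient import client as cinder_client

    # Placeholders: fill in real admin credentials and the stuck volume.
    AUTH_URL = 'http://keystone:5000/v2.0'
    VOLUME_ID = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'
    cinder = cinder_client.Client('2', 'admin', 'secret', 'admin',
                                  AUTH_URL)

    # os-reset_status only rewrites the status column in the cinder DB,
    # it does not touch the storage backend, so use it only after
    # checking the volume is really detached from the host.
    cinder.volumes.reset_state(VOLUME_ID, 'available')

Depending on the API/client version you may also need to reset the
attach_status to 'detached', and the nova side still needs the BDM row
soft-deleted by hand.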
> 
> Can't cinder roll back state somehow if it's bogus or an operation
> failed? For example, if detach failed, shouldn't we be out of the
> 'detaching' state? This is like auto-reverting task_state on server
> instances when an operation fails so that we can reset or delete those
> servers if needed.

I think that is an option, but it probably belongs to the redesign of
the cinder API (see solution proposed #3 below). It would be nice to
get the cinder guys commenting here.
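
For what it's worth, cinder already exposes a rollback for the
begin_detaching transition (os-roll_detaching, wrapped by nova's volume
API as roll_detaching), so the revert Matt describes could look roughly
like this sketch:

    # Sketch only: revert the volume to 'in-use' if the detach sequence
    # fails, so later detach/delete requests pass the status checks.
    def detach_with_rollback(volume_api, context, volume_id, connector):
        volume_api.begin_detaching(context, volume_id)  # -> 'detaching'
        try:
            volume_api.terminate_connection(context, volume_id,
                                            connector)
            volume_api.detach(context, volume_id)       # -> 'available'
        except Exception:
            # os-roll_detaching flips 'detaching' back to 'in-use'.
            volume_api.roll_detaching(context, volume_id)
            raise

The helper already exists in nova/volume/cinder.py; as far as I can see
it is just not applied on this failure path.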

>> Solution proposed #3
>> Ok, so the solution is to fix the Cinder API and make the interaction
>> between the Nova volume manager and that API robust.
>> This time I was right (YAY), but as you can imagine this fix is not
>> going to be an easy one. After talking with the Cinder guys, they
>> clearly told me that it is going to be a massive change in the
>> Cinder API and it is unlikely to land in the N(utella) or O(melette)
>> release.

> As Sean pointed out in another reply, I feel like what we're really
> missing here is some rollback code in the case that delete fails so we
> don't get in this stuck state and have to rely on deleting the BDMs
> manually in the database just to delete the instance.
> 
> We should roll back when the first delete fails so that the second
> delete request can pass the 'check attach' checks again.

The communication with cinder is async; Nova doesn't wait for or check
whether the detach on the cinder side completed successfully.
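
If we wanted nova to actually confirm the cinder side before removing
the BDM, the manager would have to poll for the status transition; a
hypothetical helper (not something nova does today) could look like:

    import time

    def wait_for_detach(cinder, volume_id, timeout=60, interval=2):
        # Hypothetical helper: poll cinder until the volume leaves
        # 'detaching'. Returns True only on a clean detach.
        deadline = time.time() + timeout
        while time.time() < deadline:
            status = cinder.volumes.get(volume_id).status
            if status == 'available':
                return True
            if status in ('error', 'error_detaching'):
                return False
            time.sleep(interval)
        return False

Even then the question remains what to do on a failure, which brings us
back to the rollback discussion above.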

Thanks
--
Andrea Rosa


