[openstack-dev] [nova][cinder] what are the key errors with volume detach
andrea.rosa at hpe.com
Mon Dec 14 17:24:13 UTC 2015
On 10/12/15 15:29, Matt Riedemann wrote:
>> In a simplified view of a detach volume we can say that the nova code:
>> 1. detaches the volume from the instance
>> 2. informs cinder about the detach and calls terminate_connection on
>> the cinder API
>> 3. deletes the bdm record in the nova DB
> We actually:
> 1. terminate the connection in cinder
> 2. detach the volume
> 3. delete the volume (if marked for delete_on_termination)
> 4. delete the bdm in the nova db
I am confused here: why are you referring to the _shutdown_instance
> So if terminate_connection fails, we shouldn't get to detach. And if
> detach fails, we shouldn't get to delete.
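The ordering described above can be sketched like this (a toy model with hypothetical names such as `FakeVolumeAPI` and `detach_volume`, not the actual nova code, which lives in nova/compute/manager.py and differs in detail):

```python
# Illustrative sketch of the detach ordering discussed in this thread.

class FakeVolumeAPI:
    """Stand-in for the cinder-facing volume API (hypothetical names)."""
    def __init__(self):
        self.calls = []

    def terminate_connection(self, volume_id, connector):
        self.calls.append(('terminate_connection', volume_id))

    def detach(self, volume_id):
        self.calls.append(('detach', volume_id))

    def delete(self, volume_id):
        self.calls.append(('delete', volume_id))


def detach_volume(volume_api, bdm, delete_on_termination=False):
    """Follow the ordering from the thread: terminate the connection,
    detach, optionally delete the volume, then drop the bdm record."""
    volume_api.terminate_connection(bdm['volume_id'], connector={})
    volume_api.detach(bdm['volume_id'])
    if delete_on_termination:
        volume_api.delete(bdm['volume_id'])
    bdm['deleted'] = True  # stands in for deleting the nova DB record
```

The point of the ordering is that each later step is only reached if the earlier one succeeded, so a terminate_connection failure should never leave a half-detached volume with a deleted bdm.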
>> If step 2 fails the volume gets stuck in a 'detaching' status and any
>> further attempt to delete or detach the volume will fail:
>> "Delete for volume <volume_id> failed: Volume <volume_id> is still
>> attached, detach volume first. (HTTP 400)"
>> And if you try to detach:
>> "ERROR (BadRequest): Invalid input received: Invalid volume: Unable to
>> detach volume. Volume status must be 'in-use' and attach_status must
>> be 'attached' to detach. Currently: status: 'detaching',
>> attach_status: 'attached.' (HTTP 400)"
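The check that produces the second error can be modelled like this (a toy sketch; `InvalidVolume` and `begin_detaching` are illustrative names, not cinder's real code). It shows why the volume gets stuck: once the status has moved to 'detaching', the same precondition that guards detach rejects every retry.

```python
class InvalidVolume(Exception):
    pass


def begin_detaching(volume):
    # Mirrors the error quoted above: detach is only allowed from
    # status 'in-use' with attach_status 'attached'.
    if volume['status'] != 'in-use' or volume['attach_status'] != 'attached':
        raise InvalidVolume(
            "Invalid volume: Unable to detach volume. Volume status must "
            "be 'in-use' and attach_status must be 'attached' to detach. "
            "Currently: status: '%s', attach_status: '%s'"
            % (volume['status'], volume['attach_status']))
    volume['status'] = 'detaching'
```

If the detach then fails on the nova side, nothing moves the status back to 'in-use', so a second `begin_detaching` call raises and the volume is wedged.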
>> At the moment the only way to clean up the situation is to hack the
>> nova DB to delete the bdm record, and to apply a similar hack on the
>> cinder side as well.
>> We wanted a way to clean up the situation avoiding the manual hacks to
>> the nova DB.
> Can't cinder roll back state somehow if it's bogus or an operation
> failed? For example, if detach failed, shouldn't we not be in
> 'detaching' state? This is like auto-reverting task_state on server
> instances when an operation fails so that we can reset or delete those
> servers if needed.
I think that is an option, but it is probably part of the redesign of
the cinder API (see solution proposed #3). It would be nice to get the
cinder folks commenting here.
>> Solution proposed #3
>> Ok, so the solution is to fix the Cinder API and make the interaction
>> between the Nova volume manager and that API robust.
>> This time I was right (YAY), but as you can imagine this fix is not
>> going to be an easy one, and after talking with the Cinder guys they
>> clearly told me that it is going to be a massive change in the
>> Cinder API and it is unlikely to land in the N(utella) or O(melette)
>> releases.
> As Sean pointed out in another reply, I feel like what we're really
> missing here is some rollback code in the case that delete fails so we
> don't get in this stuck state and have to rely on deleting the BDMs
> manually in the database just to delete the instance.
> We should roll back on the first failed delete so that a second delete
> request can pass the 'check attach' checks again.
The communication with cinder is async: Nova doesn't wait for, or check
whether, the detach on the cinder side has been executed correctly.
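That gap can be illustrated by contrasting a fire-and-forget call with one that waits on the result (toy code using `concurrent.futures`, not nova's actual RPC layer):

```python
from concurrent.futures import ThreadPoolExecutor


def detach_fire_and_forget(executor, cinder_detach, volume_id):
    # Nova-style: submit the cinder-side work and return immediately;
    # a failure on the cinder side never surfaces to the caller.
    executor.submit(cinder_detach, volume_id)


def detach_checked(executor, cinder_detach, volume_id, timeout=5.0):
    # Alternative: wait on the future, so an exception raised on the
    # cinder side propagates back into the caller.
    fut = executor.submit(cinder_detach, volume_id)
    fut.result(timeout=timeout)  # re-raises if cinder_detach raised
```

In the first form a cinder-side error is silently swallowed, which is exactly why the volume can end up stuck in 'detaching' with nova none the wiser.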