Cinder snapshot delete successful when expected to fail

Alan Bishop abishop at redhat.com
Fri Jan 17 14:17:30 UTC 2020


On Fri, Jan 17, 2020 at 2:01 AM Tony Pearce <tony.pearce at cinglevue.com>
wrote:

> Could anyone help by pointing me where to go to be able to dig into this
> issue further?
>
> I have installed a test Openstack environment using RDO Packstack. I
> wanted to install the same version that I have in Production (Pike) but
> it's not listed in the CentOS repo via yum search. So I installed Queens. I
> am using the nimble.py Cinder driver. Nimble Storage is a storage array
> accessed via iSCSI from the Openstack host, and is controlled from
> Openstack by the driver and API.
>
> *What I expected to happen:*
> 1. create an instance with volume (the volume is created on the storage
> array successfully and instance boots from it)
> 2. take a snapshot  (snapshot taken on the volume on the array
> successfully)
> 3. create a new instance from the snapshot (the api tells the array to
> clone the snapshot into a new volume on the array and use that volume for
> the instance)
> 4. try and delete the snapshot
> Expected Result - Openstack gives the user a message like "you're not
> allowed to do that".
>
>  Note: Step 3 above creates a child volume from the parent snapshot. It's
> impossible to delete the parent snapshot because IO READ is sent to that
> part of the original volume (as I understand it).
>
> *My production problem is this: *
> 1. create an instance with volume (the volume is created on the storage
> array successfully)
> 2. take a snapshot  (snapshot taken on the volume on the array
> successfully)
> 3. create a new instance from the snapshot (the api tells the array to
> clone the snapshot into a new volume on the array and use that volume for
> the instance)
> 4. try and delete the snapshot
> Result - snapshot goes into error state and later, all Cinder operations
> fail such as new instance/create volume etc. until the correct service is
> restarted. Then everything works once again.
>
>
> To troubleshoot the above, I installed RDO Packstack Queens (because I
> couldn't get Pike). I tested the above and now the result is that the
> snapshot is successfully deleted from Openstack but not deleted on the
> array. The log is below for reference. I can see in the log that the array
> sends back info to Openstack saying the snapshot has a clone and the delete
> cannot be done because of that, with response code 409.
>
> *Some info about why the problem with Pike started in the first place*
> 1. Vendor is Nimble Storage which HPE purchased
> 2. HPE/Nimble have dropped support for openstack. Latest supported version
> is Queens and Nimble array version v4.x. The current Array version is v5.x.
> Nimble say there are no guarantees with openstack, the driver and the array
> version v5.x
> 3. I was previously advised by Nimble that the array version v5.x will
> work fine and so left our DR array on v5.x with a pending upgrade that had
> a blocker due to an issue. This issue was resolved in December, and the
> pending upgrade to match the DR array was completed around 30 days
> ago.
>
>
> With regards to the production issue, I assumed that the array API has
> some changes between v4.x and v5.x and that these are causing an issue
> with Cinder due to the API response. However, I have not been able to find
> out whether anything changed with the array upgrade, or what, as
> the documentation for this is Nimble internal-only.
>
>
> *So with that - some questions if I may:*
>  When Openstack got the 409 error response from the API (as seen in the
> log below), why would Openstack then proceed to delete the snapshot on the
> Openstack side? How could I debug this further? I'm not sure yet what
> Openstack Cinder is acting on in terms of the response. Maybe Openstack is
> not specifically looking for the error code in the response?
>
> The snapshot that got deleted on the openstack side is a problem. Would
> this be related to the driver? Could it be possible that the driver did not
> pass the error response to Cinder?
>

Hi Tony,

This is exactly what happened, and it appears to be a driver bug introduced
in Queens by [1]. The code in question [2] logs the error, but fails to
propagate the exception. As far as the volume manager is concerned, the
snapshot deletion was successful.

[1] https://review.opendev.org/601492
[2]
https://opendev.org/openstack/cinder/src/branch/stable/queens/cinder/volume/drivers/nimble.py#L1815
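
To illustrate the failure mode, here is a minimal Python sketch. It is not
the driver's code: FakeArray, manager_delete, both delete_snapshot_*
functions and the locally defined exception classes are hypothetical
stand-ins, assumed only for illustration. It shows why a driver that logs an
array error without raising makes the caller report success, and how raising
lets the caller keep the snapshot.

```python
import logging

LOG = logging.getLogger(__name__)


class NimbleAPIException(Exception):
    """Stand-in for the driver's API error (hypothetical definition)."""


class SnapshotIsBusy(Exception):
    """Stand-in for a 'snapshot in use' error raised to the caller."""


class FakeArray:
    """Hypothetical array client that always refuses with a 409."""

    def delete_snap(self, snap_name):
        raise NimbleAPIException(
            "Error Code: 409 Message: Snapshot %s has a clone." % snap_name)


def delete_snapshot_swallowing(array, snap_name):
    """Logs the failure but returns normally, so the caller sees success."""
    try:
        array.delete_snap(snap_name)
    except NimbleAPIException as ex:
        LOG.warning("Snapshot %s : has a clone: %s", snap_name, ex)


def delete_snapshot_propagating(array, snap_name):
    """Logs the failure and raises, so the caller can keep the snapshot."""
    try:
        array.delete_snap(snap_name)
    except NimbleAPIException as ex:
        LOG.warning("Snapshot %s : has a clone: %s", snap_name, ex)
        raise SnapshotIsBusy(snap_name) from ex


def manager_delete(driver_delete, array, snap_name):
    """Toy caller: treats 'no exception' as a successful delete."""
    try:
        driver_delete(array, snap_name)
    except SnapshotIsBusy:
        return "available"  # snapshot record is kept
    return "deleted"        # snapshot record is removed


if __name__ == "__main__":
    array = FakeArray()
    print(manager_delete(delete_snapshot_swallowing, array, "snap-1"))
    print(manager_delete(delete_snapshot_propagating, array, "snap-1"))
```

With the swallowing variant the toy caller returns "deleted" even though the
array refused the request, which matches the "Delete snapshot completed
successfully" line in the log. With the propagating variant it returns
"available".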

Alan

> Thanks in advance. Just for reference, the log snippet is below.
>
>
> ==> volume.log <==
>> 2020-01-17 16:53:23.718 24723 WARNING py.warnings
>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>> 87e34c89e6fb41d2af25085b64011a55 - default default]
>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852:
>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding
>> certificate verification is strongly advised. See:
>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
>>   InsecureRequestWarning)
>> : NimbleAPIException: Failed to execute api
>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409
>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
>> ==> api.log <==
>> 2020-01-17 16:53:23.769 25242 INFO cinder.api.openstack.wsgi
>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814
>> 87e34c89e6fb41d2af25085b64011a55 - default default]
>> http://192.168.53.45:8776/v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail
>> returned with HTTP 200
>> 2020-01-17 16:53:23.770 25242 INFO eventlet.wsgi.server
>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814
>> 87e34c89e6fb41d2af25085b64011a55 - default default] 192.168.53.45 "GET
>> /v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail HTTP/1.1" status: 200
>>  len: 4657 time: 0.1152730
>> ==> volume.log <==
>> 2020-01-17 16:53:23.811 24723 WARNING py.warnings
>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>> 87e34c89e6fb41d2af25085b64011a55 - default default]
>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852:
>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding
>> certificate verification is strongly advised. See:
>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
>>   InsecureRequestWarning)
>> : NimbleAPIException: Failed to execute api
>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409
>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
>> 2020-01-17 16:53:23.902 24723 ERROR cinder.volume.drivers.nimble
>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>> 87e34c89e6fb41d2af25085b64011a55 - default default] Re-throwing Exception
>> Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41:
>> Error Code: 409 Message: Snapshot
>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.:
>> NimbleAPIException: Failed to execute api
>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409
>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
>> 2020-01-17 16:53:23.903 24723 WARNING cinder.volume.drivers.nimble
>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>> 87e34c89e6fb41d2af25085b64011a55 - default default] Snapshot
>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 : has a clone:
>> NimbleAPIException: Failed to execute api
>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409
>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
>> 2020-01-17 16:53:23.964 24723 WARNING cinder.quota
>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>> 87e34c89e6fb41d2af25085b64011a55 - default default] Deprecated: Default
>> quota for resource: snapshots_Nimble-DR is set by the default quota flag:
>> quota_snapshots_Nimble-DR, it is now deprecated. Please use the default
>> quota class for default quota.
>> 2020-01-17 16:53:24.054 24723 INFO cinder.volume.manager
>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>> 87e34c89e6fb41d2af25085b64011a55 - default default] Delete snapshot
>> completed successfully.
>
>
>
> Regards,
>
> *Tony Pearce*   |
> *Senior Network Engineer / Infrastructure Lead*
> *Cinglevue International <https://www.cinglevue.com>*
>
> Email: tony.pearce at cinglevue.com
> Web: http://www.cinglevue.com
>
> *Australia*
> 1 Walsh Loop, Joondalup, WA 6027 Australia.
>
> Direct: +61 8 6202 0036 | Main: +61 8 6202 0024
>