Cinder snapshot delete successful when expected to fail

Tony Pearce tony.pearce at cinglevue.com
Sat Jan 18 01:44:16 UTC 2020


Thank you. That really helps.

I am going to diff the nimble.py files between Pike and Queens and see
what's changed.
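
A minimal sketch of that comparison, assuming a copy of nimble.py from each release has been saved locally. The function name and any paths passed to it are hypothetical; difflib is part of the Python standard library.

```python
import difflib
from pathlib import Path


def diff_driver_files(old_path, new_path):
    """Return a unified diff between two copies of a driver file as one string."""
    old_lines = Path(old_path).read_text().splitlines(keepends=True)
    new_lines = Path(new_path).read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=str(old_path),
            tofile=str(new_path),
        )
    )
```

Calling diff_driver_files("pike/nimble.py", "queens/nimble.py") (hypothetical paths) and printing the result should show which exception handlers were added or changed between the two releases.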

On Fri, 17 Jan 2020, 22:18 Alan Bishop, <abishop at redhat.com> wrote:

>
>
> On Fri, Jan 17, 2020 at 2:01 AM Tony Pearce <tony.pearce at cinglevue.com>
> wrote:
>
>> Could anyone help by pointing me where to go to be able to dig into this
>> issue further?
>>
>> I have installed a test Openstack environment using RDO Packstack. I
>> wanted to install the same version that I have in Production (Pike) but
>> it's not listed in the CentOS repo via yum search, so I installed Queens. I
>> am using the nimble.py Cinder driver. Nimble Storage is a storage array
>> accessed via iSCSI from the Openstack host, and is controlled from
>> Openstack by the driver and API.
>>
>> *What I expected to happen:*
>> 1. create an instance with volume (the volume is created on the storage
>> array successfully and instance boots from it)
>> 2. take a snapshot  (snapshot taken on the volume on the array
>> successfully)
>> 3. create a new instance from the snapshot (the api tells the array to
>> clone the snapshot into a new volume on the array and use that volume for
>> the instance)
>> 4. try and delete the snapshot
>> Expected Result - Openstack gives the user a message like "you're not
>> allowed to do that".
>>
>>  Note: Step 3 above creates a child volume from the parent snapshot. It's
>> impossible to delete the parent snapshot because IO READ is sent to that
>> part of the original volume (as I understand it).
>>
>> *My production problem is this:*
>> 1. create an instance with volume (the volume is created on the storage
>> array successfully)
>> 2. take a snapshot  (snapshot taken on the volume on the array
>> successfully)
>> 3. create a new instance from the snapshot (the api tells the array to
>> clone the snapshot into a new volume on the array and use that volume for
>> the instance)
>> 4. try and delete the snapshot
>> Result - the snapshot goes into an error state and, later, all Cinder
>> operations (new instance, create volume, etc.) fail until the correct
>> service is restarted. Then everything works once again.
>>
>>
>> To troubleshoot the above, I installed the RDO Packstack Queens (because
>> I couldn't get Pike). I tested the above and now the result is that the
>> snapshot is successfully deleted from Openstack but not deleted on the
>> array. The log is below for reference. I can see in the log that the
>> array sends back info to Openstack saying the snapshot has a clone and the
>> delete cannot be done because of that, with response code 409.
>>
>> *Some info about why the problem with Pike started in the first place*
>> 1. Vendor is Nimble Storage which HPE purchased
>> 2. HPE/Nimble have dropped support for Openstack. The latest supported
>> version is Queens and Nimble array version v4.x. The current array version
>> is v5.x. Nimble say there are no guarantees with Openstack, the driver and
>> the array version v5.x.
>> 3. I was previously advised by Nimble that the array version v5.x will
>> work fine and so left our DR array on v5.x with a pending upgrade that had
>> a blocker due to an issue. This issue was resolved in December, and the
>> pending upgrade to match the DR array was completed around 30 days
>> ago.
>>
>>
>> With regard to the production issue, I assumed that the array API has
>> some changes between v4.x and v5.x that are causing an issue with Cinder
>> due to the API response. However, I have not been able to find out
>> whether anything changed with the array upgrade, or what, as the
>> documentation for this is Nimble internal-only.
>>
>>
>> *So with that - some questions if I may:*
>>  When Openstack got the 409 error response from the API (as seen in the
>> log below), why would Openstack then proceed to delete the snapshot on the
>> Openstack side? How could I debug this further? I'm not sure what Openstack
>> Cinder is acting on in terms of the response as yet. Maybe Openstack is not
>> specifically looking for the error code in the response?
>>
>> The snapshot that got deleted on the openstack side is a problem. Would
>> this be related to the driver? Could it be possible that the driver did not
>> pass the error response to Cinder?
>>
>
> Hi Tony,
>
> This is exactly what happened, and it appears to be a driver bug
> introduced in Queens by [1]. The code in question [2] logs the error, but
> fails to propagate the exception. As far as the volume manager is
> concerned, the snapshot deletion was successful.
>
> [1] https://review.opendev.org/601492
> [2]
> https://opendev.org/openstack/cinder/src/branch/stable/queens/cinder/volume/drivers/nimble.py#L1815
>
> Alan
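
To illustrate the pattern Alan describes, here is a minimal, self-contained sketch. It is not the nimble.py code: every class and function name below is a hypothetical stand-in, and the "manager" is reduced to the one behaviour that matters here, which is that it treats a driver call that returns without raising as a successful delete. The sketch uses Python 3 syntax.

```python
import logging

LOG = logging.getLogger(__name__)


class ArrayAPIError(Exception):
    """Hypothetical stand-in for the driver's API exception."""


class SnapshotBusyError(Exception):
    """Hypothetical stand-in for an error the caller treats as 'snapshot in use'."""


def array_delete_snapshot(name):
    """Pretend backend call: the array refuses because the snapshot has a clone."""
    raise ArrayAPIError("Error Code: 409 Message: Snapshot %s has a clone." % name)


def delete_snapshot_swallowing(name):
    """Buggy pattern: log the failure, then return normally."""
    try:
        array_delete_snapshot(name)
    except ArrayAPIError as exc:
        LOG.warning("Snapshot %s : has a clone: %s", name, exc)
        # Nothing is raised here, so the caller cannot tell the delete failed.


def delete_snapshot_propagating(name):
    """Fixed pattern: log the failure and raise so the caller sees it."""
    try:
        array_delete_snapshot(name)
    except ArrayAPIError as exc:
        LOG.warning("Snapshot %s : has a clone: %s", name, exc)
        raise SnapshotBusyError(name) from exc


def manager_delete(driver_delete, name):
    """Simplified caller: no exception from the driver means 'deleted'."""
    try:
        driver_delete(name)
    except SnapshotBusyError:
        return "available"  # the snapshot record is kept
    return "deleted"
```

With these definitions, manager_delete(delete_snapshot_swallowing, "snap-1") returns "deleted" even though the array refused the request, while manager_delete(delete_snapshot_propagating, "snap-1") returns "available". The first case matches the log: the 409 is logged, then the delete is reported as successful.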
>
>> Thanks in advance. Just for reference, the log snippet is below.
>>
>>
>> ==> volume.log <==
>>> 2020-01-17 16:53:23.718 24723 WARNING py.warnings
>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>>> 87e34c89e6fb41d2af25085b64011a55 - default default]
>>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852:
>>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding
>>> certificate verification is strongly advised. See:
>>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
>>>   InsecureRequestWarning)
>>> : NimbleAPIException: Failed to execute api
>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409
>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
>>> ==> api.log <==
>>> 2020-01-17 16:53:23.769 25242 INFO cinder.api.openstack.wsgi
>>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814
>>> 87e34c89e6fb41d2af25085b64011a55 - default default]
>>> http://192.168.53.45:8776/v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail
>>> returned with HTTP 200
>>> 2020-01-17 16:53:23.770 25242 INFO eventlet.wsgi.server
>>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814
>>> 87e34c89e6fb41d2af25085b64011a55 - default default] 192.168.53.45 "GET
>>> /v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail HTTP/1.1" status: 200
>>>  len: 4657 time: 0.1152730
>>> ==> volume.log <==
>>> 2020-01-17 16:53:23.811 24723 WARNING py.warnings
>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>>> 87e34c89e6fb41d2af25085b64011a55 - default default]
>>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852:
>>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding
>>> certificate verification is strongly advised. See:
>>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
>>>   InsecureRequestWarning)
>>> : NimbleAPIException: Failed to execute api
>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409
>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
>>> 2020-01-17 16:53:23.902 24723 ERROR cinder.volume.drivers.nimble
>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Re-throwing Exception
>>> Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41:
>>> Error Code: 409 Message: Snapshot
>>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.:
>>> NimbleAPIException: Failed to execute api
>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409
>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
>>> 2020-01-17 16:53:23.903 24723 WARNING cinder.volume.drivers.nimble
>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Snapshot
>>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 : has a clone:
>>> NimbleAPIException: Failed to execute api
>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409
>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume
>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
>>> 2020-01-17 16:53:23.964 24723 WARNING cinder.quota
>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Deprecated: Default
>>> quota for resource: snapshots_Nimble-DR is set by the default quota flag:
>>> quota_snapshots_Nimble-DR, it is now deprecated. Please use the default
>>> quota class for default quota.
>>> 2020-01-17 16:53:24.054 24723 INFO cinder.volume.manager
>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814
>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Delete snapshot
>>> completed successfully.
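
One way to look for this pattern in a full volume.log is to group records by request ID and flag any request that logged a 409 from the array and later reported a successful snapshot delete. The following is a rough sketch only. It assumes each log record sits on a single line (the lines above are wrapped by the mail client), and the function name is hypothetical.

```python
import re

# Request IDs in the log look like req-<uuid>.
REQ_ID = re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def find_masked_failures(lines):
    """Return request IDs that logged a 409 and then a successful snapshot delete."""
    saw_conflict = set()
    masked = []
    for line in lines:
        match = REQ_ID.search(line)
        if match is None:
            continue  # record carries no request ID, nothing to correlate
        req = match.group(0)
        if "Error Code: 409" in line:
            saw_conflict.add(req)
        elif "Delete snapshot completed successfully" in line and req in saw_conflict:
            masked.append(req)
    return masked
```

Passing the lines of volume.log to find_masked_failures should list the request IDs where the array's refusal never reached the volume manager.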
>>
>>
>>
>> Regards,
>>
>> *Tony Pearce* |
>> *Senior Network Engineer / Infrastructure Lead*
>> *Cinglevue International <https://www.cinglevue.com>*
>>
>> Email: tony.pearce at cinglevue.com
>> Web: http://www.cinglevue.com
>>
>> *Australia*
>> 1 Walsh Loop, Joondalup, WA 6027 Australia.
>>
>> Direct: +61 8 6202 0036 | Main: +61 8 6202 0024
>>
>>
>>
>