Thank you. That really helps. 

I am going to diff the nimble.py files between Pike and Queens and see what's changed. 
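
A minimal sketch of how that comparison could be scripted with Python's standard difflib, assuming both versions of nimble.py have already been saved locally. The paths pike/nimble.py and queens/nimble.py and the function name diff_driver_versions are hypothetical, chosen only for illustration.

```python
import difflib
import sys
from pathlib import Path
from typing import List


def diff_driver_versions(old_text: str, new_text: str,
                         old_label: str = "pike/nimble.py",
                         new_label: str = "queens/nimble.py") -> List[str]:
    """Return the unified diff between two versions of a source file."""
    return list(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
        n=3,  # lines of context around each change
    ))


if __name__ == "__main__":
    # Hypothetical local copies of the two driver versions.
    old_path = Path("pike/nimble.py")
    new_path = Path("queens/nimble.py")
    if old_path.exists() and new_path.exists():
        sys.stdout.writelines(
            diff_driver_versions(old_path.read_text(), new_path.read_text())
        )
```

Searching the output for changed "except" and "raise" lines should narrow down where the error handling differs.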

On Fri, 17 Jan 2020, 22:18 Alan Bishop, <abishop@redhat.com> wrote:


On Fri, Jan 17, 2020 at 2:01 AM Tony Pearce <tony.pearce@cinglevue.com> wrote:
Could anyone help by pointing me where to go to be able to dig into this issue further? 

I have installed a test OpenStack environment using RDO Packstack. I wanted to install the same version that I have in production (Pike), but it is not listed in the CentOS repo via yum search, so I installed Queens. I am using the nimble.py Cinder driver. Nimble Storage is a storage array accessed via iSCSI from the OpenStack host and controlled from OpenStack through the driver and the array's API.

What I expected to happen:
1. create an instance with volume (the volume is created on the storage array successfully and instance boots from it)
2. take a snapshot  (snapshot taken on the volume on the array successfully)
3. create a new instance from the snapshot (the api tells the array to clone the snapshot into a new volume on the array and use that volume for the instance)
4. try and delete the snapshot
Expected Result - Openstack gives the user a message like "you're not allowed to do that".

Note: Step 3 above creates a child volume from the parent snapshot. The parent snapshot cannot be deleted because reads from the child volume are still served from that part of the original volume (as I understand it).

My production problem is this: 
1. create an instance with volume (the volume is created on the storage array successfully)
2. take a snapshot  (snapshot taken on the volume on the array successfully)
3. create a new instance from the snapshot (the api tells the array to clone the snapshot into a new volume on the array and use that volume for the instance)
4. try and delete the snapshot
Result - the snapshot goes into an error state and, later, all Cinder operations (new instance, create volume, etc.) fail until the correct service is restarted. Then everything works once again.


To troubleshoot the above, I installed RDO Packstack Queens (because I couldn't get Pike). I tested the above, and now the result is that the snapshot is successfully deleted from OpenStack but not deleted on the array. The log is below for reference. I can see in the log that the array sends a response back to OpenStack with code 409, saying the snapshot has a clone and so the delete cannot be done.

Some info about why the problem with Pike started in the first place
1. Vendor is Nimble Storage which HPE purchased
2. HPE/Nimble have dropped support for openstack. Latest supported version is Queens and Nimble array version v4.x. The current Array version is v5.x. Nimble say there are no guarantees with openstack, the driver and the array version v5.x
3. Nimble previously advised me that array version v5.x would work fine, so I left our DR array on v5.x while a pending upgrade was blocked by an issue. That issue was resolved in December, and the pending upgrade, which brought the array in line with the DR array, was completed around 30 days ago.


With regard to the production issue, I assumed that the array API changed between v4.x and v5.x and that the API response is now causing an issue with Cinder. However, I have not been able to find out whether anything changed after the array upgrade, or what, because the documentation for this is Nimble internal-only.


So with that - some questions if I may:
When OpenStack got the 409 error response from the API (as seen in the log below), why would OpenStack then proceed to delete the snapshot on the OpenStack side? How could I debug this further? I'm not yet sure what OpenStack Cinder is acting on in terms of the response. Maybe OpenStack is not specifically looking for the error code in the response?

The snapshot that got deleted on the openstack side is a problem. Would this be related to the driver? Could it be possible that the driver did not pass the error response to Cinder? 

Hi Tony,

This is exactly what happened, and it appears to be a driver bug introduced in queens by [1]. The code in question [2] logs the error, but fails to propagate the exception. As far as the volume manager is concerned, the snapshot deletion was successful.

[1] https://review.opendev.org/601492
[2] https://opendev.org/openstack/cinder/src/branch/stable/queens/cinder/volume/drivers/nimble.py#L1815
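
To illustrate the pattern described above, here is a minimal, self-contained sketch. It is not the actual nimble.py or volume manager code: every name in it (BackendAPIError, array_delete_snapshot, manager_delete, and the two delete functions) is hypothetical. It only shows why a caller that treats "no exception" as success cannot see a failure that the driver logs and then drops.

```python
import logging

LOG = logging.getLogger(__name__)


class BackendAPIError(Exception):
    """Hypothetical stand-in for the driver's API exception (e.g. a 409)."""


def array_delete_snapshot(name):
    # Hypothetical backend call: the array refuses because a clone exists.
    raise BackendAPIError(
        "Error Code: 409 Message: Snapshot %s has a clone." % name
    )


def delete_snapshot_swallowing(name):
    """Logs the backend error but returns normally (the problematic pattern)."""
    try:
        array_delete_snapshot(name)
    except BackendAPIError as exc:
        LOG.warning("Snapshot %s : has a clone: %s", name, exc)
        # No raise here, so the caller never learns the delete failed.


def delete_snapshot_propagating(name):
    """Logs the backend error and re-raises it so the caller can react."""
    try:
        array_delete_snapshot(name)
    except BackendAPIError as exc:
        LOG.warning("Snapshot %s : has a clone: %s", name, exc)
        raise


def manager_delete(driver_delete, name):
    """Hypothetical caller: any driver call that returns counts as success."""
    try:
        driver_delete(name)
    except Exception:
        return "failed"
    return "deleted"
```

With these definitions, manager_delete(delete_snapshot_swallowing, "snap") reports "deleted" even though the backend refused, while manager_delete(delete_snapshot_propagating, "snap") reports "failed".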

Alan

Thanks in advance. Just for reference, the log snippet is below. 


==> volume.log <==
2020-01-17 16:53:23.718 24723 WARNING py.warnings [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 87e34c89e6fb41d2af25085b64011a55 - default default] /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
: NimbleAPIException: Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
==> api.log <==
2020-01-17 16:53:23.769 25242 INFO cinder.api.openstack.wsgi [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 87e34c89e6fb41d2af25085b64011a55 - default default] http://192.168.53.45:8776/v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail returned with HTTP 200
2020-01-17 16:53:23.770 25242 INFO eventlet.wsgi.server [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 87e34c89e6fb41d2af25085b64011a55 - default default] 192.168.53.45 "GET /v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail HTTP/1.1" status: 200  len: 4657 time: 0.1152730
==> volume.log <==
2020-01-17 16:53:23.811 24723 WARNING py.warnings [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 87e34c89e6fb41d2af25085b64011a55 - default default] /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
: NimbleAPIException: Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
2020-01-17 16:53:23.902 24723 ERROR cinder.volume.drivers.nimble [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 87e34c89e6fb41d2af25085b64011a55 - default default] Re-throwing Exception Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.: NimbleAPIException: Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
2020-01-17 16:53:23.903 24723 WARNING cinder.volume.drivers.nimble [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 87e34c89e6fb41d2af25085b64011a55 - default default] Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 : has a clone: NimbleAPIException: Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
2020-01-17 16:53:23.964 24723 WARNING cinder.quota [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 87e34c89e6fb41d2af25085b64011a55 - default default] Deprecated: Default quota for resource: snapshots_Nimble-DR is set by the default quota flag: quota_snapshots_Nimble-DR, it is now deprecated. Please use the default quota class for default quota.
2020-01-17 16:53:24.054 24723 INFO cinder.volume.manager [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 87e34c89e6fb41d2af25085b64011a55 - default default] Delete snapshot completed successfully.
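
As one way to dig into a log like the one above, the sketch below groups lines by request ID and reports requests that logged both a 409 from the array and "Delete snapshot completed successfully". It assumes the log format shown here, where the request ID appears as "[req-..." on each entry; the function name find_masked_failures and the marker strings are my own choices rather than anything provided by Cinder.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Request IDs appear as "[req-<uuid> ..." in the log lines shown above.
REQ_ID = re.compile(r"\[(req-[0-9a-f-]+)")


def find_masked_failures(lines: Iterable[str]) -> List[str]:
    """Return request IDs that logged a 409 and also a successful delete."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = REQ_ID.search(line)
        if match is None:
            # Continuation lines (e.g. traceback fragments) carry no request ID.
            continue
        req = match.group(1)
        if "Error Code: 409" in line:
            seen[req].add("conflict")
        if "Delete snapshot completed successfully" in line:
            seen[req].add("success")
    return sorted(
        req for req, marks in seen.items()
        if marks == {"conflict", "success"}
    )
```

It can be fed an open log file directly, for example find_masked_failures(open("volume.log")), since a file object iterates line by line.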


Regards,

Tony Pearce   |  Senior Network Engineer / Infrastructure Lead
Cinglevue International

Email: tony.pearce@cinglevue.com
Web: http://www.cinglevue.com 

Australia 
1 Walsh Loop, Joondalup, WA 6027 Australia.

Direct: +61 8 6202 0036 | Main: +61 8 6202 0024
