On 7/25/2019 3:14 AM, Gorka Eguileor wrote:
Attachment delete is a synchronous operation, so every connection timeout along the path can affect it: Nova to HAProxy, HAProxy to Cinder-API, Cinder-API to Cinder-Volume (via RabbitMQ), and Cinder-Volume to the storage backend.
I would recommend looking at the specific attachment_delete request that failed in the Cinder logs to see how long it took to complete, and then checking how long it took for the 504 error to appear. With that information you can get an idea of how much higher your timeout needs to be.
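For example, to correlate the two sides you could grep the request ID of the failed call across the Cinder services (the request ID and log paths below are placeholders; adjust them for your deployment):

    grep "req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
        /var/log/cinder/cinder-api.log /var/log/cinder/cinder-volume.log

Comparing the first and last timestamps for that request gives the total time Cinder needed, which you can then compare against the time at which HAProxy returned the 504.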
It could also happen that Cinder-API raises a timeout error when calling Cinder-Volume. In that case you should check the cinder-volume service logs to see how long the operation took to complete, since the operation keeps running on the volume service even after the API call has timed out.
Internally the Cinder-API to Cinder-Volume timeout is usually around 60 seconds (rpc_response_timeout).
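As a rough illustration only (the values are examples, not recommendations), raising the two timeouts mentioned above would look something like this:

    # cinder.conf
    [DEFAULT]
    # RPC timeout between cinder-api and cinder-volume (default is 60 seconds)
    rpc_response_timeout = 300

    # haproxy.cfg (defaults or backend section)
    # how long HAProxy waits for the Cinder API to respond before returning a 504
    timeout server 300s

The HAProxy server timeout needs to cover the slowest end-to-end attachment_delete you expect; otherwise HAProxy will keep returning 504s even though Cinder eventually completes the operation.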
Yeah, this is a known intermittent issue in our CI jobs as well, for example:

http://status.openstack.org/elastic-recheck/#1763712

As I mentioned in the bug report for that issue:

https://bugs.launchpad.net/cinder/+bug/1763712

it might be worth using the long_rpc_timeout approach for this, assuming the HTTP response doesn't time out. Nova uses long_rpc_timeout for known long RPC calls:

https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.lon...

Cinder should probably do the same for initialize-connection-style RPC calls. I've seen other gate failures where cinder-backup to cinder-volume RPC calls to initialize a connection have timed out as well, e.g.:

https://bugs.launchpad.net/cinder/+bug/1739482

--
Thanks,

Matt
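(For reference, the Nova option mentioned above looks roughly like this; the nova.conf option exists today, while the cinder.conf line is purely hypothetical and only sketches what a similar knob could look like if Cinder adopted the same approach:

    # nova.conf
    [DEFAULT]
    # used by Nova for RPC calls known to run long (default is 1800 seconds)
    long_rpc_timeout = 1800

    # hypothetical cinder.conf equivalent for initialize-connection-style calls
    # [DEFAULT]
    # long_rpc_timeout = 1800
)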