[kolla][nova][cinder] Got Gateway-Timeout error on VM evacuation if it has volume attached.

Matt Riedemann mriedemos at gmail.com
Thu Jul 25 12:02:24 UTC 2019


On 7/25/2019 3:14 AM, Gorka Eguileor wrote:
> Attachment delete is a synchronous operation, so all the different
> connection timeouts may affect the operation: Nova to HAProxy, HAProxy
> to Cinder-API, Cinder-API to Cinder-Volume via RabbitMQ, Cinder-Volume
> to Storage backend.
> 
> I would recommend looking at the specific attachment_delete request
> that failed in the Cinder logs to see how long it took to complete, and
> then checking how long it took for the 504 error to happen.  With that
> info you can get an idea of how much higher your timeout must be.
> 
> It could also happen that Cinder-API raises a timeout error when
> calling Cinder-Volume.  In this case you should check the
> cinder-volume service to see how long the operation took to complete,
> as it keeps running after the API call has timed out.
> 
> Internally the Cinder-API to Cinder-Volume timeout is usually around 60
> seconds (rpc_response_timeout).

Yeah, this is a known intermittent issue in our CI jobs as well, for example:

http://status.openstack.org/elastic-recheck/#1763712

As I mentioned in the bug report for that issue:

https://bugs.launchpad.net/cinder/+bug/1763712

It might be worth using the long_rpc_timeout approach for this, assuming
the HTTP response doesn't time out. Nova uses long_rpc_timeout for known
long-running RPC calls:

https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.long_rpc_timeout
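
As a rough sketch of the idea (this is not actual Nova or Cinder code; the function names, the set of slow methods and the 1800 second value are assumptions for illustration), a service could pick a longer timeout only for RPC calls known to be slow, and then check that no HTTP-side timeout in front of it is shorter:

```python
# Illustrative sketch only: every name here is hypothetical.

RPC_RESPONSE_TIMEOUT = 60   # seconds; the usual rpc_response_timeout quoted above
LONG_RPC_TIMEOUT = 1800     # seconds; assumed value for illustration

# Hypothetical set of RPC methods treated as long-running.
LONG_RUNNING_METHODS = {"attachment_delete", "initialize_connection"}


def pick_rpc_timeout(method):
    """Return the RPC timeout in seconds to use for the given method name."""
    if method in LONG_RUNNING_METHODS:
        return LONG_RPC_TIMEOUT
    return RPC_RESPONSE_TIMEOUT


def http_timeouts_too_short(rpc_timeout, http_timeouts):
    """Return the sorted names of HTTP-side timeouts shorter than rpc_timeout.

    http_timeouts maps a hop name (e.g. a proxy or client timeout) to seconds.
    An empty result means no HTTP hop would give up before the RPC call does.
    """
    return sorted(
        name for name, seconds in http_timeouts.items() if seconds < rpc_timeout
    )


if __name__ == "__main__":
    timeout = pick_rpc_timeout("attachment_delete")
    # The hop names and values below are made up for the example.
    hops = {"haproxy_server_timeout": 60, "nova_to_cinder_http_timeout": 600}
    print(timeout, http_timeouts_too_short(timeout, hops))
```

Any hop reported by the second function would still return a 504 to the caller before the longer RPC timeout could help, which is why the HTTP response timeout matters here.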

Cinder should probably do the same for initialize-connection-style RPC
calls. I've seen other gate failures where cinder-backup to
cinder-volume RPC calls to initialize a connection have timed out as
well, e.g.:

https://bugs.launchpad.net/cinder/+bug/1739482

-- 

Thanks,

Matt


