[kolla][nova][cinder] Got Gateway-Timeout error on VM evacuation if it has volume attached.

Matt Riedemann mriedemos at gmail.com
Thu Jul 25 12:02:24 UTC 2019

On 7/25/2019 3:14 AM, Gorka Eguileor wrote:
> Attachment delete is a synchronous operation, so all the different
> connection timeouts may affect the operation: Nova to HAProxy, HAProxy
> to Cinder-API, Cinder-API to Cinder-Volume via RabbitMQ, Cinder-Volume
> to Storage backend.
> I would recommend looking at the specific attachment_delete request
> that failed in the Cinder logs to see how long it took to complete, and
> then checking how long it took for the 504 error to happen.  With that info
> you can get an idea of how much higher your timeout must be.
> It could also happen that the Cinder-API raises a timeout error when
> calling Cinder-Volume.  In that case you should check the
> cinder-volume service to see how long the operation took to complete,
> since it keeps running there even after the API call has timed out.
> Internally the Cinder-API to Cinder-Volume timeout is usually around 60
> seconds (rpc_response_timeout).
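
To make the log comparison above concrete, here is a minimal Python sketch (not part of any OpenStack tool) that checks each hop's timeout against how long the operation actually took according to the Cinder logs. The hop names and timeout values are hypothetical placeholders; substitute the real values from your HAProxy, Nova and Cinder configuration. It assumes every hop starts its clock at roughly the same moment, so the shortest timeout below the observed duration is the one that fires first, and the 1.5x safety margin is an arbitrary assumption.

```python
from typing import Dict, List, Tuple


def expiring_hops(hop_timeouts: Dict[str, float],
                  duration: float) -> List[Tuple[str, float]]:
    """Return hops whose timeout is shorter than the observed duration.

    Results are sorted so the hop that would fire first comes first.
    Assumes every hop starts timing at roughly the same moment.
    """
    too_short = [(hop, limit) for hop, limit in hop_timeouts.items()
                 if limit < duration]
    return sorted(too_short, key=lambda item: item[1])


def suggested_timeout(duration: float, margin: float = 1.5) -> float:
    """Pad the observed duration with a safety margin (assumed 50%)."""
    return duration * margin


if __name__ == "__main__":
    # Hypothetical values; read the real ones from your deployment.
    hops = {
        "nova -> haproxy": 120.0,
        "haproxy -> cinder-api": 60.0,
        "cinder-api -> cinder-volume (rpc_response_timeout)": 60.0,
    }
    observed = 75.0  # seconds attachment_delete took, per the Cinder logs
    for hop, limit in expiring_hops(hops, observed):
        print(f"{hop}: {limit:.0f}s < {observed:.0f}s")
    print(f"raise these to at least ~{suggested_timeout(observed):.0f}s")
```

If the time until the 504 matches the shortest expiring hop, that hop is the one to raise first.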

Yeah, this is a known intermittent issue in our CI jobs as well, for example:


As I mentioned in the bug report for that issue:


It might be worth using the long_rpc_timeout approach for this, assuming 
the HTTP response doesn't time out. Nova uses long_rpc_timeout for known 
long-running RPC calls:


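The long_rpc_timeout approach helps because the caller gets a long overall deadline but still fails fast if the server stops responding. As far as I know, Nova gets this from oslo.messaging by passing `timeout=CONF.long_rpc_timeout` and `call_monitor_timeout=CONF.rpc_response_timeout` to `RPCClient.prepare()`, with the server sending periodic heartbeats while it works. The following is a self-contained toy model of that behaviour, not oslo.messaging code; the heartbeat interval of half the monitor timeout is an assumption of the model.

```python
import math
from typing import Optional, Tuple


def simulate_call(duration: float, timeout: float,
                  call_monitor_timeout: Optional[float] = None,
                  crash_at: Optional[float] = None) -> Tuple[str, float]:
    """Toy model of when an RPC caller gets an answer or gives up.

    duration: seconds the server needs to finish the work.
    timeout: overall deadline for the call.
    call_monitor_timeout: if set, the caller also gives up when no
        heartbeat has arrived for this long. The model assumes the server
        sends a heartbeat every call_monitor_timeout / 2 seconds.
    crash_at: second at which the server dies (None means never).
    Returns (outcome, seconds the caller waited).
    """
    finishes = crash_at is None or duration <= crash_at
    if finishes:
        if duration <= timeout:
            return "ok", duration
        return "timeout", timeout
    if call_monitor_timeout is None:
        # Dead server and no heartbeats: the caller waits out the deadline.
        return "timeout", timeout
    interval = call_monitor_timeout / 2
    last_heartbeat = math.floor(crash_at / interval) * interval
    gave_up = last_heartbeat + call_monitor_timeout
    if gave_up < timeout:
        return "monitor-timeout", gave_up
    return "timeout", timeout


if __name__ == "__main__":
    # A 90s operation with a plain 60s timeout fails...
    print(simulate_call(90, timeout=60))
    # ...but succeeds with a long deadline plus heartbeat monitoring,
    print(simulate_call(90, timeout=1800, call_monitor_timeout=60))
    # and a crashed server is still detected well before the deadline.
    print(simulate_call(300, timeout=1800, call_monitor_timeout=60,
                        crash_at=100))
```

Note that this only covers the RPC leg: the HTTP side (e.g. HAProxy in front of cinder-api) still needs a timeout long enough for the whole call.
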
Cinder should probably do the same for initialize_connection-style RPC 
calls. I've seen other gate failures where cinder-backup to 
cinder-volume RPC calls to initialize a connection have timed out as 
well, e.g.:


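Applied to Cinder, the idea might look roughly like this sketch: connection-related calls get the long deadline plus heartbeat monitoring, and everything else keeps the normal rpc_response_timeout. Everything here is hypothetical: it assumes a Cinder equivalent of Nova's long_rpc_timeout option, the defaults simply mirror Nova's (60 and 1800 seconds), and the set of "long" method names is illustrative. The returned dict is shaped like the keyword arguments oslo.messaging's `RPCClient.prepare()` accepts, but the sketch does not call oslo.messaging.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


# Hypothetical defaults mirroring Nova's rpc_response_timeout and
# long_rpc_timeout; Cinder would need its own option for the latter.
@dataclass(frozen=True)
class RpcTimeouts:
    rpc_response_timeout: float = 60.0
    long_rpc_timeout: float = 1800.0


# Illustrative set of connection-style calls that can be slow on some
# storage backends.
LONG_RUNNING_METHODS: FrozenSet[str] = frozenset({
    "initialize_connection",
    "attachment_update",
    "attachment_delete",
})


def prepare_kwargs(method: str, conf: RpcTimeouts) -> Dict[str, float]:
    """Pick RPC timeout keyword arguments for a given method name."""
    if method in LONG_RUNNING_METHODS:
        # Long overall deadline, but fail early if heartbeats stop.
        return {"timeout": conf.long_rpc_timeout,
                "call_monitor_timeout": conf.rpc_response_timeout}
    return {"timeout": conf.rpc_response_timeout}


if __name__ == "__main__":
    conf = RpcTimeouts()
    for name in ("initialize_connection", "get_capabilities"):
        print(name, prepare_kwargs(name, conf))
```

Keeping the long deadline limited to a known set of calls means a genuinely stuck short call still fails after the usual 60 seconds.
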



