On 7/25/2019 3:14 AM, Gorka Eguileor wrote:
Attachment delete is a synchronous operation, so every connection timeout along the path can affect it: Nova to HAProxy, HAProxy to Cinder-API, Cinder-API to Cinder-Volume (via RabbitMQ), and Cinder-Volume to the storage backend.
I would recommend looking at the specific attachment_delete request that failed in the Cinder logs to see how long it took to complete, and then checking how long it took for the 504 error to appear. With that information you can get an idea of how much higher your timeout needs to be.
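For example, to correlate the two sides you could grep the request ID of the failed call across the Cinder services (the request ID and log paths below are placeholders; adjust them for your deployment):

    grep "req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
        /var/log/cinder/cinder-api.log /var/log/cinder/cinder-volume.log

Comparing the first and last timestamps for that request gives the total time Cinder needed, which you can then compare against the time at which HAProxy returned the 504.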
It could also happen that Cinder-API raises a timeout error when calling Cinder-Volume. In that case you should check the cinder-volume service logs to see how long the operation took to complete, since the operation keeps running on the volume service even after the API call has timed out.
Internally the Cinder-API to Cinder-Volume timeout is usually around 60 seconds (rpc_response_timeout).
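As a rough illustration only (the values are examples, not recommendations), raising the two timeouts mentioned above would look something like this:

    # cinder.conf
    [DEFAULT]
    # RPC timeout between cinder-api and cinder-volume (default is 60 seconds)
    rpc_response_timeout = 300

    # haproxy.cfg (defaults or backend section)
    # how long HAProxy waits for the Cinder API to respond before returning a 504
    timeout server 300s

The HAProxy server timeout needs to cover the slowest end-to-end attachment_delete you expect; otherwise HAProxy will keep returning 504s even though Cinder eventually completes the operation.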
Yeah, this is a known intermittent issue in our CI jobs as well, for example:

http://status.openstack.org/elastic-recheck/#1763712

As I mentioned in the bug report for that issue:

https://bugs.launchpad.net/cinder/+bug/1763712

it might be worth using the long_rpc_timeout approach for this, assuming the HTTP response doesn't time out. Nova uses long_rpc_timeout for known long RPC calls:

https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.lon...

Cinder should probably do the same for initialize-connection-style RPC calls. I've seen other gate failures where cinder-backup to cinder-volume RPC calls to initialize a connection have timed out as well, e.g.:

https://bugs.launchpad.net/cinder/+bug/1739482

--
Thanks,

Matt
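(For reference, the Nova option mentioned above looks roughly like this; the nova.conf option exists today, while the cinder.conf line is purely hypothetical and only sketches what a similar knob could look like if Cinder adopted the same approach:

    # nova.conf
    [DEFAULT]
    # used by Nova for RPC calls known to run long (default is 1800 seconds)
    long_rpc_timeout = 1800

    # hypothetical cinder.conf equivalent for initialize-connection-style calls
    # [DEFAULT]
    # long_rpc_timeout = 1800
)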