On 7/25/2019 3:14 AM, Gorka Eguileor wrote:
> Attachment delete is a synchronous operation, so all the different
> connection timeouts may affect the operation: Nova to HAProxy, HAProxy
> to Cinder-API, Cinder-API to Cinder-Volume via RabbitMQ, Cinder-Volume
> to Storage backend.
>
> I would recommend looking at the specific attachment_delete request
> that failed in the Cinder logs to see how long it took to complete, and
> then checking how long it took for the 504 error to happen. With that
> info you can get an idea of how much higher your timeout must be.
>
> It could also happen that the Cinder-API raises a timeout error when
> calling the Cinder-Volume. In this case you should check the
> cinder-volume service to see how long the operation took to complete,
> as it keeps running after the API side has timed out.
>
> Internally the Cinder-API to Cinder-Volume timeout is usually around 60
> seconds (rpc_response_timeout).
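
The hop-by-hop check described above can be sketched roughly as follows.
This is only an illustration: the hop names come from the quoted text and
the 60 second RPC value is the one mentioned there, but every other number
and the helper name hops_too_short are made-up placeholders to replace with
values from your own deployment.

```python
# Hypothetical per-hop timeouts in seconds. Only the 60 s RPC value comes
# from the thread; the rest are placeholders, not real defaults.
HOP_TIMEOUTS = {
    "nova -> haproxy": 120.0,
    "haproxy -> cinder-api": 60.0,
    "cinder-api -> cinder-volume (rpc_response_timeout)": 60.0,
    "cinder-volume -> storage backend": 300.0,
}


def hops_too_short(observed_seconds, hop_timeouts, margin=1.0):
    """Return (hop, timeout) pairs whose timeout is below observed * margin.

    observed_seconds is how long the attachment_delete actually took
    according to the cinder-volume logs; margin adds some headroom.
    """
    needed = observed_seconds * margin
    return [(hop, t) for hop, t in hop_timeouts.items() if t < needed]


if __name__ == "__main__":
    # Example: the delete took 75 s in the logs, and we want 20% headroom.
    for hop, timeout in hops_too_short(75.0, HOP_TIMEOUTS, margin=1.2):
        print(f"{hop}: timeout {timeout:.0f}s is too low")
```

Any hop it reports is a candidate for the timeout that produced the 504.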
Yeah, this is a known intermittent issue in our CI jobs as well, for example:
http://status.openstack.org/elastic-recheck/#1763712
As I mentioned in the bug report for that issue:
https://bugs.launchpad.net/cinder/+bug/1763712
It might be worth using the long_rpc_timeout approach for this, assuming
the HTTP response doesn't time out. Nova uses long_rpc_timeout for known
long RPC calls:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.long_rpc_timeout
Cinder should probably do the same for initialize-connection-style RPC
calls. I've seen other gate failures where cinder-backup to
cinder-volume RPC calls to initialize a connection have timed out as
well, e.g.:
https://bugs.launchpad.net/cinder/+bug/1739482
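
To make the idea concrete, here is a minimal sketch of picking a longer
timeout only for calls known to be slow. It is not Nova's or Cinder's
actual code: the constant values, the set of method names, and the
timeout_for helper are all assumptions for illustration.

```python
# Assumed values in seconds; substitute whatever your config uses.
RPC_RESPONSE_TIMEOUT = 60
LONG_RPC_TIMEOUT = 1800

# Hypothetical list of RPC methods treated as "known long" calls.
LONG_RUNNING_CALLS = frozenset({
    "initialize_connection",
    "attachment_delete",
})


def timeout_for(method):
    """Return the timeout (seconds) to use for the given RPC method name."""
    if method in LONG_RUNNING_CALLS:
        # Never go below the regular timeout, even if misconfigured.
        return max(LONG_RPC_TIMEOUT, RPC_RESPONSE_TIMEOUT)
    return RPC_RESPONSE_TIMEOUT


if __name__ == "__main__":
    for name in ("initialize_connection", "get_capabilities"):
        print(name, timeout_for(name))
```

Everything else keeps the regular rpc_response_timeout, so a dead service
is still detected quickly for ordinary calls.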
--
Thanks,
Matt