Roger that, thanks for the explanation.

I think there was another cause of this issue in my case.
The environment had no Internet access and no local NTP server until the last test.
Before that test, the nova and cinder services were unstable because they kept going up and down, and I found that the clocks were out of sync between nodes.
We gave one node outside access and pointed the NTP clients on the other nodes at it, which solved the problem.
After that, the test passed.

I'm not sure, but that could be one of the reasons, right?

But I think I still need to tune the timeout value, since API responses are slow when a node is shutting down.
I wonder why they become slow when a node is down.

I'll try increasing rpc_response_timeout in Cinder and do more testing.
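
As a rough sketch of the change I have in mind (180 is only an example value, not a recommendation, and effective_rpc_timeout is a helper name I made up), this parses a cinder.conf excerpt with Python's configparser and reports the timeout that would apply, falling back to the oslo.messaging default of 60 seconds when the option is unset:

```python
import configparser

# Default that oslo.messaging applies when the option is not set (seconds).
OSLO_DEFAULT_RPC_RESPONSE_TIMEOUT = 60

# Hypothetical cinder.conf excerpt; 180 is only an example value.
SAMPLE_CINDER_CONF = """
[DEFAULT]
rpc_response_timeout = 180
"""


def effective_rpc_timeout(conf_text):
    """Return rpc_response_timeout from [DEFAULT], or the default if unset."""
    # interpolation=None keeps '%' in other option values from raising errors;
    # strict=False tolerates options that are repeated in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint(
        "DEFAULT",
        "rpc_response_timeout",
        fallback=OSLO_DEFAULT_RPC_RESPONSE_TIMEOUT,
    )


if __name__ == "__main__":
    print(effective_rpc_timeout(SAMPLE_CINDER_CONF))
```

Since the option lives in [DEFAULT], the new value would apply to every RPC call in Cinder, so I'd raise it in small steps while testing.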

Matt Riedemann <mriedemos@gmail.com> wrote on Fri, Jul 26, 2019 at 9:42 PM:
On 7/25/2019 11:54 PM, Eddie Yen wrote:
> And I think I should gain rpc_response_timeout rather than
> long_rpc_timeout in nova.

Since Cinder doesn't have the long_rpc_timeout option like Nova, your
only option is to bump up the rpc_response_timeout in Cinder but that
will be used by all RPC calls in Cinder, not just the
initialize/terminate connection calls for attachments. Maybe that's not
a problem, but long_rpc_timeout in Nova allows us to pick which RPC
calls to use that on rather than everywhere. The changes to Cinder
shouldn't be that hard if they follow the Nova patch [1].

[1] https://review.opendev.org/#/c/566696/

--

Thanks,

Matt