On 23/07, Eddie Yen wrote:
Hi Matt, thanks for your reply first.
The log I pasted is from nova-compute. I also checked the cinder-api and cinder-volume logs around that timestamp. Strangely, no error messages were found during that time.
Hi,

It could make sense that you see no errors in Cinder. The error from your pastebin is not coming from Cinder, it is coming from your HAProxy (or whatever load balancer you have in front of the Cinder-API nodes).

Attachment delete is a synchronous operation, so all the different connection timeouts may affect it: Nova to HAProxy, HAProxy to Cinder-API, Cinder-API to Cinder-Volume via RabbitMQ, and Cinder-Volume to the storage backend.

I would recommend looking at the specific attachment_delete request that failed in the Cinder logs to see how long it took to complete, and then checking how long it took for the 504 error to happen. With that info you can get an idea of how much higher your timeout must be.

It could also happen that the Cinder-API raises a timeout error when calling Cinder-Volume. In that case you should check the cinder-volume service to see how long it took to complete, as the operation continues there. Internally the Cinder-API to Cinder-Volume timeout is usually around 60 seconds (rpc_response_timeout).

You need to ensure that your HAProxy and Cinder RPC timeouts are in sync and are long enough for the operation to complete in the worst-case scenario.

Cheers,
Gorka.
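PS: As a rough sketch of what I mean by keeping the timeouts in sync (the values and section names here are only illustrative, your HAProxy backend name and config paths will differ):

    # haproxy.cfg -- backend in front of cinder-api
    # 'timeout server' is what produces the 504 when cinder-api takes
    # longer than this to answer the attachment delete
    backend cinder_api
        timeout server 180s

    # cinder.conf on the API/volume nodes
    [DEFAULT]
    # how long cinder-api waits for cinder-volume over RabbitMQ
    # (defaults to 60 seconds)
    rpc_response_timeout = 180

The point is simply that the HAProxy server timeout should be at least as large as rpc_response_timeout, and both should be larger than the slowest attachment_delete you see in the logs.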
I remember I launched the evacuation on the host.
Perhaps it's overloading, but it's not happening on Cinder, because the environment is a 3-node all-in-one installation model. That means control+compute on each node, and the 3 nodes form a controller HA cluster. When I shut down one of the nodes, I found that all API requests became pretty slow (you can feel it when using the dashboard), and everything went back to normal once the node was back.
I'll try the evacuation again, but with just disabling the nova host or stopping the nova services, to test whether it happens again or not.
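Roughly what I have in mind is something like this (the service name depends on the distro, e.g. nova-compute on Ubuntu, so take it as a sketch):

    # on the source node: stop only nova-compute, keep cinder-api,
    # cinder-volume and haproxy on that node running
    systemctl stop openstack-nova-compute

    # from a controller: wait until nova reports the compute as down,
    # then evacuate the affected servers
    openstack compute service list --host <node>
    nova evacuate <server-id>

That should tell me whether the 504 comes from the whole node being offline (and the API slowdown that causes) or from the evacuation itself.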
Matt Riedemann <mriedemos@gmail.com> wrote on Tue, Jul 23, 2019 at 6:40 AM:
On 7/18/2019 3:53 AM, Eddie Yen wrote:
Before I tried to evacuate the host, the source host had about 24 VMs running. When I shut down the node and executed the evacuation, a few VMs failed. The error code is 504. Strangely, the failed VMs all have their own volume attached.
Then I checked the nova-compute log; the detailed error is pasted at the link below: https://pastebin.com/uaE7YrP1
Does anyone have any experience with this? I googled but couldn't find enough information about it.
Are there errors in the cinder-api logs during the evacuate of all VMs from the host? Are you doing the evacuate operation on all VMs on the host concurrently or in serial? I wonder if you're over-loading cinder and that's causing the timeout somehow. The timeout from cinder is when deleting volume attachment records, which would be terminating connections in the storage backend under the covers. Check the cinder-volume logs for errors as well.
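For example, something along these lines should show whether cinder-api ever received and finished those requests (log paths and formats depend on your deployment, so adjust accordingly):

    # did cinder-api get the attachment delete calls, and how long
    # after the request did they complete?
    grep "DELETE /v3/" /var/log/cinder/cinder-api.log | grep attachments

    # anything going wrong on the volume service around the same time?
    grep -iE "error|timeout" /var/log/cinder/cinder-volume.log

If the DELETE shows up in cinder-api but takes longer than whatever is in front of it (haproxy) allows, that would explain the 504 that nova gets.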
--
Thanks,
Matt