[kolla][nova][cinder] Got Gateway-Timeout error on VM evacuation if it has a volume attached.

Gorka Eguileor geguileo at redhat.com
Thu Jul 25 08:14:07 UTC 2019


On 23/07, Eddie Yen wrote:
> Hi Matt, thanks for your reply first.
>
> The log I pasted is from nova-compute.
> I also checked the cinder-api & cinder-volume logs around the same timestamp.
> Strangely, no error messages were found during that time.

Hi,

It could make sense that you see no errors in Cinder.  The error from
your pastebin is not coming from Cinder, it is coming from your HAProxy
(or whatever load balancer you have in front of the Cinder-API nodes).

Attachment delete is a synchronous operation, so all the different
connection timeouts may affect the operation: Nova to HAProxy, HAProxy
to Cinder-API, Cinder-API to Cinder-Volume via RabbitMQ, Cinder-Volume
to Storage backend.
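
For illustration, in a plain haproxy.cfg the relevant knobs would look
something like this (the values are examples only, not recommendations,
and in a kolla deployment this file is generated for you):

    defaults
        # Time allowed to establish the TCP connection to a backend.
        timeout connect 10s
        # Inactivity timeout on the client (Nova) side of the proxy.
        timeout client  1m
        # How long HAProxy waits for Cinder-API to answer; when this
        # expires, HAProxy itself returns the 504 you are seeing.
        timeout server  1m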

I would recommend looking at the specific attachment_delete request
that failed in the Cinder logs to see how long it took to complete, and
then checking how long it took for the 504 error to happen.  With that
info you can get an idea of how much higher your timeout needs to be.
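
Something along these lines can help with the correlation (assuming
kolla's default log location under /var/log/kolla; the req-<request-id>
is a placeholder you would take from the API log):

    # Find the failing attachment delete and its timestamps in the API log
    grep 'DELETE .*/attachments/' /var/log/kolla/cinder/cinder-api.log

    # Follow the same request ID through the volume service log
    grep 'req-<request-id>' /var/log/kolla/cinder/cinder-volume.log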

It could also happen that the Cinder-API raises a timeout error when
calling Cinder-Volume.  In that case you should check the cinder-volume
logs to see how long the operation took to complete there, since the
operation continues on the volume service even after the API has timed
out.

Internally the Cinder-API to Cinder-Volume timeout is usually around 60
seconds (rpc_response_timeout).
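
If that is the timeout you are hitting, it can be raised in
cinder.conf; 60 is the oslo.messaging default, and the value below is
only an example:

    [DEFAULT]
    # Seconds to wait for a response from an RPC call before giving up.
    rpc_response_timeout = 300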

You need to ensure that your HAProxy and Cinder RPC timeouts are in
sync and are high enough for the operation to complete in the
worst-case scenario.
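
As a rough rule of thumb (my own heuristic, not an official formula):

    timeout server (HAProxy) >= rpc_response_timeout (Cinder)
                             >= slowest observed attachment_delete + headroom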

Cheers,
Gorka.

> I remember I launched the evacuation on the host.
>
> Perhaps it's overloading, but it's not happening on Cinder, because the
> environment is a 3-node all-in-one installation model.
> That means control+compute per node, and the 3 nodes form a controller
> HA cluster.
> When I shut down one of the nodes, I found all API requests were pretty
> slow (you can feel it when using the dashboard),
> and everything went back to normal when the node came back.
>
> I'll try the evacuation again, but with the nova host just disabled or
> the nova services stopped, to test whether it happens again or not.
>
> Matt Riedemann <mriedemos at gmail.com> wrote on Tue, Jul 23, 2019 at 6:40 AM:
>
> > On 7/18/2019 3:53 AM, Eddie Yen wrote:
> > > Before I try to evacuate host, the source host had about 24 VMs running.
> > > When I shut down the node and executed the evacuation, a few VMs
> > > failed.  The error code is 504.
> > > Strangely, those failed VMs all have their own volumes attached.
> > >
> > > Then I checked the nova-compute log; a detailed error is pasted at the link below:
> > > https://pastebin.com/uaE7YrP1
> > >
> > > Does anyone have any experience with this? I googled but couldn't
> > > find enough information about this.
> >
> > Are there errors in the cinder-api logs during the evacuate of all VMs
> > from the host? Are you doing the evacuate operation on all VMs on the
> > host concurrently or in serial? I wonder if you're over-loading cinder
> > and that's causing the timeout somehow. The timeout from cinder is when
> > deleting volume attachment records, which would be terminating
> > connections in the storage backend under the covers. Check the
> > cinder-volume logs for errors as well.
> >
> > --
> >
> > Thanks,
> >
> > Matt
> >
> >


