Cinder API timeout on single-control node

Tony Liu tonyliu0592 at hotmail.com
Thu Sep 9 16:39:14 UTC 2021


The first issue is the 504 timeout; updating the timeout in haproxy helps with that.
The next is the timeout from cinder-api; [1] helps with that.
Then the next is the rbd client timeout. I started the "debug RBD timeout issue" thread for that.
It seems that the root cause of this series of timeouts is in Ceph.
I followed Konstantin's suggestion to use msgr2 only.
Hopefully that will fix the whole timeout issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1930806
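
The pattern described above (raise one timeout, then hit the next one) can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: every layer waits on the same slow backend call, the layer with the shortest timeout gives up first, and all numbers are hypothetical placeholders, not values taken from this environment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str         # label only, e.g. "haproxy"
    timeout_s: float  # hypothetical timeout in seconds


def first_timeout(layers: List[Layer], backend_latency_s: float) -> Optional[str]:
    """Return the layer that gives up first, or None if the backend
    answers before the shortest timeout expires."""
    shortest = min(layers, key=lambda layer: layer.timeout_s)
    if shortest.timeout_s < backend_latency_s:
        return shortest.name
    return None


# Hypothetical values: the backend (Ceph in this thread) needs 400 s.
layers = [
    Layer("haproxy", 60),
    Layer("cinder-api", 120),
    Layer("rbd client", 300),
]

if __name__ == "__main__":
    print(first_timeout(layers, 400))
```

In this model, raising only the haproxy value moves the failure to cinder-api, and raising that one moves it to the rbd client. The request only succeeds once the backend itself responds faster, which matches the conclusion that the root cause sits in Ceph.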


Thanks!
Tony
________________________________________
From: Eugen Block <eblock at nde.ag>
Sent: September 1, 2021 05:02 AM
To: openstack-discuss at lists.openstack.org
Subject: Cinder API timeout on single-control node

Hi *,

Since your last responses regarding RabbitMQ were quite helpful, I
would like to ask a different question about the same environment. It's
an older OpenStack version (Pike) running with only one control node.
There were already lots of volumes (way more than 1000) in that cloud,
but after adding a bunch more (not sure how many exactly) in one
project the whole Cinder API became extremely slow. Both Horizon and
the CLI run into timeouts:

[Wed Sep 01 13:18:52.109178 2021] [wsgi:error] [pid 60440] [client
<IP>:58474] Timeout when reading response headers from daemon process
'horizon':
/srv/www/openstack-dashboard/openstack_dashboard/wsgi/django.wsgi,
referer: http://<control>/project/volumes/
[Wed Sep 01 13:18:53.664714 2021] [wsgi:error] [pid 13007] Not Found:
/favicon.ico

Sometimes the volume creation succeeds if you just retry, but it often
fails. The dashboard shows a "504 gateway timeout" after two minutes
(or after four minutes, once I increased the timeout in the Apache
dashboard config).

The timeout also shows even if I try to get into the volumes tab of an
empty project.

A couple of weeks ago I already noticed some performance issues with
the Cinder API when there are lots of attached volumes; many
"available" volumes don't seem to slow things down. But since
then the total number of volumes has doubled. At the moment there are
more than 960 attached volumes across all projects and more than 750
detached volumes. I searched the cinder.conf for any helpful setting
but I'm not sure which would actually help. And since it's a
production cloud I would like to avoid restarting services all the
time just to try something. Maybe some of you can point me in the
right direction? I would appreciate any help!

If there's more information I can provide just let me know.

Thanks!
Eugen





