Cinder API timeout on single-control node
Hi *,

since your last responses were quite helpful regarding rabbitmq, I would like to ask a different question about the same environment. It's an older OpenStack version (Pike) running with only one control node. There were already lots of volumes (well over 1000) in that cloud, but after adding a bunch more (I'm not sure how many exactly) in one project, the whole Cinder API became extremely slow. Both Horizon and the CLI run into timeouts:

[Wed Sep 01 13:18:52.109178 2021] [wsgi:error] [pid 60440] [client <IP>:58474] Timeout when reading response headers from daemon process 'horizon': /srv/www/openstack-dashboard/openstack_dashboard/wsgi/django.wsgi, referer: http://<control>/project/volumes/
[Wed Sep 01 13:18:53.664714 2021] [wsgi:error] [pid 13007] Not Found: /favicon.ico

Sometimes the volume creation succeeds if you just retry, but it often fails. The dashboard shows a "504 gateway timeout" after two minutes (or after four minutes since I increased the timeout in the Apache dashboard config). The timeout occurs even if I just open the volumes tab of an empty project.

A couple of weeks ago I had already noticed some performance issues with the Cinder API when there are lots of attached volumes; many "available" volumes don't seem to slow things down. But since then the total number of volumes has doubled: at the moment there are more than 960 attached volumes across all projects and more than 750 detached volumes. I searched cinder.conf for a helpful setting, but I'm not sure which would actually help, and since it's a production cloud I would like to avoid restarting services all the time just to try something. Maybe some of you can point me in the right direction? I would appreciate any help! If there's more information I can provide, just let me know.

Thanks!
Eugen
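For reference, the cinder.conf option most often looked at first for this kind of API slowness is the number of API workers, together with the Apache timeout Eugen mentions; the values and the vhost path below are illustrative assumptions, not taken from the environment described here:

    # /etc/cinder/cinder.conf
    [DEFAULT]
    # Number of cinder-api worker processes; the default is the number of
    # CPUs on the host. More workers help when many list/show requests
    # arrive in parallel.
    osapi_volume_workers = 8

    # Apache vhost for the dashboard (path is a guess for a SUSE layout):
    # /etc/apache2/vhosts.d/openstack-dashboard.conf
    # Overall request timeout in seconds for the vhost.
    Timeout 600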
The first issue is the 504 timeout; increasing the timeout in haproxy helps with that. The next is the timeout from cinder-api; [1] helps with that. After that comes the RBD client timeout; I started the "debug RBD timeout issue" thread for that one. It seems that the root cause of this whole series of timeouts is Ceph. I followed Konstantin's comments and switched to msgr2 only, and hopefully that will fix the timeouts altogether.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1930806

Thanks!
Tony
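For reference, the haproxy timeouts Tony refers to are usually raised in haproxy.cfg, and "msgr2 only" on the client side can be expressed by listing only the v2 monitor endpoints in ceph.conf; the values, section placement and addresses below are illustrative assumptions:

    # haproxy.cfg: raise the per-request timeouts for the cinder-api
    # frontend/backend (or globally in the defaults section)
    defaults
        timeout connect 10s
        timeout client  600s
        timeout server  600s

    # ceph.conf on the OpenStack client side: point clients only at the
    # msgr2 (v2, port 3300) monitor addresses (addresses are placeholders)
    [global]
    mon_host = [v2:192.168.0.1:3300],[v2:192.168.0.2:3300],[v2:192.168.0.3:3300]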
Thanks, Tony. I enabled debug logs for Cinder and restarted cinder-api, but then the issue was no longer reproducible. Suddenly all API calls (even 'cinder list --all') took less than a minute, and we haven't seen timeouts since. I turned the debug logs off again and am still waiting for the problem to reoccur. In the meantime they also deleted 600 unused volumes, which probably helped, too. Anyway, I have now doubled the rpc_response_timeout in cinder.conf and will wait until this happens again. I can't find anything like a Cinder WSGI timeout for a vhost, and there's no haproxy involved because it's only one control node, deployed with Chef and Crowbar. I'll report back if anything happens.

Thanks!
Eugen
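For reference, doubling the RPC timeout that Eugen mentions would look roughly like this in cinder.conf (the oslo.messaging default is 60 seconds, so doubling it gives 120; the value is only an illustration of that step):

    # /etc/cinder/cinder.conf
    [DEFAULT]
    # Time in seconds that cinder-api waits for an RPC reply (e.g. from
    # cinder-scheduler or cinder-volume) before raising a MessagingTimeout.
    rpc_response_timeout = 120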
participants (2)
- Eugen Block
- Tony Liu