[kolla][cinder] cinder containers api, volume, backup unhealthy
Hi,

after replacing my control nodes with new nodes (all bare-metal), the cinder containers are somehow no longer starting. I checked the logs on one of the control nodes and I see this in the api-error.log:

2023-05-30 21:31:36.350636 Timeout when reading response headers from daemon process 'cinder-api': /var/www/cgi-bin/cinder/cinder-wsgi
2023-05-30 21:31:37.827101 mod_wsgi (pid=22): Failed to exec Python script file '/var/www/cgi-bin/cinder/cinder-wsgi'.
2023-05-30 21:31:37.827168 mod_wsgi (pid=22): Exception occurred processing WSGI script '/var/www/cgi-bin/cinder/cinder-wsgi'.
2023-05-30 21:31:37.828005 Traceback (most recent call last):
2023-05-30 21:31:37.828046   File "/var/www/cgi-bin/cinder/cinder-wsgi", line 52, in <module>
2023-05-30 21:31:37.828053     application = initialize_application()
2023-05-30 21:31:37.828058   File "/var/lib/kolla/venv/lib/python3.6/site-packages/cinder/wsgi/wsgi.py", line 44, in initialize_application
2023-05-30 21:31:37.828063     coordination.COORDINATOR.start()
2023-05-30 21:31:37.828068   File "/var/lib/kolla/venv/lib/python3.6/site-packages/cinder/coordination.py", line 86, in start
2023-05-30 21:31:37.828071     self.coordinator.start(start_heart=True)
2023-05-30 21:31:37.828075   File "/var/lib/kolla/venv/lib/python3.6/site-packages/tooz/coordination.py", line 689, in start
2023-05-30 21:31:37.828078     super(CoordinationDriverWithExecutor, self).start(start_heart)
2023-05-30 21:31:37.828083   File "/var/lib/kolla/venv/lib/python3.6/site-packages/tooz/coordination.py", line 426, in start
2023-05-30 21:31:37.828086     self._start()
2023-05-30 21:31:37.828090   File "/var/lib/kolla/venv/lib/python3.6/site-packages/tooz/drivers/etcd3gw.py", line 224, in _start
2023-05-30 21:31:37.828093     self._membership_lease = self.client.lease(self.membership_timeout)
2023-05-30 21:31:37.828098   File "/var/lib/kolla/venv/lib/python3.6/site-packages/etcd3gw/client.py", line 122, in lease
2023-05-30 21:31:37.828111     json={"TTL": ttl, "ID": 0})
2023-05-30 21:31:37.828116   File "/var/lib/kolla/venv/lib/python3.6/site-packages/etcd3gw/client.py", line 88, in post
2023-05-30 21:31:37.828123     resp.reason
2023-05-30 21:31:37.828154 etcd3gw.exceptions.ConnectionTimeoutError: Gateway Time-out

All other containers are working just fine. Even the cinder_scheduler container works fine.

So far I have tried the following:

- Removed the cinder containers, including their volumes, from one control node
- Ran mariadb_recovery
- Rebooted all control nodes
- Ran kolla-ansible reconfigure --tags cinder,nova

Any help is highly appreciated.

Cheers,
Oliver
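Hi Oliver,

Looking at the traceback, the problem is the connection to the etcd coordination backend: cinder-api fails on startup because the lease request to etcd (made through the tooz etcd3gw driver) times out with "Gateway Time-out".

Best regards,
Michal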
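
One way to check whether etcd is reachable from a control node is to query its HTTP /health endpoint. The snippet below is only a rough sketch, not something taken from the deployment in question: the helper names parse_health and check_etcd are made up for illustration, the address 192.0.2.10 is a placeholder, and it assumes etcd serves plain HTTP on the default client port 2379. Adjust the URL (and add TLS handling) to match the actual setup.

```python
import json
import urllib.error
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True if an etcd /health response body reports a healthy member."""
    try:
        data = json.loads(body.decode("utf-8"))
    except (ValueError, UnicodeDecodeError):
        return False
    # etcd answers with {"health": "true"}; a JSON boolean is accepted as well.
    return isinstance(data, dict) and str(data.get("health")).lower() == "true"


def check_etcd(base_url: str, timeout: float = 5.0) -> str:
    """Query <base_url>/health and summarise the result as a short string."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "healthy" if parse_health(resp.read()) else "unhealthy"
    except urllib.error.HTTPError as exc:
        # The endpoint answered, but with an error status (e.g. 503 or 504).
        return f"unhealthy (HTTP {exc.code})"
    except OSError as exc:
        # Covers URLError, refused connections, and socket timeouts.
        return f"unreachable: {exc}"


if __name__ == "__main__":
    # Placeholder address (assumption): replace with the etcd client URL
    # of each control node in turn.
    print(check_etcd("http://192.0.2.10:2379"))
```

If every control node reports "unreachable" or an HTTP error here, the cinder services that use the coordination backend would be expected to fail in the same way as in the traceback.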
On 31 May 2023, at 07:07, Oliver Weinmann <oliver.weinmann@me.com> wrote: