On Fri, Dec 22, 2023 at 9:19 AM KEREM CELIKER <kmceliker@gmail.com> wrote:
It seems that there are a few possible causes and solutions for this issue. Here are some of them:

- One cause could be that the cinder-volume service on compute2@rbd-1 is not properly configured or synchronized with the cinder-scheduler service on controller@rbd-1. Check the configuration files and logs of both services for errors or inconsistencies, then restart both services and see if that fixes the problem.

I think "synchronized" is the key word. I believe the bad node is properly configured, but its time is not in sync with the other nodes. The "openstack volume service list" output shows the bad node to be 2 minutes behind the others, which is enough for the scheduler to mark the service down.
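
As a rough way to spot this kind of lag, here is a minimal Python sketch. It assumes the "Updated At" values from "openstack volume service list" are ISO 8601 strings and that Cinder's default service_down_time of 60 seconds is in effect. The function name and the sample timestamps are made up for illustration. It only compares heartbeats with each other, which approximates the scheduler's check against its own clock.

```python
from datetime import datetime

# Assumption: Cinder's default service_down_time (60 seconds) has not been changed.
SERVICE_DOWN_TIME = 60


def find_lagging_services(rows, threshold=SERVICE_DOWN_TIME):
    """Return {host: lag_seconds} for heartbeats that trail the newest one by more than threshold."""
    stamps = {host: datetime.fromisoformat(ts) for host, ts in rows}
    newest = max(stamps.values())
    lags = {host: (newest - ts).total_seconds() for host, ts in stamps.items()}
    return {host: lag for host, lag in lags.items() if lag > threshold}


# Hypothetical "Updated At" values; only the host names come from this thread.
sample = [
    ("controller@rbd-1", "2023-12-22T14:19:05.000000"),
    ("compute2@rbd-1", "2023-12-22T14:17:04.000000"),
]

print(find_lagging_services(sample))  # {'compute2@rbd-1': 121.0}
```

If a node shows up here, checking its NTP or chrony status would be the next step.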

Alan

- Another cause could be that the cinder-volume service on compute2@rbd-1 is not compatible with the cinder-backup service on the same node. You can try to disable or remove the cinder-backup service and see if that makes any difference. You can also try to use a different backend storage for your volumes and backups, such as CephFS or GlusterFS.

- A third cause could be that the cinder-volume service on compute2@rbd-1 is affected by external factors, such as network issues, a power outage, or hardware failure. You can try to ping and ssh into the node to see if it is reachable and responsive. You can also check the status of other OpenStack services on the node, such as Nova, Neutron, Heat, etc., to see if they are working properly.
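
The reachability check can be scripted. The following is a minimal sketch, assuming the node accepts SSH on TCP port 22. The helper name is_port_open is made up for illustration, and it only tests that a TCP connection can be opened, not that a login works.

```python
import socket


def is_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, is_port_open("compute2") would report whether the node answers on the SSH port, assuming "compute2" resolves from the machine where the script runs.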

Kerem Çeliker
tr.linkedin.com/in/keremceliker