OpenStack Ansible Service troubleshooting

John Ratliff jdratlif at globalnoc.iu.edu
Tue Oct 4 15:45:45 UTC 2022


We've started deploying new Xena clusters with openstack-ansible, and we
keep running into problems where parts of OpenStack stop working. A
service will fail or need to be restarted, but it's not clear which one
or why.

Recently, one of our test clusters (2 hosts) stopped working. I could
log in to Horizon, but I could not create instances.

At first it told me that a message wasn't answered quickly enough. I
assumed the problem was RabbitMQ and restarted its container, but this
didn't help. I eventually restarted every container, plus the
nova-compute and haproxy services on the host, but this didn't help
either. I then rebooted both hosts, which made things worse (I think I
broke the Galera cluster doing this).

After bootstrapping the Galera cluster, I can log back in to Horizon,
but I still cannot create instances. It tells me:

"Exceeded maximum number of retries. Exhausted all hosts available for
retrying build failures for instance [UUID]"

If I look at the journal for nova-compute, I see this error:

"libvirt.libvirtError: Failed to activate service
'org.freedesktop.machine1': timed out "

Looking at systemd-machined, it won't start due to
"systemd-machined.service: Job systemd-machined.service/start failed
with result 'dependency'."

I'm not sure which dependency it's referring to. In the cluster that
does work, this service is running, but on both hosts in the cluster
that does not work, it is not running.
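
One way to narrow down which dependency is meant would be to list the
unit's hard dependencies and ask systemd for the state of each. The
following is only a rough sketch, assuming Python 3 and systemctl on
the PATH. The helper names (parse_show, systemctl, failing_dependencies)
are made up for illustration, and limiting the check to Requires,
Requisite and BindsTo is an assumption about where the failure comes
from, not something the logs confirm.

```python
import subprocess

UNIT = "systemd-machined.service"  # unit named in the error above
# Assumption: only hard dependencies are worth checking first.
DEP_PROPS = ("Requires", "Requisite", "BindsTo")


def parse_show(text):
    """Parse `systemctl show` KEY=VALUE lines into {key: [unit, ...]}."""
    deps = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in DEP_PROPS:
            deps[key] = value.split()
    return deps


def systemctl(*args):
    """Run systemctl and return its stdout; a non-zero exit is expected
    for `is-active` on a unit that is not running, so it is not raised."""
    result = subprocess.run(["systemctl", *args],
                            capture_output=True, text=True)
    return result.stdout.strip()


def failing_dependencies(unit=UNIT, run=systemctl):
    """Return (dependency, state) pairs for hard dependencies of `unit`
    whose state is anything other than 'active'."""
    props = ",".join(DEP_PROPS)
    deps = parse_show(run("show", unit, "--property=" + props))
    bad = []
    for names in deps.values():
        for name in names:
            state = run("is-active", name)
            if state != "active":
                bad.append((name, state))
    return bad
```

Calling failing_dependencies() on one of the broken hosts and printing
the result should show any required unit that is inactive or failed.
The journal for that unit (journalctl -b -u followed by the unit name)
would then be the next place to look.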

What should I be looking at to fix this?

-- 
John Ratliff
Systems Automation Engineer 
GlobalNOC @ Indiana University