OpenStack Ansible Service troubleshooting

Dmitriy Rabotyagov noonedeadpunk at gmail.com
Tue Oct 4 16:21:13 UTC 2022


Hi John.

Well, it seems you've performed a number of operations that were not
required in the first place. However, I believe that in the end you've
identified the problem correctly: the systemd-machined service should be
active and running on nova-compute hosts that use the kvm driver.
I'd suggest looking deeper into why systemd-machined can't be started.
What does journalctl say about that?
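
A minimal sketch of how that information could be collected, assuming
systemctl and journalctl are on PATH and the commands are run as root.
The function name inspect_unit is made up for illustration. Since a
result of 'dependency' means that some other unit failed first, the
last command also pulls error-level messages for the whole boot rather
than for systemd-machined alone.

```shell
# Collect the basics for a unit whose start job failed with result 'dependency'.
# inspect_unit is a hypothetical helper name, not part of systemd.
inspect_unit() {
    unit=$1
    # Current state of the unit plus its most recent log lines.
    systemctl status --no-pager "$unit"
    # Units pulled in by this unit, each shown with a state marker.
    systemctl list-dependencies "$unit"
    # Messages logged for the unit itself since the current boot.
    journalctl --no-pager -b -u "$unit"
    # Error-level messages from any unit since the current boot.
    journalctl --no-pager -b -p err
}

# Example call:
# inspect_unit systemd-machined.service
```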

As one of its dependencies, systemd-machined requires /var/lib/machines
to exist. I have 2 assumptions there:
1. Was systemd-tmpfiles-setup.service activated? We have sometimes seen
that, due to a race condition during node boot, it was not, which
resulted in all kinds of weirdness.
2. Do you happen to run nova-compute on the same set of hosts where the
LXC containers are placed? For example, in an AIO setup we manage the
/var/lib/machines/ mount with the systemd unit var-lib-machines.mount.
So if you happen to run nova-compute on a controller host or an AIO,
this is another thing to check.
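
Both assumptions can be checked quickly from a shell on the compute
host. The sketch below assumes systemctl and findmnt (from util-linux)
are on PATH. The function name check_machines_prereqs and its optional
path argument are made up for illustration.

```shell
# Report the state of the two units named above and of the directory itself.
# check_machines_prereqs is a hypothetical helper name, not part of systemd.
check_machines_prereqs() {
    dir=${1:-/var/lib/machines}
    # Assumption 1: tmpfiles setup ran at boot. Assumption 2: the mount unit is up.
    for unit in systemd-tmpfiles-setup.service var-lib-machines.mount; do
        printf '%s: %s\n' "$unit" "$(systemctl is-active "$unit")"
    done
    # Does the directory that systemd-machined needs exist at all?
    if [ -d "$dir" ]; then
        echo "$dir exists"
    else
        echo "$dir is missing"
    fi
    # Is anything mounted there? findmnt exits non-zero when it finds no mount.
    findmnt "$dir" || echo "$dir is not a mount point"
}

# Example call:
# check_machines_prereqs
```

On a host where var-lib-machines.mount is not used, that unit is
expected to show up as inactive, so only the first unit and the
directory check matter there. If systemd-tmpfiles-setup.service is not
active, starting it and then starting systemd-machined again would be
the natural next thing to try.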

On Tue, 4 Oct 2022 at 17:48, John Ratliff <jdratlif at globalnoc.iu.edu> wrote:
>
> We've started deploying new Xena clusters with openstack-ansible. We
> keep running into problems with some parts of openstack not working. A
> service will fail or need restarted, but it's not clear which one or
> why.
>
> Recently, one of our test clusters (2 hosts) stopped working. I could
> login to horizon, but I could not create instances.
>
> At first it told me that a message wasn't answered quick enough. I
> assumed the problem was rabbitmq and restarted the container, but this
> didn't help. I eventually restarted every container and the nova-
> compute and haproxy services on the host. But this didn't help either.
> I eventually rebooted both hosts, but this made things worse (I think I
> broke the galera cluster doing this).
>
> After bootstrapping the galera cluster, I can log back into horizon,
> but I still cannot create hosts. It tells me
>
> "Exceeded maximum number of retries. Exhausted all hosts available for
> retrying build failures for instance [UUID]"
>
> If I look at the journal for nova-compute, I see this error:
>
> "libvirt.libvirtError: Failed to activate service
> 'org.freedesktop.machine1': timed out "
>
> Looking at systemd-machined, it won't start due to "systemd-
> machined.service: Job systemd-machined.service/start failed with result
> 'dependency'."
>
> I'm not sure what "dependency" it's referring to. In the cluster that
> does work, this service is running. But on both hosts on the cluster
> that do not, this service is not running.
>
> What should I be looking at here to fix?
>
> --
> John Ratliff
> Systems Automation Engineer
> GlobalNOC @ Indiana University


