<div dir="auto">Oh, well, I do recall now that a package update could break the systemd mount: in prior releases we put our own systemd unit file in place, whereas now we just leverage systemd's override functionality [1].<div dir="auto">I think what you can do is find out which package provides this mount file and put it on hold. Alternatively, cherry-pick and apply the mentioned change. </div><div dir="auto"><div dir="auto"><br></div><div dir="auto">[1] <a href="https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/834183">https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/834183</a></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Tue, 4 Oct 2022, 22:23 John Ratliff <<a href="mailto:jdratlif@globalnoc.iu.edu">jdratlif@globalnoc.iu.edu</a>>:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, 2022-10-04 at 14:56 -0400, John Ratliff wrote:<br>

> On Tue, 2022-10-04 at 18:21 +0200, Dmitriy Rabotyagov wrote:<br>

> > Hi John.<br>

> > <br>

> > Well, it seems you've performed a bunch of operations that were not<br>

> > required in the first place. However, I believe that in the end<br>

> > you've<br>

> > identified the problem correctly. systemd-machined service should<br>

> > be<br>

> > active and running on nova-compute hosts with kvm driver.<br>

> > I'd suggest looking deeper at why this service systemd-machined<br>

> > can't<br>

> > be started. What does journalctl say about that?<br>

> <br>

> It's not very chatty, though I think your next question might answer<br>

> the why.<br>

> <br>

> $ sudo journalctl -u systemd-machined<br>

> -- Logs begin at Tue 2022-10-04 17:45:02 UTC, end at Tue 2022-10-04<br>

> 18:43:45 UTC. --<br>

> Oct 04 18:43:37 os-comp1 systemd[1]: Dependency failed for Virtual<br>

> Machine and Container Registration Service.<br>

> Oct 04 18:43:37 os-comp1 systemd[1]: systemd-machined.service: Job<br>

> systemd-machined.service/start failed with result 'dependency'.<br>

> <br>

> > <br>

> > As one of its dependencies, systemd-machined requires<br>

> > /var/lib/machines. I have 2 assumptions there:<br>

> > 1. Was systemd-tmpfiles-setup.service activated? We have sometimes<br>

> > seen that, due to a race condition at node boot, it was<br>

> > not,<br>

> > which resulted in all kinds of weirdness.<br>

> <br>

> It appears to be. The output looks very similar between the broken<br>

> and<br>

> working clusters.<br>

> <br>

> $ sudo systemctl status systemd-tmpfiles-setup                       <br>

> ● systemd-tmpfiles-setup.service - Create Volatile Files and<br>

> Directories<br>

>      Loaded: loaded (/lib/systemd/system/systemd-tmpfiles-<br>

> setup.service; static; vendor preset: enabled)<br>

>      Active: active (exited) since Mon 2022-10-03 18:23:53 UTC; 24h<br>

> ago<br>

>        Docs: man:tmpfiles.d(5)<br>

>              man:systemd-tmpfiles(8)<br>

>    Main PID: 1460 (code=exited, status=0/SUCCESS)<br>

>       Tasks: 0 (limit: 8192)<br>

>      Memory: 0B<br>

>      CGroup: /system.slice/systemd-tmpfiles-setup.service<br>

> <br>

> Warning: journal has been rotated since unit was started, output may<br>

> be<br>

> incomplete.<br>

> <br>

> However, /var/lib/machines does not appear to be correct. On the<br>

> working cluster, this is mounted as an ext4 filesystem and has a<br>

> lost+found directory along with a directory for a defined instance.<br>

> <br>

> There is no mount listed on the broken cluster, and the directory is<br>

> empty.<br>

> <br>

> > 2. Do you happen to run nova-compute on the same set of hosts<br>

> > where<br>

> > LXC containers are placed? For example, in an AIO setup we<br>

> > manage<br>

> > the /var/lib/machines mount with systemd's var-lib-machines.mount. So if<br>

> > you happen to run nova-compute on a controller host or AIO, this is<br>

> > another thing to check.<br>

> <br>

> $ sudo journalctl -u var-lib-machines.mount<br>

> -- Logs begin at Tue 2022-10-04 18:01:46 UTC, end at Tue 2022-10-04<br>

> 18:52:53 UTC. --<br>

> Oct 04 18:43:37 os-comp1 systemd[1]: Mounting Virtual Machine and<br>

> Container Storage (Compatibility)...<br>

> Oct 04 18:43:37 os-comp1 mount[1272300]: mount: /var/lib/machines:<br>

> wrong fs type, bad option, bad superblock on /dev/loop0, missing<br>

> codepage or helper program, or other error.<br>

> Oct 04 18:43:37 os-comp1 systemd[1]: var-lib-machines.mount: Mount<br>

> process exited, code=exited, status=32/n/a<br>

> Oct 04 18:43:37 os-comp1 systemd[1]: var-lib-machines.mount: Failed<br>

> with result 'exit-code'.<br>

> Oct 04 18:43:37 os-comp1 systemd[1]: Failed to mount Virtual Machine<br>

> and Container Storage (Compatibility).<br>

> <br>

> This appears to be the problem. It looks like /dev/loop0 is probably<br>

> supposed to reference /var/lib/machines.raw. I tried running fsck on<br>

> /dev/loop0, but it doesn't think there is a valid extX filesystem on<br>

> any of the superblocks. Maybe /dev/loop0 is not really pointing to<br>

> /var/lib/machines.raw? Not sure how to tell if that's the case.<br>

> <br>

> Maybe I should try to loopback this, or create a blank filesystem<br>

> image.<br>

> <br>

> <br>

> <br>

<br>

Okay, I'm not sure what happened here.<br>

<br>

The systemd mount unit file for var-lib-machines on the broken cluster<br>

differs from the one on the working cluster. It refers to a btrfs filesystem,<br>

but the /var/lib/machines.raw file is an ext4 filesystem, like the one<br>

on the working cluster.<br>
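<br>
A minimal sketch of how such a mismatch could be checked, assuming a POSIX shell. The helper names unit_fs_type and check_mount_type are hypothetical and not part of systemd or openstack-ansible, and the unit path in the usage note is an assumption (systemctl cat var-lib-machines.mount shows where the unit really lives on a given host).<br>

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Print the Type= value declared in a systemd mount unit file.
# $1: path to a .mount unit file
unit_fs_type() {
    sed -n 's/^Type=//p' "$1" | tail -n 1
}

# Compare the type declared in the unit with the type found in the image.
# $1: path to the unit file
# $2: detected filesystem type, e.g. from `blkid -o value -s TYPE IMAGE`
check_mount_type() {
    declared=$(unit_fs_type "$1")
    if [ "$declared" = "$2" ]; then
        echo "ok: unit and image both use $2"
    else
        echo "mismatch: unit declares '$declared', image is '$2'"
        return 1
    fi
}
```

Possible usage, with an assumed unit path: check_mount_type /lib/systemd/system/var-lib-machines.mount "$(sudo blkid -o value -s TYPE /var/lib/machines.raw)"<br>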

<br>

I copied the unit file from the working cluster to the broken cluster,<br>

and I could mount /var/lib/machines, get systemd-machined working, and<br>

create machines now.<br>

<br>

I have no idea what happened. I feel like there must have been a system<br>

update that changed (reverted from openstack-ansible?) something, but<br>

I'm just not sure.<br>

<br>

In any event, you helped me figure it out. Thanks.<br>

<br>

-- <br>

John Ratliff<br>

Systems Automation Engineer <br>

GlobalNOC @ Indiana University<br>

</blockquote></div>
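<div dir="auto">The "find the package and hold it" suggestion at the top of this message could look roughly like the sketch below. It assumes a Debian or Ubuntu host where dpkg and apt-mark are available; the function names owning_package and hold_owner_of are hypothetical, and the unit path in the usage note is an assumption that should be checked on the affected host.</div>

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Extract the package name from `dpkg -S PATH` output read on stdin.
# dpkg prints lines such as "pkg: /path" or "pkg:arch: /path".
# Assumes the file has a single owning package.
owning_package() {
    cut -d: -f1 | head -n 1
}

# Find the package that ships the given file and put it on hold.
# Must run as root, because apt-mark changes the package state.
# $1: path to the file
hold_owner_of() {
    pkg=$(dpkg -S "$1" | owning_package)
    [ -n "$pkg" ] || return 1
    echo "holding $pkg"
    apt-mark hold "$pkg"
}
```

<div dir="auto">As root, and with an assumed path: hold_owner_of /lib/systemd/system/var-lib-machines.mount. The hold can be listed with apt-mark showhold and reverted with apt-mark unhold.</div>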