Hi everyone,

I've been debugging an issue for several weeks and have exhausted the obvious possibilities. I'm hoping the collective expertise here can point me in the right direction.

## Environment

- **OpenStack**: Kolla-Ansible 2025.1 (Epoxy), all-in-one deployment
- **OS**: Ubuntu 24.04 on a VM (nested virtualization)
- **OKD target**: Trying to deploy OKD 4.18, but even a simple CirrOS test VM fails

## The Problem

Every attempt to create a VM (even `--image cirros --flavor m1.tiny`) fails with:

```
libvirt.libvirtError: cannot fork child process: Resource temporarily unavailable
```

The VM goes straight to `ERROR` state, and `nova-compute.log` contains nothing useful beyond the same error.

## What Works ✅

- OpenStack services are up (`nova-compute`, `neutron`, etc. all report healthy)
- Networks, subnets, routers, and keypairs can be created via the CLI
- A direct `qemu-system-x86_64 -enable-kvm` test succeeds (so KVM itself works)
- `/dev/kvm` exists with correct permissions
- KVM modules are loaded (`kvm_intel`) and nested virtualization is enabled (`Y`)

## What We've Checked (All Fine)

| Check | What We Found | Status |
|-------|---------------|--------|
| System PID limit (`kernel.pid_max`) | 4,194,304 | ✅ OK |
| System-wide thread limit (`kernel.threads-max`) | 722,077 | ✅ OK |
| User process limit (`ulimit -u`) | 361,038 | ✅ OK |
| Host thread count (`ps -eLf \| wc -l`) | ~111,000 | ✅ OK |
| `nova_compute` container `pids.max` | 108,000 | ✅ OK |
| PIDs in use inside `nova_compute` container | 27 | ✅ OK |
| `nova_libvirt` container `PidsLimit` | `<nil>` (unlimited) | ✅ OK |
| AppArmor blocking libvirt | No profiles loaded | ✅ OK |
| Host `libvirtd` service (would conflict with the container) | Inactive | ✅ OK |
| Libvirt log volume size | 12 KB | ✅ OK |
| RAM | 88 GB total, plenty free | ✅ OK |
| vCPUs | 32 cores | ✅ OK |

## What We Can't Check (Missing Commands)

Inside the `nova_libvirt` container, the `ulimit` command is missing, so we cannot determine:

- the `nproc` (max processes) limit inside the container
- the `nofile` (open file descriptors) limit

## What We Haven't Checked

- Kernel parameters such as `vm.max_map_count` (default is 65530; could this be an issue?)
- cgroup v2 limits on the `nova_libvirt` container (Ubuntu 24.04 uses cgroup v2)
- `libvirtd` logs inside the container (the log file may not exist or may be empty)

## The Ask

Has anyone seen this in a Kolla-Ansible all-in-one deployment where all the obvious limits are large but libvirt still refuses to fork? Could it be:

1. A cgroup v2 limit we missed?
2. A kernel parameter that needs tuning (`vm.max_map_count`)?
3. Something else entirely?

I'm happy to run any additional diagnostics or provide more logs. Any guidance would be hugely appreciated.

Thanks,
Dennis
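On the `ulimit` gap: `ulimit` is a shell builtin rather than a standalone binary, so if it was invoked directly through `docker exec nova_libvirt ulimit ...` (an assumption about how it was run), that call fails even though the limits exist. The kernel exposes the same per-process limits in `/proc/<pid>/limits`. The cgroup v2 `pids` controller exposes `pids.max` and `pids.current` in each cgroup directory. `fork(2)` returns `EAGAIN` ("Resource temporarily unavailable") when `RLIMIT_NPROC` or a cgroup `pids.max` is hit, and `pids.max` is enforced hierarchically. That means a limit on an ancestor cgroup (for example a systemd slice) can block forks even when the container's own value is unlimited.

Below is a minimal, read-only Python sketch of that check. It assumes a pure cgroup v2 host with the unified hierarchy mounted at `/sys/fs/cgroup`, and that it is run as root on the host (not inside the container) against the PID of the containerized `libvirtd`. The script name and helper functions are hypothetical, not part of libvirt or Kolla.

```python
#!/usr/bin/env python3
"""Read-only check of per-process and cgroup v2 limits that can make fork() fail with EAGAIN.

Assumes a pure cgroup v2 host with the unified hierarchy at /sys/fs/cgroup.
Intended to be run as root on the host, passing the PID of the containerized libvirtd.
"""
import re
import sys
from pathlib import Path


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {limit name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        # Columns are padded with runs of spaces; limit names contain single spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def parse_cgroup_v2(text):
    """Return the cgroup v2 path ("0::<path>" line) from /proc/<pid>/cgroup, or None."""
    for line in text.splitlines():
        if line.startswith("0::"):
            return line[3:]
    return None


def pids_limits_along_path(rel_path, root="/sys/fs/cgroup"):
    """List (cgroup dir, pids.max, pids.current) from the leaf cgroup up to the root.

    pids.max is enforced hierarchically, so every ancestor's value matters.
    Directories without a pids.max file (e.g. the root cgroup) are skipped.
    """
    root_path = Path(root)
    current = root_path / rel_path.lstrip("/")
    results = []
    while True:
        max_file = current / "pids.max"
        if max_file.exists():
            cur_file = current / "pids.current"
            cur = cur_file.read_text().strip() if cur_file.exists() else "?"
            results.append((str(current), max_file.read_text().strip(), cur))
        if current == root_path or current.parent == current:
            break
        current = current.parent
    return results


def main(pid):
    limits = parse_limits(Path(f"/proc/{pid}/limits").read_text())
    # "Max processes" is RLIMIT_NPROC (nproc); "Max open files" is RLIMIT_NOFILE (nofile).
    for name in ("Max processes", "Max open files"):
        soft, hard = limits.get(name, ("?", "?"))
        print(f"{name}: soft={soft} hard={hard}")

    rel = parse_cgroup_v2(Path(f"/proc/{pid}/cgroup").read_text())
    if rel is None:
        print("No cgroup v2 ('0::') entry found; host may be on cgroup v1 or hybrid")
        return
    for path, pids_max, pids_current in pids_limits_along_path(rel):
        print(f"{path}: pids.max={pids_max} pids.current={pids_current}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

A possible invocation, assuming the daemon's process name is `libvirtd` and the script is saved as `check_fork_limits.py`: `sudo python3 check_fork_limits.py "$(pgrep -o -x libvirtd)"`. Walking up to the root, instead of reading only the container's own cgroup, is the point of the design. A `pids.current` close to `pids.max` on any listed directory would be consistent with the `EAGAIN` error.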