On Fri, Sep 27, 2019 at 07:15:47AM +0000, Manuel Sopena Ballesteros wrote:
Dear Openstack user community,
Hi, Manuel,
I have a compute node with 2 NUMA nodes and I would like to create 2 VMs, each one using a different NUMA node through NUMA affinity with CPU, memory and NVMe PCI devices.
[...]
2019-09-27 16:45:19.785 7 ERROR nova.compute.manager [req-b5a25c73-8c7d-466c-8128-71f29e7ae8aa 91e83343e9834c8ba0172ff369c8acac b91520cff5bd45c59a8de07c38641582 - default default] [instance: ebe4e78c-501e-4535-ae15-948301cbf1ae] Instance failed to spawn: libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-09-27T06:45:19.118089Z qemu-kvm: kvm_init_vcpu failed: Cannot allocate memory
[...]

This is a known issue.  (Eerily enough, I've been debugging this issue
the last couple of days.)

tl;dr: Using Linux kernel 4.19 or above (with the commit below) should
fix this.  If using a 4.19 kernel is not possible, ask your Linux vendor
to backport this small fix:

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

It's an absolutely valid request.  It would be great if you can confirm.
(Also, can you please file a formal Nova bug here? --
https://bugs.launchpad.net/nova)

Long (and complex) story
------------------------

[The root cause is a complex interaction between libvirt, QEMU/KVM,
CGroups, and the kernel.  I myself don't understand some of the CGroups
interaction.]

Today, Nova hard-codes the 'strict' memory allocation mode (there's no
way to configure it in Nova) when tuning NUMA config:

    <numatune>
      <memory mode='strict' nodeset='1'/>
    </numatune>

Here 'strict' means libvirt must prevent QEMU/KVM from allocating memory
on all other nodes, _except_ for node-1.

The consequence is that when QEMU initializes, KVM needs to allocate
some memory from the "DMA32" zone (one of the "zones" into which the
kernel divides system memory).  If that DMA32 zone is _not_ present on
node-1, then memory allocation fails, and in turn the VM fails to start
with: "kvm_init_vcpu failed: Cannot allocate memory".

- - -

So, if you cannot use an upstream 4.19+ kernel (or a vendor-specific
kernel that carries the backported fix), an alternative is to make Nova
use the 'preferred' mode, which relaxes the 'strict' + "DMA32 zone must
be present" requirement.  See the WIP patch here:

    https://review.opendev.org/#/c/684375/ -- "libvirt: Use the
    `preferred` memory allocation mode for NUMA"

Here 'preferred' means: disable NUMA affinity, and turn the memory
allocation request into a "hint", i.e. "if possible, allocate from the
given node-1; otherwise, fall back to other NUMA nodes".
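For illustration, the relaxed guest XML is just the same element with
the policy swapped (a sketch of the effect of that WIP patch, not the
patch itself):

    <!-- With mode='preferred', the nodeset becomes a hint rather than
         a hard requirement: libvirt prefers node-1, but QEMU/KVM may
         fall back to other NUMA nodes (e.g. for the DMA32 zone). -->
    <numatune>
      <memory mode='preferred' nodeset='1'/>
    </numatune>

Note the trade-off: 'preferred' keeps the VM bootable on hosts where
only one node has a DMA32 zone, at the cost of strict NUMA locality
guarantees.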
Additional info
---------------

(*) For the kernel fix mentioned earlier, see the exact same problem
    reported here: https://lkml.org/lkml/2018/7/24/843 -- VM boot
    failure on nodes not having DMA32 zone.

(*) My investigation over the last two days uncovered a longer libvirt
    story here with regard to memory allocation and honoring NUMA
    config.  But I won't get into it here for brevity's sake.  If
    you're interested, just ask; I can point to the relevant libvirt
    Git history and mailing list posts.

[...]
NOTE: this is to show that NUMA node/cell 1 has enough resources available (also the nova-compute logs show that kudu-4 is assigned to cell 1)
As you have guessed, the problem is _not_ that "there is not enough
memory", but that the guest's memory is not allocated on the _correct_
NUMA node, i.e. one that has a "DMA32" region.  Can you also get:

  - The versions of your host kernel, libvirt, and QEMU

  - The output of: `grep DMA /proc/zoneinfo`

(I am almost certain that in your output only one of the two nodes has
a "DMA32" region.)

[...]
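To make the check concrete, here is a small sketch of what that grep is
looking for, assuming the usual /proc/zoneinfo layout where each zone
starts with a "Node N, zone NAME" header (the sample text below is
made up for illustration):

```python
import re

def nodes_with_zone(zoneinfo_text, zone="DMA32"):
    """Return the set of NUMA node IDs whose /proc/zoneinfo entries
    include the given zone (e.g. 'DMA32')."""
    nodes = set()
    for line in zoneinfo_text.splitlines():
        # Zone headers look like: "Node 0, zone    DMA32"
        m = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if m and m.group(2) == zone:
            nodes.add(int(m.group(1)))
    return nodes

# Made-up two-node host where only node 0 has a DMA32 zone -- the
# situation that makes 'strict' pinning to node 1 fail:
sample = """\
Node 0, zone      DMA
Node 0, zone    DMA32
Node 0, zone   Normal
Node 1, zone   Normal
"""
print(nodes_with_zone(sample))  # -> {0}
```

If the real output shows DMA32 on only one node, any 'strict' guest
pinned to the other node will hit the "Cannot allocate memory" failure
described above.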
What does "qemu-kvm: kvm_init_vcpu failed: Cannot allocate memory" mean in this context?
Hope my earlier explanation answers it, even if not entirely satisfactory :-) -- /kashyap