[nova] nova hypervisor oom killed some openstack guest

Sean Mooney smooney at redhat.com
Mon Jul 25 23:00:35 UTC 2022


On Mon, 2022-07-25 at 18:06 -0400, Laurent Dumont wrote:
> How much are you reserving for Openstack vs the VM?
That is a very good question. Many people fail to account for the qemu
overhead and fail to allocate swap.
Even if you are not using memory oversubscription, you should have 8-16GB of swap
on any nova compute host.
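
As a rough way to check the swap recommendation above, here is a minimal sketch that reads the `SwapTotal` field from text in the `/proc/meminfo` format. The function names and the 8 GiB default are illustrative assumptions based on the lower bound mentioned above; this is not anything nova itself provides.

```python
def swap_total_gib(meminfo_text: str) -> float:
    """Return SwapTotal in GiB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # The line looks like "SwapTotal:  8388608 kB"; the kernel's kB is KiB.
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("SwapTotal not found in meminfo text")


def has_recommended_swap(meminfo_text: str, minimum_gib: float = 8.0) -> bool:
    """True if swap is at least minimum_gib (hypothetical threshold, 8 GiB by default)."""
    return swap_total_gib(meminfo_text) >= minimum_gib


# Illustrative sample, not output captured from a real host.
sample = "MemTotal:       131072000 kB\nSwapTotal:        8388608 kB\n"
print(has_recommended_swap(sample))
```

On a compute host you would pass in the contents of `/proc/meminfo` instead of the sample string.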

In addition to how much is being reserved, it's also important to ensure
that, if you are doing memory oversubscription, there is enough swap to cover it,
and to understand that the kernel OOM reaper runs per NUMA node. So even if there is plenty
of free memory on NUMA node 1, if the kernel needs memory on NUMA node 0 it will trigger an OOM reaping
cycle.

So if you are using hugepages, it's important to ensure that you still have enough memory on all NUMA nodes where
kernel processes can run.
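
To see whether one NUMA node is short on memory while the host as a whole looks fine, a sketch like the following can help. It parses text in the format of `/sys/devices/system/node/node*/meminfo` (lines such as `Node 0 MemFree: 204800 kB`). The helper names and the 1 GiB threshold are assumptions for illustration only, not a recommended value.

```python
import re

# Matches lines like "Node 0 MemFree:   204800 kB"; lines without a kB
# suffix (for example the HugePages_* counters) are skipped.
_NODE_LINE = re.compile(r"^Node\s+(\d+)\s+(\w+):\s+(\d+)\s+kB", re.MULTILINE)


def node_mem_free_kib(node_meminfo_text: str) -> dict:
    """Map NUMA node id -> MemFree in KiB from concatenated per-node meminfo text."""
    free = {}
    for node, key, value in _NODE_LINE.findall(node_meminfo_text):
        if key == "MemFree":
            free[int(node)] = int(value)
    return free


def nodes_below(free_by_node: dict, threshold_kib: int) -> list:
    """Return the sorted node ids whose free memory is under threshold_kib."""
    return sorted(n for n, kib in free_by_node.items() if kib < threshold_kib)


# Illustrative sample: node 0 is nearly exhausted, node 1 has plenty free.
sample = (
    "Node 0 MemTotal:       65536000 kB\n"
    "Node 0 MemFree:          204800 kB\n"
    "Node 1 MemTotal:       65536000 kB\n"
    "Node 1 MemFree:        30000000 kB\n"
)
free = node_mem_free_kib(sample)
print(nodes_below(free, 1024 * 1024))  # hypothetical 1 GiB threshold
```

On a real host you would concatenate the per-node `meminfo` files and pass the result in place of the sample.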
> 
> On Mon, Jul 25, 2022 at 2:19 PM hai wu <haiwu.us at gmail.com> wrote:
> 
> > Understand. The same concern is also raised in the following redhat
> > KB: https://access.redhat.com/solutions/4670201.
Just be aware that ^ is not something that is supported in the
Red Hat OpenStack product, and implementing it would void your support
for the VMs.
Knowledge base articles are generally written by support engineers
when debugging a problem, with possible solutions they tried.

They are not part of our product docs and are not reviewed for correctness
by the engineering teams that maintain OpenStack upstream or downstream,
so take anything you find there with a grain of salt.

libvirt hooks are not and never have been supported upstream or downstream,
but if you are maintaining and/or operating the cloud yourself then that might work for you.
> > 
> > But we could also protect some critical openstack services, like
> > neutron, libvirtd, via the same way by setting OOMScoreAdjust for
> > those to be -1000. If we do that, we should probably be ok. We protect
> > both critical openstack services, and all openstack VMs in this way.
> > 
> > On Thu, Jul 21, 2022 at 6:42 AM Sean Mooney <smooney at redhat.com> wrote:
> > > 
> > > On Wed, 2022-07-20 at 20:25 -0500, hai wu wrote:
> > > > You are correct, there's no way to set OOMScoreAdjust for
> > > > machine.slice. It errored out when trying to do that, with "Unknown
> > > > assignment" error..
> > > 
> > > If you mess with the cgroups behind nova's back then any hope of support you have with
> > > your vendor or upstream is gone.
> > > 
> > > You should really find out why you're running out of memory.
> > > 
> > > It usually means you have not configured nova and the host correctly.
> > > 
> > > Most often this happens because people use cpu pinning without enabling per
> > > NUMA node memory tracking by setting a page size.
> > > 
> > > It also could be because you have not allocated enough swap.
> > > 
> > > So before you try to adjust things with cgroups yourself or explore other options, you should determine why
> > > the host is running out of memory.
> > > 
> > > If you prevent it from killing the guests, I have seen it kill OVS or nova itself before, in cases where the guests were
> > > unkillable or unlikely to be killed because they used hugepages.
> > > 
> > > So you will likely just shift the problem elsewhere, where it will be more impactful.
> > > 
> > > > 
> > > > On Wed, Jul 20, 2022 at 6:48 PM hai wu <haiwu.us at gmail.com> wrote:
> > > > > 
> > > > > In this case there's no memory oversubscription. This oom killer event
> > > > > happened when we did "swapoff -a; swapon -a" to push processes in swap
> > > > > back to memory, which is very strange.
> > > > > 
> > > > > On Wed, Jul 20, 2022 at 6:39 PM Clark Boylan <cboylan at sapwetik.org> wrote:
> > > > > > 
> > > > > > On Wed, Jul 20, 2022, at 4:04 PM, hai wu wrote:
> > > > > > > After installing some systemd package, and starting up machine.slice,
> > > > > > > systemd-machined, and hard rebooting the vm from openstack side, I
> > > > > > > could now see the VM showing up under machine.slice. Previously, all vms were
> > > > > > > showing up under libvirtd.service, which is under system.slice.
> > > > > > > 
> > > > > > > What are the benefits of running libvirt managed guest instances under
> > > > > > > machine.slice?
> > > > > > 
> > > > > > You can use machine.slice to set system resource options that each sub slice inherits. Those options are documented at
> > > > > > https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#
> > > > > > (per my earlier link https://www.freedesktop.org/software/systemd/man/systemd.slice.html). I
> > > > > > don't see OOMScoreAdjust listed there so I am unsure if you can actually set it via this method.
> > > > > > 
> > > > > > That all said, if you are oversubscribing memory this is likely to always be an issue. If you adjust the oom score for your VMs then the
> > > > > > oomkiller is just going to find other victims to kill. Losing your nova compute agent or NetworkManager or iscsid may be just as problematic.
> > > > > > Instead, I suspect that you may need to stop oversubscribing memory.
> > > > > > 
> > > > > > > 
> > > > > > > On Wed, Jul 20, 2022 at 5:53 PM Clark Boylan <cboylan at sapwetik.org> wrote:
> > > > > > > > 
> > > > > > > > On Wed, Jul 20, 2022, at 3:17 PM, hai wu wrote:
> > > > > > > > > Is there any configuration file that is needed to ensure guest domains
> > > > > > > > > are under systemd machine.slice? not seeing anything under
> > > > > > > > > machine.slice ..
> > > > > > > > 
> > > > > > > > I think that https://www.freedesktop.org/software/systemd/man/systemd.slice.html and
> > > > > > > > https://libvirt.org/cgroups.html covers this for libvirt managed VMs.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Wed, Jul 20, 2022 at 3:33 PM Dmitriy Rabotyagov
> > > > > > > > > <noonedeadpunk at gmail.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > I believe you can decrease OOMScoreAdjust for systemd machine.slice, under which guest domains
> > > > > > > > > > are placed, to reduce the chances of the oom killer killing them.
> > > > > > > > > > 
> > > > > > > > > > On Wed, Jul 20, 2022 at 21:52 hai wu <haiwu.us at gmail.com> wrote:
> > > > > > > > > > > 
> > > > > > > > > > > The oom killer on a nova hypervisor would sometimes kill some openstack guests.
> > > > > > > > > > > 
> > > > > > > > > > > Is it possible to not allow kernel to oom kill any openstack guests?
> > > > > > > > > > > ram is not oversubscribed much ..
> > > > > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > 
> > 



