[nova] nova hypervisor oom killed some openstack guest

hai wu haiwu.us at gmail.com
Mon Jul 25 18:11:29 UTC 2022


Understood. The same concern is also raised in the following Red Hat
KB: https://access.redhat.com/solutions/4670201.

But we could also protect some critical openstack services, such as
neutron and libvirtd, in the same way, by setting OOMScoreAdjust to
-1000 for them. If we do that, we should probably be OK: both the
critical openstack services and all openstack VMs would be protected.
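
A minimal sketch of what such overrides could look like, assuming
systemd drop-in files are used for the services. The unit names, the
drop-in file name and the helper functions are hypothetical;
OOMScoreAdjust= itself is a standard systemd service setting that
accepts values from -1000 to 1000. The script only prints the drop-in
contents and does not write anything to disk.

```python
from pathlib import Path

# Hypothetical unit names; the real ones depend on the distribution
# and deployment tooling (check with `systemctl list-units`).
CRITICAL_UNITS = ["libvirtd.service", "neutron-openvswitch-agent.service"]

# Standard location for administrator-provided unit drop-ins.
DROPIN_ROOT = Path("/etc/systemd/system")


def render_oom_dropin(score: int) -> str:
    """Return drop-in text that sets OOMScoreAdjust= for a service."""
    if not -1000 <= score <= 1000:
        raise ValueError(f"OOMScoreAdjust must be in [-1000, 1000], got {score}")
    return f"[Service]\nOOMScoreAdjust={score}\n"


def dropin_path(unit: str, name: str = "oom.conf") -> Path:
    """Return the drop-in path for a unit (the file name is an assumption)."""
    return DROPIN_ROOT / f"{unit}.d" / name


if __name__ == "__main__":
    for unit in CRITICAL_UNITS:
        print(f"# {dropin_path(unit)}")
        print(render_oom_dropin(-1000))
```

After placing such files, "systemctl daemon-reload" and a restart of
each unit would be needed for the setting to apply. As pointed out
further down the thread, exempting processes this way does not free
any memory, so the OOM killer may simply pick a different victim.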

On Thu, Jul 21, 2022 at 6:42 AM Sean Mooney <smooney at redhat.com> wrote:
>
> On Wed, 2022-07-20 at 20:25 -0500, hai wu wrote:
> > You are correct, there's no way to set OOMScoreAdjust for
> > machine.slice. Trying to do that failed with an "Unknown
> > assignment" error.
>
> If you mess with the cgroups behind nova's back, then any hope of support from
> your vendor or upstream is gone.
>
> You should really find out why you're running out of memory.
>
> It usually means you have not configured nova and the host correctly.
>
> Most often this happens because people use CPU pinning without enabling
> per-NUMA-node memory tracking by setting a page size.
>
> It could also be because you have not allocated enough swap.
>
> So before you try to adjust things with cgroups yourself or explore other options, you should determine why
> the host is running out of memory.
>
> If you prevent it from killing the guests, I have seen it kill ovs or nova itself before, in cases where the
> guests were unkillable or unlikely to be killed because they used hugepages.
>
> So you will likely just shift the problem elsewhere, where it will be more impactful.
>
> >
> > On Wed, Jul 20, 2022 at 6:48 PM hai wu <haiwu.us at gmail.com> wrote:
> > >
> > > In this case there's no memory oversubscription. This oom killer event
> > > happened when we did "swapoff -a; swapon -a" to push processes in swap
> > > back to memory, which is very strange.
> > >
> > > On Wed, Jul 20, 2022 at 6:39 PM Clark Boylan <cboylan at sapwetik.org> wrote:
> > > >
> > > > On Wed, Jul 20, 2022, at 4:04 PM, hai wu wrote:
> > > > > After installing some systemd package, starting up machine.slice and
> > > > > systemd-machined, and hard rebooting the VM from the openstack side, I
> > > > > can now see the VM showing up under machine.slice. Before that, all VMs
> > > > > were showing up under libvirtd.service, which is under system.slice.
> > > > >
> > > > > What are the benefits of running libvirt managed guest instances under
> > > > > machine.slice?
> > > >
> > > > You can use machine.slice to set system resource options that each sub slice inherits. Those options are documented at https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html# (per my earlier link https://www.freedesktop.org/software/systemd/man/systemd.slice.html). I don't see OOMScoreAdjust listed there so I am unsure if you can actually set it via this method.
> > > >
> > > > All that said, if you are oversubscribing memory this is likely to always be an issue. If you adjust the OOM score for your VMs, then the OOM killer is just going to find other victims to kill. Losing your nova compute agent or NetworkManager or iscsid may be just as problematic. Instead, I suspect that you may need to stop oversubscribing memory.
> > > >
> > > > >
> > > > > On Wed, Jul 20, 2022 at 5:53 PM Clark Boylan <cboylan at sapwetik.org> wrote:
> > > > > >
> > > > > > On Wed, Jul 20, 2022, at 3:17 PM, hai wu wrote:
> > > > > > > Is there any configuration file that is needed to ensure guest domains
> > > > > > > are under systemd machine.slice? I am not seeing anything under
> > > > > > > machine.slice.
> > > > > >
> > > > > > I think that https://www.freedesktop.org/software/systemd/man/systemd.slice.html and https://libvirt.org/cgroups.html cover this for libvirt managed VMs.
> > > > > >
> > > > > > >
> > > > > > > On Wed, Jul 20, 2022 at 3:33 PM Dmitriy Rabotyagov
> > > > > > > <noonedeadpunk at gmail.com> wrote:
> > > > > > > >
> > > > > > > > I believe you can decrease OOMScoreAdjust for the systemd machine.slice, under which guest domains run, to reduce the chances of the OOM killer killing them.
> > > > > > > >
> > > > > > > > On Wed, Jul 20, 2022 at 21:52, hai wu <haiwu.us at gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On a nova hypervisor, the OOM killer would sometimes kill some openstack guests.
> > > > > > > > >
> > > > > > > > > Is it possible to prevent the kernel from OOM killing any openstack guests?
> > > > > > > > > RAM is not oversubscribed much.
> > > > > > > > >
> > > > > >
> > > >
> >
>


