[openstack-dev] [nova] Running large instances with CPU pinning and OOM

Stephen Finucane sfinucan at redhat.com
Wed Sep 27 08:14:00 UTC 2017

On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
> Hello everyone,
> We're experiencing issues with running large instances (~60GB RAM) on
> fairly large NUMA hosts (4 CPUs, 256GB RAM) while using CPU pinning. The
> problem is that in some extreme cases qemu/KVM can have significant
> memory overhead (10-15%?), which the nova-compute service doesn't take
> into account when launching VMs. Using our configuration as an example:
> imagine running two VMs with 30GB RAM each on one NUMA node (because we
> use CPU pinning), therefore using 60GB out of the 64GB for the given
> NUMA domain. When both VMs consume their entire memory (given 10% KVM
> overhead), the OOM killer takes action (despite there being plenty of
> free RAM in other NUMA nodes). (The numbers are just arbitrary; the
> point is that nova-scheduler schedules the instance to run on the node
> because the memory seems 'free enough', but a specific NUMA node can
> lack the memory reserve.)
> Our initial solution was to use ram_allocation_ratio < 1 to ensure
> some memory stays reserved, but this didn't work. Upon studying the
> nova source, it turns out that ram_allocation_ratio is ignored when
> using CPU pinning (see
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
> and
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
> ). We're running Mitaka, but this piece of code is implemented the same
> way in Ocata.
> We're considering creating a patch to take ram_allocation_ratio into
> account.
> My question is - is ram_allocation_ratio ignored on purpose when using
> cpu pinning? If yes, what is the reasoning behind it? And what would be
> the right solution to ensure having reserved RAM on the NUMA nodes?

Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when using
pinned CPUs because they don't make much sense there: you want a
high-performance VM and have assigned dedicated cores to the instance for this
purpose, yet you would be telling nova to overcommit and schedule multiple
instances onto some of these same cores.

What you're probably looking for is the 'reserved_host_memory_mb' option. This
defaults to 512 (at least in the latest master), so raising it to 4192 or
similar should resolve the issue.
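
To gauge how much headroom is missing, the arithmetic from the scenario quoted
above can be sketched as follows. This is only a rough illustration: the
function name is hypothetical, the figures are the arbitrary ones from the
original mail (two 30GB guests, a 64GB NUMA node, 10-15% overhead), and the
flat percentage overhead is an assumption rather than a measured qemu/KVM
value.

```python
def node_shortfall_gb(vm_ram_gb, vm_count, node_ram_gb, overhead):
    """Return how many GB the guests exceed the NUMA node by (0.0 if they fit).

    'overhead' is an assumed flat fraction of guest RAM (e.g. 0.10 for 10%),
    standing in for the per-guest qemu/KVM overhead.
    """
    needed_gb = vm_ram_gb * vm_count * (1 + overhead)
    return max(0.0, needed_gb - node_ram_gb)


# Figures taken from the example in the quoted mail.
for overhead in (0.10, 0.15):
    short = node_shortfall_gb(vm_ram_gb=30, vm_count=2, node_ram_gb=64,
                              overhead=overhead)
    print(f"overhead {overhead:.0%}: node is short by about {short:.1f} GB")
```

With those inputs the node comes up roughly 2GB short at 10% overhead and
roughly 5GB short at 15%, which gives a feel for the size of the reserve.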

Hope this helps,
Stephen
