[openstack-dev] [nova] Running large instances with CPU pinning and OOM
jakub.jursa at chillisys.com
Wed Sep 27 08:45:16 UTC 2017
On 27.09.2017 10:14, Stephen Finucane wrote:
> On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
>> Hello everyone,
>> We're experiencing issues with running large instances (~60GB RAM) on
>> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using CPU pinning. The
>> problem seems to be that in some extreme cases qemu/KVM can have a
>> significant memory overhead (10-15%?), which the nova-compute service
>> doesn't take into account when launching VMs. Using our configuration as
>> an example, imagine running two VMs with 30GB RAM each on one NUMA node
>> (because we use CPU pinning), thereby using 60GB out of the 64GB in that
>> NUMA domain. When both VMs consume their entire memory (plus ~10% KVM
>> overhead, i.e. ~66GB), the OOM killer takes action, despite there being
>> plenty of free RAM in the other NUMA nodes. (The numbers are arbitrary;
>> the point is that nova-scheduler places the instance on the compute node
>> because the memory seems 'free enough', but a specific NUMA node can
>> lack a memory reserve.)
>> Our initial solution was to use ram_allocation_ratio < 1 to ensure some
>> reserved memory, but this didn't work. After studying the nova source, it
>> turns out that ram_allocation_ratio is ignored when using CPU pinning (see
>> ). We're running Mitaka, but this piece of code is implemented the same
>> way in Ocata.
>> We're considering creating a patch that takes ram_allocation_ratio into
>> account.
>> My question is: is ram_allocation_ratio ignored on purpose when using
>> CPU pinning? If yes, what is the reasoning behind it? And what would be
>> the right way to ensure reserved RAM on the NUMA nodes?
> Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when using
> pinned CPUs because they don't make much sense: you want a high performance VM
> and have assigned dedicated cores to the instance for this purpose, yet you're
> telling nova to over-schedule and schedule multiple instances to some of these
> same cores.
I wanted to use 'ram_allocation_ratio' with a value of, for example, 0.8
to force 'under-scheduling' of the host and so create a reserve on it.
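To make the intent concrete, here is a minimal Python sketch of a per-NUMA-node memory check that honours an allocation ratio. It is a hypothetical model, not nova's actual NUMA fitting code; the function names and the flat 10% qemu/KVM overhead are assumptions for illustration, using the 64GB node and 30GB VMs from the example above.

```python
# Hypothetical model, NOT nova's scheduler code: a per-NUMA-node memory
# check that applies ram_allocation_ratio, as proposed in this thread.

GIB = 1024  # MiB per GiB


def fits_on_numa_node(node_total_mb, used_mb, request_mb, ram_allocation_ratio=1.0):
    """Return True if request_mb still fits under the node's scaled capacity."""
    capacity_mb = node_total_mb * ram_allocation_ratio
    return used_mb + request_mb <= capacity_mb


def real_footprint_mb(guest_mb, overhead=0.10):
    """Guest RAM plus an assumed qemu/KVM overhead fraction (10% by default)."""
    return guest_mb * (1 + overhead)


node_mb = 64 * GIB  # one NUMA node of the 256GB host
vm_mb = 30 * GIB    # each pinned VM

# Ratio 1.0 (effectively what happens with pinning): the second VM is accepted...
print(fits_on_numa_node(node_mb, vm_mb, vm_mb))
# ...yet the two guests' real footprint (~67584 MiB) exceeds the node (65536 MiB).
print(2 * real_footprint_mb(vm_mb) > node_mb)
# Ratio 0.8 caps the node at ~52429 MiB, so the second VM would be rejected.
print(fits_on_numa_node(node_mb, vm_mb, vm_mb, ram_allocation_ratio=0.8))
```

With a ratio of 0.8 only one 30GB guest fits per 64GB node, leaving roughly 34GB of headroom for overhead on that node.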
> What you're probably looking for is the 'reserved_host_memory_mb' option. This
> defaults to 512 (at least in the latest master) so if you up this to 4192 or
> similar you should resolve the issue.
I'm afraid that this won't help, as this option doesn't take NUMA nodes
into account (e.g. there would be 'reserved_host_memory_mb' of free
memory on the physical host as a whole, but not necessarily on each of
its NUMA nodes).
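A rough illustration of the difference, again as a hypothetical Python model rather than nova code: a host-wide check in the spirit of 'reserved_host_memory_mb' passes, while the NUMA node that actually hosts the pinned guests cannot absorb the assumed ~10% overhead. The helper names and the per-node free-memory figures are made up for the example.

```python
# Hypothetical model, NOT nova code: a host-wide reserve check versus a
# per-NUMA-node check. All figures are illustrative assumptions.

GIB = 1024  # MiB per GiB


def host_has_reserve(free_mb, reserved_host_memory_mb):
    """Host-wide view: only the sum of free memory over all nodes is checked."""
    return sum(free_mb) >= reserved_host_memory_mb


def numa_nodes_short(free_mb, needed_mb):
    """Per-node view: indices of NUMA nodes whose free memory is below what they need."""
    return [i for i, (free, need) in enumerate(zip(free_mb, needed_mb)) if free < need]


# 4 NUMA nodes x 64 GiB; node 0 already holds two 30 GiB pinned guests.
free_mb = [4 * GIB, 64 * GIB, 64 * GIB, 64 * GIB]
# Assumed ~10% qemu/KVM overhead for the 60 GiB of guests, all of it on node 0.
needed_mb = [0.10 * 60 * GIB, 0, 0, 0]

print(host_has_reserve(free_mb, 4096))      # 196 GiB free host-wide, so the check passes
print(numa_nodes_short(free_mb, needed_mb))  # node 0 has 4 GiB but needs ~6 GiB
```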
> Hope this helps,