[openstack-dev] [nova] Running large instances with CPU pinning and OOM

Jakub Jursa jakub.jursa at chillisys.com
Wed Sep 27 13:19:36 UTC 2017



On 27.09.2017 14:46, Sahid Orentino Ferdjaoui wrote:
> On Mon, Sep 25, 2017 at 05:36:44PM +0200, Jakub Jursa wrote:
>> Hello everyone,
>>
>> We're experiencing issues with running large instances (~60GB RAM) on
>> fairly large NUMA hosts (4 CPU sockets, 256GB RAM, i.e. 64GB per NUMA
>> node) while using CPU pinning. The problem is that in some extreme cases
>> qemu/KVM can have a significant memory overhead (10-15%?) which the
>> nova-compute service doesn't take into account when launching VMs. Using
>> our configuration as an example: imagine running two VMs with 30GB RAM
>> each on one NUMA node (because we use CPU pinning), therefore using 60GB
>> out of the 64GB available in that NUMA domain. When both VMs consume
>> their entire memory (plus the ~10% KVM overhead), the OOM killer takes
>> action, despite there being plenty of free RAM on other NUMA nodes. (The
>> numbers are arbitrary; the point is that nova-scheduler places the
>> instance on the node because the memory seems 'free enough', but the
>> specific NUMA node can be left without any memory reserve.)
> 
> In Nova, when using NUMA we pin the memory to the host NUMA nodes
> selected during scheduling. In your case it seems that you have
> specifically requested a guest with 1 NUMA node. It will not be possible
> for the process to grab memory on another host NUMA node, but other
> processes could be running on that host NUMA node and consuming memory.

Yes, that is very likely the case - some other processes consume the
memory on the given NUMA node. It seems that setting the flavor metadata
'hw:cpu_policy=dedicated' (while NOT setting 'hw:numa_nodes') results in
libvirt pinning the instance memory to a single host NUMA node in 'strict'
mode:

(from the libvirt XML for the given instance)
...
  <numatune>
    <memory mode='strict' nodeset='1'/>
    <memnode cellid='0' mode='strict' nodeset='1'/>
  </numatune>
...

So yeah, the instance is not able to allocate memory from another NUMA node.
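For comparison, my understanding is that explicitly setting e.g.
'hw:numa_nodes=2' would give the guest two NUMA cells, each one still
strictly bound to a different host NUMA node, so the numatune would look
roughly like this (illustrative sketch only; the actual host node IDs
depend on where the scheduler places the cells):

...
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
    <!-- guest NUMA cell 0 bound to host node 0 -->
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <!-- guest NUMA cell 1 bound to host node 1 -->
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>
...

That would spread the allocation over two host nodes, but each guest cell
would still be unable to fall back to another node when its own node runs
short.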

> 
> What you need is to use huge pages; in that case the memory will be
> locked for the guest.

I'm not quite sure what you mean by 'memory will be locked for the
guest'. Also, aren't huge pages enabled in the kernel by default?
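
For reference, if huge pages were requested via the flavor (e.g.
'hw:mem_page_size=large' or an explicit page size), I would expect the
domain XML to gain a memoryBacking section along these lines (illustrative
sketch; the page size depends on the pool preallocated on the host):

...
  <memoryBacking>
    <hugepages>
      <!-- back the guest RAM with 2 MiB pages from the host's pool -->
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
...

Since such pages have to be preallocated on the host and cannot be swapped
out or used by ordinary processes, the guest memory would effectively be
reserved up front rather than competing with other consumers on the node.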

> 
>> Our initial solution was to use ram_allocation_ratio < 1 to ensure
>> having some reserved memory - this didn't work. Upon studying the Nova
>> source, it turns out that ram_allocation_ratio is ignored when using CPU
>> pinning (see
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
>> and
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
>> ). We're running Mitaka, but this piece of code is implemented in the
>> same way in Ocata.
>> We're considering creating a patch to take ram_allocation_ratio into
>> account.
>>
>> My question is - is ram_allocation_ratio ignored on purpose when using
>> CPU pinning? If so, what is the reasoning behind it? And what would be
>> the right solution to ensure there is reserved RAM on the NUMA nodes?
>>
>> Thanks.
>>
>> Regards,
>>
>> Jakub Jursa
>>
> 


