[openstack-dev] [nova] Running large instances with CPU pinning and OOM

Jakub Jursa jakub.jursa at chillisys.com
Wed Sep 27 09:12:01 UTC 2017



On 27.09.2017 10:40, Blair Bethwaite wrote:
> On 27 September 2017 at 18:14, Stephen Finucane <sfinucan at redhat.com> wrote:
>> What you're probably looking for is the 'reserved_host_memory_mb' option. This
>> defaults to 512 (at least in the latest master) so if you up this to 4192 or
>> similar you should resolve the issue.
> 
> I don't see how this would help given the problem description -
> reserved_host_memory_mb would only help avoid causing OOM when
> launching the last guest that would otherwise fit on a host based on
> Nova's simplified notion of memory capacity. It sounds like both CPU
> and NUMA pinning are in play here, otherwise the host would have no
> problem allocating RAM on a different NUMA node and OOM would be
> avoided.

I'm not quite sure if/how OpenStack handles NUMA pinning (i.e. why the VM
is being killed by the OOM killer rather than having its memory allocated
on a different NUMA node). Anyway, good point, thank you - I should have a
look at the exact parameters passed to QEMU when CPU pinning is used.
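
Something like this, I guess (a rough sketch assuming libvirt/KVM
hypervisors; the instance name below is just an example):

    # dump the domain XML Nova generated and look at the NUMA/CPU pinning
    virsh dumpxml instance-00000001 | grep -E -A3 'numatune|cputune'

    # check which host memory nodes the running QEMU process may use
    grep -i 'allowed_list' /proc/$(pgrep -of instance-00000001)/status

If <numatune> comes out with mode='strict', the guest memory can't spill
over to the other host node, which I guess would explain the OOM.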

> 
> Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB

Hm, but the question is how to prevent a smaller instance (e.g. 2GB RAM)
from being scheduled on such a NUMA node?

> when only considering QEMU overhead - however I would expect that
> might  be a problem on NUMA node0 where there will be extra reserved
> memory regions for kernel and devices. In such a configuration where
> you are wanting to pin multiple guests into each of multiple NUMA
> nodes I think you may end up needing different flavor/instance-type
> configs (using less RAM) for node0 versus other NUMA nodes. Suggest

What do you mean by using a different flavor? From what I understand (
http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
https://docs.openstack.org/nova/pike/admin/cpu-topologies.html ) a flavor
can specify that it 'wants' a different amount of memory from each of its
(virtual) NUMA nodes, but the vCPU <-> pCPU mapping is more or less
arbitrary (meaning there is no way to specify that NUMA node0 on the
physical host has less memory available for VM allocation).
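
For reference, what I mean is something like this (a sketch based on the
cpu-topologies doc above; flavor name and sizes are just examples):

    # two guest NUMA nodes with asymmetric memory, vCPUs pinned to dedicated pCPUs
    openstack flavor set m1.numa.example \
      --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=2 \
      --property hw:numa_cpus.0=0,1 --property hw:numa_mem.0=2048 \
      --property hw:numa_cpus.1=2,3 --property hw:numa_mem.1=4096

As far as I can tell this only shapes the *guest* NUMA topology; it says
nothing about which *host* NUMA node each guest node lands on, which is
exactly the problem.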

> freshly booting one of your hypervisors and then with no guests
> running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to see
> what memory is used/available and where.
> 

Thanks, I'll look into it.
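
For the record, this is the kind of thing I plan to run on a freshly
booted hypervisor with no guests, per the suggestion above (assuming
numactl is installed):

    numactl --hardware      # per-NUMA-node total/free memory overview
    cat /proc/buddyinfo     # free page fragmentation per node and zone
    grep -E 'Node|present|managed' /proc/zoneinfo   # pages present/managed per node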


Regards,

Jakub


