[openstack-dev] [nova] Running large instances with CPU pinning and OOM

Balazs Gibizer balazs.gibizer at ericsson.com
Wed Sep 27 11:10:16 UTC 2017



On Wed, Sep 27, 2017 at 11:58 AM, Jakub Jursa 
<jakub.jursa at chillisys.com> wrote:
> 
> 
> On 27.09.2017 11:12, Jakub Jursa wrote:
>> 
>> 
>>  On 27.09.2017 10:40, Blair Bethwaite wrote:
>>>  On 27 September 2017 at 18:14, Stephen Finucane 
>>> <sfinucan at redhat.com> wrote:
>>>>  What you're probably looking for is the 'reserved_host_memory_mb'
>>>>  option. This defaults to 512 MB (at least in the latest master), so
>>>>  if you up it to 4192 or similar you should resolve the issue.
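
For reference, that option lives in nova.conf on the compute node; a 
minimal sketch, with the 4096 value purely illustrative:

    # /etc/nova/nova.conf on the compute node
    [DEFAULT]
    # memory (in MB) held back for the host and not offered to guests;
    # the default is 512
    reserved_host_memory_mb = 4096
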
>>> 
>>>  I don't see how this would help given the problem description -
>>>  reserved_host_memory_mb would only help avoid causing OOM when
>>>  launching the last guest that would otherwise fit on a host based on
>>>  Nova's simplified notion of memory capacity. It sounds like both CPU
>>>  and NUMA pinning are in play here, otherwise the host would have no
>>>  problem allocating RAM on a different NUMA node and OOM would be
>>>  avoided.
>> 
>>  I'm not quite sure if/how OpenStack handles NUMA pinning (i.e. why the
>>  VM is being killed by the OOM killer rather than having its memory
>>  allocated on a different NUMA node). Anyway, good point, thank you, I
>>  should have a look at the exact parameters passed to QEMU when using
>>  CPU pinning.
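
One way to check that (assuming shell access on the compute node; the
instance name below is just a placeholder taken from "virsh list") is to
look at the libvirt XML Nova generated rather than the raw QEMU command
line:

    # on the compute node
    virsh list --all
    virsh dumpxml instance-0000000a | grep -A 10 -E 'cputune|numatune'

A pinned guest should show <cputune> vcpupin entries and, with a NUMA
topology, a <numatune> section saying which host node(s) its memory is
bound to.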
>> 
>>> 
>>>  Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB
>> 
>>  Hm, but the question is how to prevent some smaller instance (e.g. one
>>  with 2GB RAM) from being scheduled on such a NUMA node?
>> 
>>>  when only considering QEMU overhead - however I would expect that
>>>  might be a problem on NUMA node0, where there will be extra reserved
>>>  memory regions for the kernel and devices. In such a configuration,
>>>  where you want to pin multiple guests into each of multiple NUMA
>>>  nodes, I think you may end up needing different flavor/instance-type
>>>  configs (using less RAM) for node0 versus the other NUMA nodes. I
>>>  suggest
>> 
>>  What do you mean by using a different flavor? From what I understand (
>>  http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
>>  https://docs.openstack.org/nova/pike/admin/cpu-topologies.html ) a
>>  flavor can specify that it 'wants' a different amount of memory from
>>  each of its (virtual) NUMA nodes, but the mapping vCPU <-> pCPU is more
>>  or less arbitrary (meaning that there is no way to specify that NUMA
>>  node0 on the physical host has less memory available for VM allocation).
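
For what it's worth, the per-guest-NUMA-node extra specs from the docs
linked above look roughly like this ("m1.numa" is just a made-up flavor
name, and as you say they describe the guest topology, not which host
NUMA node the guest lands on):

    openstack flavor set m1.numa \
      --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=2 \
      --property hw:numa_cpus.0=0,1 --property hw:numa_mem.0=2048 \
      --property hw:numa_cpus.1=2,3 --property hw:numa_mem.1=2048
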
> 
> Can't the 'reserved_huge_pages' option be used to reserve memory on
> certain NUMA nodes?
> https://docs.openstack.org/ocata/config-reference/compute/config-options.html

I think the QEMU memory overhead is allocated from the 4k page pool, so 
the question is whether it is possible to reserve 4k pages with the 
reserved_huge_pages config option. I don't see any restriction on 4k 
pages in the code base (even though a 4k page is not a large page by 
definition), so in theory you can do it. However, this also means you 
have to enable the NumaTopologyFilter.
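
If you try it, I'd expect the config to look something like this 
(untested for 4k pages; the size is in KiB and the count is just an 
example, 131072 x 4k = 512MB per node):

    # /etc/nova/nova.conf on the compute node
    [DEFAULT]
    reserved_huge_pages = node:0,size:4,count:131072
    reserved_huge_pages = node:1,size:4,count:131072

    # and on the scheduler side, append NumaTopologyFilter to whatever
    # filters you already have enabled, e.g. (Pike option naming):
    [filter_scheduler]
    enabled_filters = <your existing filters>,NumaTopologyFilter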

Cheers,
gibi

> 
> 
> 
>> 
>>>  freshly booting one of your hypervisors and then, with no guests
>>>  running, taking a look at e.g. /proc/buddyinfo and /proc/zoneinfo to
>>>  see what memory is used/available and where.
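
Something along these lines on a freshly booted, guest-free hypervisor 
should do it (numactl is optional but gives a quick per-node summary):

    cat /proc/buddyinfo
    grep -E 'Node|present|managed' /proc/zoneinfo
    numactl --hardware    # per-node total/free memory, if numactl is installed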
>>> 
>> 
>>  Thanks, I'll look into it.
>> 
>> 
>>  Regards,
>> 
>>  Jakub
>> 
> 



