[openstack-dev] [nova] Running large instances with CPU pinning and OOM
Balazs Gibizer
balazs.gibizer at ericsson.com
Wed Sep 27 11:10:16 UTC 2017
On Wed, Sep 27, 2017 at 11:58 AM, Jakub Jursa
<jakub.jursa at chillisys.com> wrote:
>
>
> On 27.09.2017 11:12, Jakub Jursa wrote:
>>
>>
>> On 27.09.2017 10:40, Blair Bethwaite wrote:
>>> On 27 September 2017 at 18:14, Stephen Finucane
>>> <sfinucan at redhat.com> wrote:
>>>> What you're probably looking for is the 'reserved_host_memory_mb'
>>>> option. This defaults to 512 (at least in the latest master) so if you
>>>> up this to 4192 or similar you should resolve the issue.
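
For reference, that option is set in nova.conf on the compute node and
only lowers the amount of RAM nova advertises for placement; it does not
change where the kernel actually allocates guest memory. A minimal
illustrative snippet (4096 is just an example value, not a recommendation):

  [DEFAULT]
  # MB of host RAM held back from instance placement
  reserved_host_memory_mb = 4096
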
>>>
>>> I don't see how this would help given the problem description -
>>> reserved_host_memory_mb would only help avoid causing OOM when
>>> launching the last guest that would otherwise fit on a host based on
>>> Nova's simplified notion of memory capacity. It sounds like both CPU
>>> and NUMA pinning are in play here, otherwise the host would have no
>>> problem allocating RAM on a different NUMA node and OOM would be
>>> avoided.
>>
>> I'm not quite sure if/how OpenStack handles NUMA pinning (i.e. why the
>> VM is being killed by the OOM killer rather than having its memory
>> allocated on a different NUMA node). Anyway, good point, thank you, I
>> should have a look at the exact parameters passed to QEMU when using
>> CPU pinning.
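
For what it's worth, when a guest gets a NUMA topology / CPU pinning, the
libvirt XML nova generates typically pins guest memory with a strict
numatune policy, which is why the kernel OOM-kills the guest instead of
falling back to the other node. A rough sketch of the relevant elements
(the cpuset/nodeset values here are made up):

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
  </cputune>
  <numatune>
    <!-- mode='strict': allocations may not spill over to other NUMA nodes -->
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>

You can check the real thing with 'virsh dumpxml <instance>' on the
compute host.
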
>>
>>>
>>> Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB
>>
>> Hm, but the question is, how to prevent having some smaller instance
>> (e.g. 2GB RAM) scheduled on such a NUMA node?
>>
>>> when only considering QEMU overhead - however I would expect that
>>> might be a problem on NUMA node0, where there will be extra reserved
>>> memory regions for the kernel and devices. In such a configuration,
>>> where you want to pin multiple guests into each of multiple NUMA
>>> nodes, I think you may end up needing different flavor/instance-type
>>> configs (using less RAM) for node0 versus the other NUMA nodes. Suggest
>>
>> What do you mean by using a different flavor? From what I understand (
>>
>> http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
>> https://docs.openstack.org/nova/pike/admin/cpu-topologies.html )
>> a flavor can specify that it 'wants' a different amount of memory from
>> each of its (virtual) NUMA nodes, but the vCPU <-> pCPU mapping is more
>> or less arbitrary (meaning that there is no way to specify that NUMA
>> node0 on the physical host has less memory available for VM allocation)
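
Just to illustrate the mechanism being discussed, the per-guest-NUMA-node
memory split is expressed with flavor extra specs roughly like this (the
flavor name and numbers are invented):

  openstack flavor set m1.numa.example \
    --property hw:cpu_policy=dedicated \
    --property hw:numa_nodes=2 \
    --property hw:numa_mem.0=28672 \
    --property hw:numa_mem.1=32768 \
    --property hw:numa_cpus.0=0,1,2,3 \
    --property hw:numa_cpus.1=4,5,6,7

The hw:numa_mem.N values have to add up to the flavor's total RAM, and,
as noted above, they only shape the guest topology; they do not tell nova
that host node0 has less memory to give out.
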
>
> Can't the 'reserved_huge_pages' option be used to reserve memory on
> certain NUMA nodes?
> https://docs.openstack.org/ocata/config-reference/compute/config-options.html
I think the qemu memory overhead is allocated from the 4k memory pool,
so the question is whether it is possible to reserve 4k pages with the
reserved_huge_pages config option. I can't find any restriction in the
code base about 4k pages (even though a 4k page is not considered a
large page by definition), so in theory you can do it. However, this
also means you have to enable the NumaTopologyFilter.
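
Untested, but based on the documented syntax it would look something like
the following in nova.conf on the compute node (the page counts below are
placeholders; size is in KiB):

  [DEFAULT]
  # hold back 4k pages for host/QEMU overhead, per NUMA node
  reserved_huge_pages = node:0,size:4,count:262144
  reserved_huge_pages = node:1,size:4,count:131072

  [filter_scheduler]
  # append NumaTopologyFilter to the filters you already have enabled
  # (older releases use scheduler_default_filters in [DEFAULT] instead)
  enabled_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,NumaTopologyFilter
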
Cheers,
gibi
>
>
>
>>
>>> freshly booting one of your hypervisors and then, with no guests
>>> running, taking a look at e.g. /proc/buddyinfo and /proc/zoneinfo to
>>> see what memory is used/available and where.
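
Nothing OpenStack-specific is needed for that, by the way; the per-NUMA-node
picture comes straight from the kernel's own accounting, e.g.:

  # free pages per allocation order, per NUMA node
  cat /proc/buddyinfo
  # per-zone detail (free/present/managed pages) grouped by node
  grep -E 'Node|nr_free_pages|present|managed' /proc/zoneinfo
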
>>>
>>
>> Thanks, I'll look into it.
>>
>>
>> Regards,
>>
>> Jakub
>>
>