[nova] NUMA scheduling

Sean Mooney smooney at redhat.com
Mon Oct 19 11:53:48 UTC 2020


On Sat, 2020-10-17 at 04:04 +0000, Erik Olof Gunnar Andersson wrote:
> We have been running with NUMA configured for a long time and I don't believe I have seen this behavior. It's important that you configure the flavors
> / aggregates correctly.
> 
> I think this might be what you are looking for
> 
> openstack flavor set m1.large --property hw:cpu_policy=dedicated
> 
> https://docs.openstack.org/nova/pike/admin/cpu-topologies.html
No, this is not what is needed. That enables CPU pinning, not NUMA-aware memory allocation.
It also implicitly creates a guest NUMA topology of 1 NUMA node if you do not override that default by setting hw:numa_nodes to a different value.
> 
> Pretty sure we also set this for any flavor that only requires a single NUMA zone
> 
> openstack flavor set m1.large --property hw:numa_nodes=1
This is how you specify the number of guest NUMA nodes, yes, but hw:mem_page_size is what is missing to enable NUMA-aware
memory tracking. Without hw:mem_page_size (or the image equivalent, hw_mem_page_size) nova will use the globally free memory
on the host, rather than the free memory on the VM's NUMA node, when determining if it can boot a VM.
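
For example, something along these lines (the flavor/image names here are just placeholders) should give you per-NUMA-node memory tracking without requiring explicit huge pages:

openstack flavor set m1.large --property hw:numa_nodes=1 --property hw:mem_page_size=small

or, if you prefer to set it on the image instead of the flavor:

openstack image set my-image --property hw_mem_page_size=small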

This is a known behaviour since NUMA support was introduced, a result of the design choice to make NUMA affinity opt-in.
Because customers have been misconfiguring their systems lately, I had filed https://bugs.launchpad.net/nova/+bug/1893121
to address some of this behaviour, but it was determined to be a feature rather than a bug, since we discussed this when first adding NUMA support and declared
it out of scope.

One change I have brought up in the past, and might raise at the PTG again, is the idea of defaulting to hw:mem_page_size=any
if you have a NUMA VM and don't otherwise set it. That would stop the behaviour being described here, but it would
mean you cannot do memory oversubscription with NUMA guests. I have long held the view that NUMA affinity and memory oversubscription are
mutually exclusive, and that we should just default to making this work out of the box for people, but the oversubscription concern is why we have not made this change the last
3-4 times I have raised it at the PTG/design summit.
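
In the meantime, that behaviour can already be opted into per flavor today; a minimal sketch, with a placeholder flavor name:

openstack flavor set numa.flavor --property hw:numa_nodes=1 --property hw:mem_page_size=any

With hw:mem_page_size=any the driver is free to pick whatever page size is available on the host, and memory for that instance is tracked per NUMA node, which also means no memory oversubscription for it, as described above.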

This is a change we will have to make if we track NUMA in placement in any case.

> ________________________________
> From: Eric K. Miller <emiller at genesishosting.com>
> Sent: Friday, October 16, 2020 8:47 PM
> To: Laurent Dumont <laurentfdumont at gmail.com>
> Cc: openstack-discuss <openstack-discuss at lists.openstack.org>
> Subject: RE: [nova] NUMA scheduling
> 
> > As far as I know, numa_nodes=1 just means --> the resources for that VM should run on one NUMA node (so either NUMA0 or NUMA1). If there is space
> > free on both, then it's probably going to pick one of the two?
> 
> I thought the same, but it appears that VMs are never scheduled on NUMA1 even though NUMA0 is full (causing OOM to trigger and kill running VMs).  I
> would have hoped that a NUMA node was treated like a host, and thus "VMs being balanced across nodes".
> 
> The discussion on NUMA handling is long, so I was hoping that there might be information about the latest solution to the problem - or to be told
> that there isn't a good solution other than using huge pages.
> 
> Eric
> 




