[nova] NUMA scheduling

Sean Mooney smooney at redhat.com
Mon Oct 19 11:45:06 UTC 2020


On Fri, 2020-10-16 at 22:47 -0500, Eric K. Miller wrote:
> > As far as I know, numa_nodes=1 just means --> the resources for that VM should run on one NUMA node (so either NUMA0 or NUMA1). If there is space
> > free on both, then it's probably going to pick one of the two?
> 
> I thought the same, but it appears that VMs are never scheduled on NUMA1 even though NUMA0 is full (causing OOM to trigger and kill running VMs).  I
> would have hoped that a NUMA node was treated like a host, and thus "VMs being balanced across nodes".
hw:numa_nodes=1 on its own does not enable per numa node memory tracking.

to resolve your OOM issue you need to set hw:mem_page_size=small or hw:mem_page_size=any.
the reason that it is always selecting numa 0 is that nova is taking the list of host numa nodes and checking each one using itertools.permutations.
that always checks the numa nodes in a stable order starting with numa node 0.
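
as a small illustration of that ordering (a sketch only, not nova's code; the two-node host list is a made-up example), itertools.permutations yields
candidates in the order of its input, so node 0 is always tried first:

```python
from itertools import permutations

# Hypothetical host with two NUMA nodes, listed in id order.
host_numa_nodes = [0, 1]

# A guest with one NUMA node needs placements of length 1.
candidates = list(permutations(host_numa_nodes, 1))

# permutations() emits tuples in the order of the input sequence,
# so the first candidate always contains host node 0.
first_candidate = candidates[0]
print(candidates)
print(first_candidate)
```

if the first candidate is accepted, the later ones are never looked at, which is why nothing lands on numa node 1.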

since you have just set hw:numa_nodes=1 without requesting any numa specific resources, e.g. memory or cpus, numa node 0 will effectively always fit the
vm.

when you set hw:numa_nodes=1 and nothing else the scheduler will only reject a node if the number of cpus on the numa node is less than the number the
vm requests. it will not check the memory available on the numa node since you did not ask nova to do that via hw:mem_page_size.
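
a toy model of that behaviour might look like the following. this is a simplified sketch under stated assumptions, not nova's implementation: HostNode,
pick_node and the numbers are all hypothetical, and the only rule it encodes is "memory is checked per node only when a page size was requested".

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional


@dataclass
class HostNode:
    """Hypothetical, simplified view of one host NUMA node."""
    id: int
    cpus: int
    free_mem_mb: int


def pick_node(nodes: List[HostNode], vcpus: int, mem_mb: int,
              mem_page_size: Optional[str] = None) -> Optional[int]:
    """Return the id of the first node that fits, or None."""
    for (node,) in permutations(nodes, 1):
        # The cpu count is always compared.
        if node.cpus < vcpus:
            continue
        # Memory is only compared when a page size was requested.
        if mem_page_size is not None and node.free_mem_mb < mem_mb:
            continue
        return node.id
    return None


# Node 0 is nearly out of memory, node 1 has plenty.
nodes = [HostNode(0, cpus=8, free_mem_mb=512),
         HostNode(1, cpus=8, free_mem_mb=65536)]

without_tracking = pick_node(nodes, vcpus=4, mem_mb=8192)
with_tracking = pick_node(nodes, vcpus=4, mem_mb=8192, mem_page_size="small")
print(without_tracking, with_tracking)
```

in this model the request without a page size still lands on the full node 0, while the same request with a page size moves to node 1.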

effectively, if you are using any numa feature in nova and do not set hw:mem_page_size then your flavor is misconfigured, as it will not request numa
local memory tracking to be enabled.
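
that rule can be turned into a quick sanity check over a flavor's extra specs. the helper below is a hypothetical sketch, not part of nova or the
openstack client, and it assumes that "any numa feature" can be approximated by looking for hw:numa_* keys, which is not an exhaustive test:

```python
def missing_page_size(extra_specs: dict) -> bool:
    """Return True if the flavor uses hw:numa_* but omits hw:mem_page_size.

    Assumption: only hw:numa_* keys are treated as "numa features" here.
    """
    uses_numa = any(key.startswith("hw:numa_") for key in extra_specs)
    return uses_numa and "hw:mem_page_size" not in extra_specs


bad_flavor = {"hw:numa_nodes": "1"}
good_flavor = {"hw:numa_nodes": "1", "hw:mem_page_size": "small"}

print(missing_page_size(bad_flavor))
print(missing_page_size(good_flavor))
```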
> 
> The discussion on NUMA handling is long, so I was hoping that there might be information about the latest solution to the problem - or to be told
> that there isn't a good solution other than using huge pages.
you do not need to use hugepages but you do need to enable per numa node memory tracking with hw:mem_page_size=small (use non-hugepage, typically 4k,
pages) or hw:mem_page_size=any, which is basically the same as small but the image can request hugepages if it wants to. if you set small in the
flavor but large in the image that is an error. if you set any in the flavor the image can set any value it likes, such as small or large or an explicit page
size, and the scheduler will honour that.
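
the flavor/image interaction described above can be sketched as a small function. this is illustrative only: effective_page_size is a hypothetical
name, it covers just the combinations mentioned here, and anything else is deliberately left unimplemented rather than guessed.

```python
from typing import Optional


def effective_page_size(flavor_value: str,
                        image_value: Optional[str] = None) -> str:
    """Combine flavor and image page size requests (hypothetical sketch)."""
    if image_value is None:
        # The image asked for nothing, so the flavor value applies.
        return flavor_value
    if flavor_value == "any":
        # "any" in the flavor lets the image choose.
        return image_value
    if flavor_value == "small" and image_value == "large":
        # small in the flavor but large in the image is an error.
        raise ValueError("image requests large pages but flavor says small")
    # Combinations not described above are intentionally not modelled.
    raise NotImplementedError("combination not covered by this sketch")


print(effective_page_size("small"))
print(effective_page_size("any", "large"))
print(effective_page_size("any", "1GB"))
```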

if you know you want the flavor to use small pages then you should just set small explicitly.
> 
> Eric
> 




