[openstack-dev] [nova] NUMA, huge pages, and scheduling

Paul Michali pc at michali.net
Thu Jun 9 11:15:26 UTC 2016

On Wed, Jun 8, 2016 at 11:21 PM Chris Friesen <chris.friesen at windriver.com> wrote:

> On 06/03/2016 12:03 PM, Paul Michali wrote:
> > Thanks for the link Tim!
> >
> > Right now, I have two things I'm unsure about...
> >
> > One is that I had 1945 huge pages left (of size 2048k) and tried to create a VM
> > with a small flavor (2GB), which should need 1024 pages, but Nova indicated that
> > it wasn't able to find a host (and QEMU reported an allocation issue).
> >
> > The other is that VMs are not being evenly distributed on my two NUMA nodes, and
> > instead, are getting created all on one NUMA node. Not sure if that is expected
> > (and setting mem_page_size to 2048 is the proper way).
>
> Just in case you haven't figured out the problem...
>
> Have you checked the per-host-numa-node 2MB huge page availability on your host?
> If it's uneven then that might explain what you're seeing.
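
To check the per-node availability Chris mentions, here is a minimal sketch that reads the per-NUMA-node huge page counters from sysfs. It assumes the standard Linux layout /sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/ with nr_hugepages and free_hugepages files; the function name per_node_hugepages is just illustrative.

```python
import glob
import os


def per_node_hugepages(size_kb=2048, root="/sys/devices/system/node"):
    """Return {node_id: (total, free)} for huge pages of size_kb on each NUMA node.

    Assumes the usual Linux sysfs layout; nodes without a directory for this
    page size are skipped, and a non-Linux host simply yields {}.
    """
    result = {}
    for node_dir in sorted(glob.glob(os.path.join(root, "node[0-9]*"))):
        node_id = int(os.path.basename(node_dir)[len("node"):])
        hp_dir = os.path.join(node_dir, "hugepages", f"hugepages-{size_kb}kB")
        if not os.path.isdir(hp_dir):
            continue
        with open(os.path.join(hp_dir, "nr_hugepages")) as f:
            total = int(f.read())  # pages reserved on this node
        with open(os.path.join(hp_dir, "free_hugepages")) as f:
            free = int(f.read())   # pages not yet handed out on this node
        result[node_id] = (total, free)
    return result


for node, (total, free) in sorted(per_node_hugepages().items()):
    print(f"node{node}: {free}/{total} free 2MB pages")
```

If the free counts differ a lot between nodes, that would match the uneven-availability explanation.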

These are the observations/questions I have:

1) On the host, I was seeing 32768 huge pages of 2MB each. When I created
VMs (Cirros) using the small flavor, each VM was created on NUMA nodeid 0.
Once half of the available pages were in use, I could no longer create any
VMs (QEMU reported no space). I'd like to understand why the VMs were
always assigned to nodeid 0, and to confirm that the huge pages are divided
among the available NUMA nodes.
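
The "fails at exactly half" observation is consistent with simple arithmetic, assuming (not verified) that the 32768 pages are split evenly across the two nodes and that each small-flavor VM (2GB) takes all of its memory from one node. The helper name vms_per_node is hypothetical.

```python
def vms_per_node(total_pages, numa_nodes, vm_ram_mb, page_kb=2048):
    """Return (pages per VM, pages per node, VMs that fit on one node)."""
    pages_per_vm = -(-vm_ram_mb * 1024 // page_kb)  # ceiling division
    pages_per_node = total_pages // numa_nodes      # assumes an even split
    return pages_per_vm, pages_per_node, pages_per_node // pages_per_vm


print(vms_per_node(32768, 2, 2048))  # (1024, 16384, 16)
```

If every VM lands on node 0, the 17th boot fails after 16 x 1024 = 16384 pages, which is exactly half of the 32768.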

2) I changed mem_page_size from 1024 to 2048 in the flavor, and then the
VMs were evenly assigned to the two NUMA nodes, each VM using 1024 huge
pages. This time I could use more than half of the pages, but when 1945
pages were left, VM creation failed. Did it fail because mem_page_size was
2048 and only 1945 pages were available, even though we were only
requesting 1024 pages?
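
One possible explanation for #2, stated as an assumption rather than verified behavior: if a guest that requests a page size gets a single guest NUMA node by default, all 1024 of its pages must come from one host node. Then 1945 free pages in total is not enough if they are split across both nodes. The 972/973 split below is hypothetical, and first_node_that_fits is an illustrative name, not a Nova function.

```python
def first_node_that_fits(free_per_node, pages_needed):
    """Return the lowest node id with enough free pages, or None if none fits."""
    for node in sorted(free_per_node):
        if free_per_node[node] >= pages_needed:
            return node
    return None


# Hypothetical split of the 1945 free 2MB pages across two nodes.
free = {0: 972, 1: 973}
print(sum(free.values()))                # 1945
print(first_node_that_fits(free, 1024))  # None: no single node has 1024 free
```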

3) Related to #2, is there a relationship between mem_page_size, the
allocation of VMs to NUMA nodes, and the flavor size? IOW, if I use the
medium flavor (4GB), will I need a larger mem_page_size? (I'll try this
variation as soon as I can.) This gets back to understanding how the
scheduler decides where to place VMs.

4) When VM creation failed because QEMU could not allocate memory, the VM
went into error state. I deleted the VM, but the neutron port was still
there, and no log messages indicated that a request was made to delete the
port. Is this expected (that is, the user has to manually clean up the
port)?

5) A coworker hit the problem mentioned in #1, with exhaustion at the
halfway point. If she deletes a VM and then changes the flavor to set
mem_page_size to 2048, should Nova start assigning all new VMs to the
other NUMA node until its free huge pages drop to the level of node 0's,
or will it alternate between the available NUMA nodes (and run out when
node 0's pool is exhausted)?
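
To make the two alternatives in #5 concrete, here is a toy model of two hypothetical placement policies. It is not Nova's actual algorithm, which is exactly the open question. "first_fit" picks the lowest-numbered node with room, and "most_free" picks the node with the most free pages. The starting state (one VM's worth free on node 0, node 1 untouched) is an assumption about the coworker's host.

```python
def place(free, pages, policy):
    """Pick a node for one VM under a toy policy; return node id or None."""
    candidates = [n for n in sorted(free) if free[n] >= pages]
    if not candidates:
        return None
    if policy == "first_fit":
        node = candidates[0]                               # lowest node id with room
    elif policy == "most_free":
        node = max(candidates, key=lambda n: free[n])      # emptiest node first
    else:
        raise ValueError(f"unknown policy: {policy}")
    free[node] -= pages
    return node


def simulate(free, pages, policy, count):
    """Place `count` identical VMs, returning the node chosen for each."""
    free = dict(free)  # copy so the caller's starting state is unchanged
    return [place(free, pages, policy) for _ in range(count)]


start = {0: 1024, 1: 16384}  # hypothetical: node 0 has one VM's worth free
print(simulate(start, 1024, "first_fit", 4))  # [0, 1, 1, 1]
print(simulate(start, 1024, "most_free", 4))  # [1, 1, 1, 1]
```

Under "most_free", node 1 keeps receiving VMs until its free count falls to node 0's level, after which placements alternate; under "first_fit", node 0 is filled first.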

Thanks in advance!


> Chris