[openstack-dev] [nova] NUMA, huge pages, and scheduling

Matt Riedemann mriedem at linux.vnet.ibm.com
Thu Jun 9 13:41:46 UTC 2016


On 6/9/2016 6:15 AM, Paul Michali wrote:
>
>
> On Wed, Jun 8, 2016 at 11:21 PM Chris Friesen
> <chris.friesen at windriver.com> wrote:
>
>     On 06/03/2016 12:03 PM, Paul Michali wrote:
>     > Thanks for the link Tim!
>     >
>     > Right now, I have two things I'm unsure about...
>     >
>     > One is that I had 1945 huge pages left (of size 2048k) and tried
>     > to create a VM with a small flavor (2GB), which should need 1024
>     > pages, but Nova indicated that it wasn't able to find a host (and
>     > QEMU reported an allocation issue).
>     >
>     > The other is that VMs are not being evenly distributed on my two
>     > NUMA nodes, and instead are getting created all on one NUMA node.
>     > Not sure if that is expected (and whether setting mem_page_size to
>     > 2048 is the proper way to address it).
>
>
>     Just in case you haven't figured out the problem...
>
>     Have you checked the per-host-numa-node 2MB huge page availability
>     on your host? If it's uneven then that might explain what you're
>     seeing.
>
>
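For reference, a quick way to eyeball that per-node split on the compute
host is to read the hugepage counters out of sysfs. A rough sketch,
assuming 2MB pages:

    #!/usr/bin/env python
    # Print free/total 2MB huge pages for each NUMA node on this host.
    import glob
    import os

    for node_dir in sorted(glob.glob('/sys/devices/system/node/node[0-9]*')):
        hp_dir = os.path.join(node_dir, 'hugepages', 'hugepages-2048kB')
        with open(os.path.join(hp_dir, 'nr_hugepages')) as f:
            total = int(f.read())
        with open(os.path.join(hp_dir, 'free_hugepages')) as f:
            free = int(f.read())
        print('%s: %d free / %d total 2MB pages'
              % (os.path.basename(node_dir), free, total))

If node 0 is nearly out of free pages while node 1 still has plenty, that
would line up with what you're describing.
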
> These are the observations/questions I have:
>
> 1) On the host, I was seeing 32768 huge pages, of 2MB size. When I
> created VMs (Cirros) using small flavor, each VM was getting created on
> NUMA nodeid 0. When it hit half of the available pages, I could no
> longer create any VMs (QEMU saying no space). I'd like to understand why
> the assignment was always going to nodeid 0, and to confirm that the
> huge pages are divided among the number of NUMA nodes available.
>
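Huge pages are tracked per NUMA node, yes - the libvirt driver builds its
view of the host from the per-cell page counts in libvirt's capabilities
XML, which you can dump yourself. A rough sketch, assuming python-libvirt
is installed on the compute host and 2MB pages:

    # Show the per-NUMA-cell 2048KiB page counts that libvirt reports.
    import xml.etree.ElementTree as ET
    import libvirt

    conn = libvirt.open('qemu:///system')
    caps = ET.fromstring(conn.getCapabilities())
    for cell in caps.findall('./host/topology/cells/cell'):
        for pages in cell.findall('pages'):
            if pages.get('size') == '2048':
                print('cell %s: %s x 2048KiB pages'
                      % (cell.get('id'), pages.text))
    conn.close()

That should show your 32768 pages split across the two cells rather than
pooled host-wide.
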
> 2) I changed mem_page_size from 1024 to 2048 in the flavor, and then
> when VMs were created, they were being evenly assigned to the two NUMA
> nodes. Each using 1024 huge pages. At this point I could create more
> than half, but when there were 1945 pages left, it failed to create a
> VM. Did it fail because the mem_page_size was 2048 and the available
> pages were 1945, even though we were only requesting 1024 pages?
>
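The page math itself is just flavor RAM divided by page size, but as far
as I know the claim has to be satisfied by a single host NUMA node's free
pages (unless the flavor asks for a multi-node guest topology), so the
host-wide total of 1945 isn't what gets checked. A back-of-the-envelope
sketch - the per-node free counts below are made up for illustration:

    # How many 2MB pages a flavor needs, and whether any single NUMA node
    # can still supply them. The free counts are hypothetical.
    PAGE_KB = 2048

    def pages_needed(flavor_ram_mb):
        return flavor_ram_mb * 1024 // PAGE_KB

    free_per_node = {0: 945, 1: 1000}   # hypothetical split summing to 1945

    need = pages_needed(2048)           # small flavor, 2GB -> 1024 pages
    for node, free in sorted(free_per_node.items()):
        print('node %d: %d free, need %d -> %s'
              % (node, free, need, 'fits' if free >= need else 'no fit'))

If neither node individually has 1024 free pages, the boot fails even
though the host-wide total looks sufficient, which would explain the
failure you saw at 1945 pages remaining.
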
> 3) Related to #2, is there a relationship between mem_page_size, the
> allocation of VMs to NUMA nodes, and the flavor size? IOW, if I use the
> medium flavor (4GB), will I need a larger mem_page_size? (I'll play with
> this variation, as soon as I can). Gets back to understanding how the
> scheduler decides where to place the VMs.
>
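I don't think you need a larger mem_page_size for a bigger flavor -
hw:mem_page_size is a flavor extra spec that picks the page size, and a
4GB flavor simply needs twice as many 2MB pages free on a single node. A
hypothetical sketch with python-novaclient (the auth arguments are
placeholders and depend on your client/keystone setup):

    # Set 2MB pages on a couple of flavors and show how many pages each
    # would need. Credentials/endpoint here are fake placeholders.
    from novaclient import client as nova_client

    nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                              'http://controller:5000/v2.0')
    for name in ('m1.small', 'm1.medium'):
        flavor = nova.flavors.find(name=name)
        flavor.set_keys({'hw:mem_page_size': '2048'})   # in KiB, i.e. 2MB
        # Pages needed scale with the flavor's RAM, not with the page size:
        print('%s: %d x 2MB pages' % (name, flavor.ram * 1024 // 2048))
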
> 4) When the VM create failed due to QEMU failing allocation, the VM went
> to error state. I deleted the VM, but the neutron port was still there,
> and there were no log messages indicating that a request was made to
> delete the port. Is this expected (that the user would have to manually
> clean up the port)?

When you hit this case, can you check whether instance.host is set in the
database before deleting the instance? I'm guessing the instance never got
assigned a host since it eventually ended up with NoValidHost, so when you
go to delete it there's no compute to send the delete to; it gets deleted
from the compute API, and we don't have the host binding details to delete
the port.
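
If it helps, a quick way to check is to query the nova DB for that
instance's host/node columns - a rough sketch with SQLAlchemy (the
connection URL and uuid are placeholders for your environment):

    # Check whether the failed instance ever got a host binding.
    from sqlalchemy import create_engine, text

    engine = create_engine('mysql+pymysql://nova:NOVA_DBPASS@controller/nova')
    with engine.connect() as conn:
        row = conn.execute(
            text('SELECT host, node, vm_state FROM instances'
                 ' WHERE uuid = :uuid'),
            {'uuid': 'INSTANCE_UUID'}).fetchone()
        print(row)   # host/node stay NULL if it never landed on a compute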

Although, when the spawn failed in the compute to begin with, we should 
have deallocated any networking that was created before kicking back to 
the scheduler - unless we don't go back to the scheduler if the instance 
is set to ERROR state.

A bug report with a stacktrace of the failure scenario when the instance 
goes to error state, plus the n-cpu logs, would probably help.

>
> 5) A coworker had hit the problem mentioned in #1, with exhaustion at
> the halfway point. If she deletes a VM and then changes the flavor's
> mem_page_size to 2048, should Nova start assigning all new VMs to the
> other NUMA node until its pool of huge pages drops to the level left on
> NUMA node 0, or will it alternate between the available NUMA nodes (and
> run out when node 0's pool is exhausted)?
>
> Thanks in advance!
>
> PCM
>
>
>
>
>     Chris
>


-- 

Thanks,

Matt Riedemann



