<div dir="ltr">I'll try to reproduce and collect logs for a bug report.<div><br></div><div>Thanks for the info.</div><div><br></div><div>PCM</div><div><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Jun 9, 2016 at 9:43 AM Matt Riedemann <<a href="mailto:mriedem@linux.vnet.ibm.com">mriedem@linux.vnet.ibm.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 6/9/2016 6:15 AM, Paul Michali wrote:<br>

><br>

><br>

> On Wed, Jun 8, 2016 at 11:21 PM Chris Friesen<br>

> <<a href="mailto:chris.friesen@windriver.com" target="_blank">chris.friesen@windriver.com</a>> wrote:<br>

><br>

>     On 06/03/2016 12:03 PM, Paul Michali wrote:<br>

>     > Thanks for the link Tim!<br>

>     ><br>

>     > Right now, I have two things I'm unsure about...<br>

>     ><br>

>     > One is that I had 1945 huge pages left (of size 2048k) and tried<br>

>     to create a VM<br>

>     > with a small flavor (2GB), which should need 1024 pages, but Nova<br>

>     indicated that<br>

>     > it wasn't able to find a host (and QEMU reported an allocation issue).<br>

>     ><br>

>     > The other is that VMs are not being evenly distributed on my two<br>

>     NUMA nodes, and<br>

>     > instead, are getting created all on one NUMA node. Not sure if<br>

>     that is expected<br>

>     > (and setting mem_page_size to 2048 is the proper way).<br>

><br>

><br>

>     Just in case you haven't figured out the problem...<br>

><br>

>     Have you checked the per-host-numa-node 2MB huge page availability<br>

>     on your host?<br>

>       If it's uneven then that might explain what you're seeing.<br>

><br>

><br>
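A minimal sketch of the per-node check suggested above, assuming a Linux host that exposes huge page pools under /sys/devices/system/node (the standard sysfs layout). The function name and its parameters are made up for illustration and are not part of Nova:<br>

```python
import glob
import os
import re


def free_hugepages_per_node(sysfs_root="/sys/devices/system/node", page_kb=2048):
    """Return {node_id: (free_pages, total_pages)} for one huge page size.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on a real host the default is the path to read.
    """
    result = {}
    for node_dir in sorted(glob.glob(os.path.join(sysfs_root, "node[0-9]*"))):
        match = re.fullmatch(r"node(\d+)", os.path.basename(node_dir))
        if not match:
            continue
        hp_dir = os.path.join(node_dir, "hugepages", f"hugepages-{page_kb}kB")
        try:
            with open(os.path.join(hp_dir, "free_hugepages")) as f:
                free = int(f.read().strip())
            with open(os.path.join(hp_dir, "nr_hugepages")) as f:
                total = int(f.read().strip())
        except OSError:
            continue  # this node has no pool of the requested page size
        result[int(match.group(1))] = (free, total)
    return result


if __name__ == "__main__":
    for node, (free, total) in sorted(free_hugepages_per_node().items()):
        print(f"node {node}: {free} free of {total} 2MB huge pages")
```

If one node shows far fewer free pages than the other, that would be consistent with the uneven placement described in this thread.<br>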

> These are the observations/questions I have:<br>

><br>

> 1) On the host, I was seeing 32768 huge pages, of 2MB size. When I<br>

> created VMs (Cirros) using small flavor, each VM was getting created on<br>

> NUMA nodeid 0. When it hit half of the available pages, I could no<br>

> longer create any VMs (QEMU saying no space). I'd like to understand why<br>

> the assignment was always going to nodeid 0, and to confirm that the<br>

> huge pages are divided among the number of NUMA nodes available.<br>

><br>

> 2) I changed mem_page_size from 1024 to 2048 in the flavor, and then<br>

> when VMs were created, they were being evenly assigned to the two NUMA<br>

> nodes. Each using 1024 huge pages. At this point I could create more<br>

> than half, but when there were 1945 pages left, it failed to create a<br>

> VM. Did it fail because the mem_page_size was 2048 and the available<br>

> pages were 1945, even though we were only requesting 1024 pages?<br>

><br>

> 3) Related to #2, is there a relationship between mem_page_size, the<br>

> allocation of VMs to NUMA nodes, and the flavor size? IOW, if I use the<br>

> medium flavor (4GB), will I need a larger mem_page_size? (I'll play with<br>

> this variation, as soon as I can). This gets back to understanding how the<br>

> scheduling determines how to assign the VMs.<br>

><br>

> 4) When the VM create failed due to QEMU failing allocation, the VM went<br>

> to error state. I deleted the VM, but the neutron port was still there,<br>

> and there were no log messages indicating that a request was made to<br>

> delete the port. Is this expected (that the user would have to manually<br>

> clean up the port)?<br>

<br>

When you hit this case, can you check if instance.host is set in the<br>

database before deleting the instance? I'm guessing what's happening is<br>

the instance didn't get assigned a host since it eventually ended up<br>

with NoValidHost, so when you go to delete it, there is no compute to<br>

send the delete to, so it is deleted in the compute API, and we don't<br>

have the host binding details to delete the port.<br>

<br>

Although, when the spawn failed in the compute to begin with we should<br>

have deallocated any networking that was created before kicking back to<br>

the scheduler - unless we don't go back to the scheduler if the instance<br>

is set to ERROR state.<br>

<br>

A bug report with stacktrace of the failure scenario when the instance<br>

goes to error state plus n-cpu logs would probably help.<br>

<br>

><br>

> 5) A coworker had hit the problem mentioned in #1, with exhaustion at<br>

> the halfway point. If she deletes a VM, and then changes the flavor to<br>

> change the mem_page_size to 2048, should Nova start assigning all new<br>

> VMs to the other NUMA node, until the pool of huge pages is down to<br>

> where the huge pages are for NUMA node 0, or will it alternate between<br>

> the available NUMA nodes (and run out when node 0's pool is exhausted)?<br>
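A rough sketch of the page arithmetic behind questions 2 and 5. It assumes the guest's memory has to come from a single host NUMA node, which is my understanding of what happens when a flavor sets hw:mem_page_size without hw:numa_nodes; treat that as an assumption, not as a description of Nova's actual scheduler code. The helper names and the per-node split are hypothetical, since the thread only gives the 1945 total:<br>

```python
def pages_needed(flavor_ram_mb, page_size_kb):
    """Huge pages required to back flavor_ram_mb of guest RAM (rounded up)."""
    ram_kb = flavor_ram_mb * 1024
    return -(-ram_kb // page_size_kb)  # ceiling division


def nodes_that_fit(free_pages_by_node, needed):
    """Node ids whose free pool alone can hold the whole guest."""
    return [node for node, free in sorted(free_pages_by_node.items())
            if free >= needed]


# A 2GB (2048 MB) flavor backed by 2048k pages.
needed = pages_needed(2048, 2048)

# Hypothetical split of the 1945 free pages across two nodes.
hypothetical_free = {0: 973, 1: 972}

print(needed, nodes_that_fit(hypothetical_free, needed))
```

Under that assumption, 1945 free pages in total is not enough by itself: at least one node has to have 1024 free pages on its own.<br>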

><br>

> Thanks in advance!<br>

><br>

> PCM<br>

><br>

><br>

><br>

><br>

>     Chris<br>

><br>

>     __________________________________________________________________________<br>

>     OpenStack Development Mailing List (not for usage questions)<br>

>     Unsubscribe:<br>

>     <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

>     <<a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a>><br>

>     <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

><br>

><br>

><br>


><br>

<br>

<br>

--<br>

<br>

Thanks,<br>

<br>

Matt Riedemann<br>

<br>

<br>


</blockquote></div></div></div>