<div dir="ltr">See PCM: Inline...<div><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Jun 9, 2016 at 11:42 AM Steve Gordon <<a href="mailto:sgordon@redhat.com">sgordon@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">----- Original Message -----<br>

> From: "Paul Michali" <<a href="mailto:pc@michali.net" target="_blank">pc@michali.net</a>><br>

> To: "OpenStack Development Mailing List (not for usage questions)" <<a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a>><br>

> Sent: Tuesday, June 7, 2016 11:00:30 AM<br>

> Subject: Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling<br>

><br>

> Anyone have any thoughts on the two questions below? Namely...<br>

><br>

> If the huge pages are 2M, we are creating a 2GB VM, have 1945 huge pages,<br>

> should the allocation fail (and if so why)?<br>

<br>

Were enough pages (1024) available in a single NUMA node? Which release are you using? There was a bug where node 0 would always be picked (and eventually exhausted) but that was - theoretically - fixed under <a href="https://bugs.launchpad.net/nova/+bug/1386236" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1386236</a></blockquote><div><br></div><div>PCM: This is on Liberty, so it sounds like the bugfix was in there. It's possible that there were not 1024 pages left on a single NUMA node.</div><div><br></div><div>Regards,</div><div><br></div><div>PCM</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>
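One way to check whether 1024 free pages exist on a single NUMA node is to read the per-node counters that the Linux kernel exposes in sysfs. The sketch below is only an illustration: the function name and the sysfs_root parameter are assumptions made for this example (the parameter exists so the function can be pointed at a test directory), and it is not part of Nova.<br>
<pre>
```python
import glob
import os
import re


def free_hugepages_per_node(page_size_kb=2048,
                            sysfs_root="/sys/devices/system/node"):
    """Return {numa_node_id: free huge pages of the given size}."""
    pattern = os.path.join(
        sysfs_root, "node[0-9]*", "hugepages",
        "hugepages-%dkB" % page_size_kb, "free_hugepages")
    free = {}
    for path in glob.glob(pattern):
        # Extract the node number from the part of the path below sysfs_root.
        node_id = int(re.search(r"node(\d+)", path[len(sysfs_root):]).group(1))
        with open(path) as f:
            free[node_id] = int(f.read().strip())
    return free


# On a host without NUMA huge page counters this prints an empty dict.
print(free_hugepages_per_node())
```
</pre>
If no single node reports at least 1024 free 2048K pages, a 2GB guest with one guest NUMA node cannot be backed by huge pages, even when the host-wide total is larger.<br>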

> Why do all the 2GB VMs get created on the same NUMA node, instead of<br>

> getting evenly assigned to each of the two NUMA nodes that are available on<br>

> the compute node (as a result, allocation fails, when 1/2 the huge pages<br>

> are used)? I found that increasing mem_page_size to 2048 resolves the<br>

> issue, but don't know why.<br>

<br>

What was the mem_page_size before it was 2048? I didn't think any smaller value was supported.<br>

<br>

> Another thing I was seeing, when the VM create failed due to not enough<br>

> huge pages available and was in error state, I could delete the VM, but the<br>

> Neutron port was still there.  Is that correct?<br>

><br>

> I didn't see any log messages in neutron, requesting to unbind and delete<br>

> the port.<br>

><br>

> Thanks!<br>

><br>

> PCM<br>

><br>

> .<br>

><br>

> On Fri, Jun 3, 2016 at 2:03 PM Paul Michali <<a href="mailto:pc@michali.net" target="_blank">pc@michali.net</a>> wrote:<br>

><br>

> > Thanks for the link Tim!<br>

> ><br>

> > Right now, I have two things I'm unsure about...<br>

> ><br>

> > One is that I had 1945 huge pages left (of size 2048k) and tried to create<br>

> > a VM with a small flavor (2GB), which should need 1024 pages, but Nova<br>

> > indicated that it wasn't able to find a host (and QEMU reported an<br>

> > allocation issue).<br>

> ><br>

> > The other is that VMs are not being evenly distributed on my two NUMA<br>

> > nodes, and instead, are getting created all on one NUMA node. Not sure if<br>

> > that is expected (and setting mem_page_size to 2048 is the proper way).<br>

> ><br>

> > Regards,<br>

> ><br>

> > PCM<br>

> ><br>

> ><br>

> > On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <<a href="mailto:Tim.Bell@cern.ch" target="_blank">Tim.Bell@cern.ch</a>> wrote:<br>

> ><br>

> >> The documentation at<br>

> >> <a href="http://docs.openstack.org/admin-guide/compute-flavors.html" rel="noreferrer" target="_blank">http://docs.openstack.org/admin-guide/compute-flavors.html</a> is gradually<br>

> >> improving. Are there areas which were not covered in your clarifications ?<br>

> >> If so, we should fix the documentation too since this is a complex area to<br>

> >> configure and good documentation is a great help.<br>

> >><br>

> >><br>

> >><br>

> >> BTW, there is also an issue around how the RAM for the BIOS is shadowed.<br>

> >> I can’t find the page from a quick google but we found an imbalance when<br>

> >> we<br>

> >> used 2GB pages as the RAM for BIOS shadowing was done by default in the<br>

> >> memory space for only one of the NUMA spaces.<br>

> >><br>

> >><br>

> >><br>

> >> Having a look at the KVM XML can also help a bit if you are debugging.<br>

> >><br>

> >><br>

> >><br>

> >> Tim<br>

> >><br>

> >><br>

> >><br>

> >> *From: *Paul Michali <<a href="mailto:pc@michali.net" target="_blank">pc@michali.net</a>><br>

> >> *Reply-To: *"OpenStack Development Mailing List (not for usage<br>

> >> questions)" <<a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a>><br>

> >> *Date: *Friday 3 June 2016 at 15:18<br>

> >> *To: *"Daniel P. Berrange" <<a href="mailto:berrange@redhat.com" target="_blank">berrange@redhat.com</a>>, "OpenStack Development<br>

> >> Mailing List (not for usage questions)" <<br>

> >> <a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a>><br>

> >> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling<br>

> >><br>

> >><br>

> >><br>

> >> See PCM inline...<br>

> >><br>

> >> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <<a href="mailto:berrange@redhat.com" target="_blank">berrange@redhat.com</a>><br>

> >> wrote:<br>

> >><br>

> >> On Fri, Jun 03, 2016 at 12:32:17PM +0000, Paul Michali wrote:<br>

> >> > Hi!<br>

> >> ><br>

> >> > I've been playing with Liberty code a bit and had some questions that<br>

> >> I'm<br>

> >> > hoping Nova folks may be able to provide guidance on...<br>

> >> ><br>

> >> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating<br>

> >> (Cirros)<br>

> >> > VMs with size 1024, will the scheduling use the minimum of the number of<br>

> >><br>

> >> 1024 what units ? 1024 MB, or 1024 huge pages aka 2048 MB ?<br>

> >><br>

> >><br>

> >><br>

> >> PCM: I was using small flavor, which is 2 GB. So that's 2048 MB and the<br>

> >> page size is 2048K, so 1024 pages? Hope I have the units right.<br>

> >><br>

> >><br>

> >><br>

> >><br>

> >><br>

> >><br>

> >> > huge pages available and the size requested for the VM, or will it base<br>

> >> > scheduling only on the number of huge pages?<br>

> >> ><br>

> >> > It seems to be doing the latter, where I had 1945 huge pages free, and<br>

> >> > tried to create another VM (1024) and Nova rejected the request with "no<br>

> >> > hosts available".<br>

> >><br>

> >> From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.<br>

> >><br>

> >> Anyway, when you request huge pages to be used for a flavour, the<br>

> >> entire guest RAM must be able to be allocated from huge pages.<br>

> >> ie if you have a guest with 2 GB of RAM, you must have 2 GB worth<br>

> >> of huge pages available. It is not possible for a VM to use<br>

> >> 1.5 GB of huge pages and 500 MB of normal sized pages.<br>
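The page arithmetic used in this thread can be written out as a small sketch. The function name is an assumption made for illustration and is not taken from Nova.<br>
<pre>
```python
def hugepages_needed(guest_ram_mb: int, page_size_kb: int) -> int:
    """Number of huge pages needed to back all guest RAM (rounded up)."""
    guest_ram_kb = guest_ram_mb * 1024
    # Ceiling division: a partially used page still consumes a whole page.
    return -(-guest_ram_kb // page_size_kb)


# 2 GB guest (2048 MB) backed by 2048K pages.
print(hugepages_needed(2048, 2048))  # prints 1024
```
</pre>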

> >><br>

> >><br>

> >><br>

> >> PCM: Right, so, with 2GB of RAM, I need 1024 huge pages of size 2048K. In<br>

> >> this case, there are 1945 huge pages available, so I was wondering why it<br>

> >> failed. Maybe I'm confusing sizes/pages?<br>

> >><br>

> >><br>

> >><br>

> >><br>

> >><br>

> >><br>

> >> > Is this still the same for Mitaka?<br>

> >><br>

> >> Yep, this use of huge pages has not changed.<br>

> >><br>

> >> > Where could I look in the code to see how the scheduling is determined?<br>

> >><br>

> >> Most logic related to huge pages is in nova/virt/hardware.py<br>

> >><br>

> >> > If I use mem_page_size=large (what I originally had), should it evenly<br>

> >> > assign huge pages from the available NUMA nodes (there are two in my<br>

> >> case)?<br>

> >> ><br>

> >> > It looks like it was assigning all VMs to the same NUMA node (0) in this<br>

> >> > case. Is the right way to change to 2048, like I did above?<br>

> >><br>

> >> Nova will always avoid spreading your VM across 2 host NUMA nodes,<br>

> >> since that gives bad performance characteristics. IOW, it will always<br>

> >> allocate huge pages from the NUMA node that the guest will run on. If<br>

> >> you explicitly want your VM to spread across 2 host NUMA nodes, then<br>

> >> you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova<br>

> >> will then place each guest NUMA node, on a separate host NUMA node<br>

> >> and allocate huge pages from node to match. This is done using<br>

> >> the hw:numa_nodes=2 parameter on the flavour<br>
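The placement rule described above can be modelled roughly as follows. This is a deliberately simplified sketch under stated assumptions, not the logic in nova/virt/hardware.py: it looks only at free huge pages and ignores CPUs and other constraints, and the function name and the per-node split of free pages are hypothetical.<br>
<pre>
```python
def can_place(free_per_host_node, pages_needed, guest_numa_nodes=1):
    """Simplified model: each guest NUMA node must fit on its own host node."""
    # Pages each guest NUMA node needs (ceiling division).
    per_guest_node = -(-pages_needed // guest_numa_nodes)
    # Host nodes with enough free pages to hold one guest NUMA node.
    candidates = [node for node, free in free_per_host_node.items()
                  if free >= per_guest_node]
    return len(candidates) >= guest_numa_nodes


# Hypothetical split of 1945 free pages across two host NUMA nodes.
free = {0: 1000, 1: 945}

print(can_place(free, 1024))                      # prints False
print(can_place(free, 1024, guest_numa_nodes=2))  # prints True
```
</pre>
With one guest NUMA node, all 1024 pages must come from one host node, so the request fails even though 1945 pages are free in total. With two guest NUMA nodes, each needs only 512 pages and both host nodes qualify.<br>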

> >><br>

> >><br>

> >><br>

> >> PCM: Gotcha, but that was not the issue I'm seeing. With this small<br>

> >> flavor (2GB = 1024 pages), I had 13107 huge pages initially. As I created<br>

> >> VMs, they were *all* placed on the same NUMA node (0). As a result, when I<br>

> >> got to more than half the available pages, Nova failed to allow further<br>

> >> VMs, even though I had 6963 available on one compute node, and 5939 on<br>

> >> another.<br>

> >><br>

> >><br>

> >><br>

> >> It seems that all the assignments were to node zero. Someone suggested to<br>

> >> me to set mem_page_size to 2048, and at that point it started assigning to<br>

> >> both NUMA nodes evenly.<br>

> >><br>

> >><br>

> >><br>

> >> Thanks for the help!!!<br>

> >><br>

> >><br>

> >><br>

> >><br>

> >><br>

> >> Regards,<br>

> >><br>

> >><br>

> >><br>

> >> PCM<br>

> >><br>

> >><br>

> >><br>

> >><br>

> >> > Again, has this changed at all in Mitaka?<br>

> >><br>

> >> Nope. Well aside from random bug fixes.<br>

> >><br>

> >> Regards,<br>

> >> Daniel<br>

> >> --<br>

> >> |: <a href="http://berrange.com" rel="noreferrer" target="_blank">http://berrange.com</a>      -o-<br>

> >> <a href="http://www.flickr.com/photos/dberrange/" rel="noreferrer" target="_blank">http://www.flickr.com/photos/dberrange/</a> :|<br>

> >> |: <a href="http://libvirt.org" rel="noreferrer" target="_blank">http://libvirt.org</a>              -o-<br>

> >> <a href="http://virt-manager.org" rel="noreferrer" target="_blank">http://virt-manager.org</a> :|<br>

> >> |: <a href="http://autobuild.org" rel="noreferrer" target="_blank">http://autobuild.org</a>       -o-<br>

> >> <a href="http://search.cpan.org/~danberr/" rel="noreferrer" target="_blank">http://search.cpan.org/~danberr/</a> :|<br>

> >> |: <a href="http://entangle-photo.org" rel="noreferrer" target="_blank">http://entangle-photo.org</a>       -o-<br>

> >> <a href="http://live.gnome.org/gtk-vnc" rel="noreferrer" target="_blank">http://live.gnome.org/gtk-vnc</a> :|<br>

> >><br>

> >> __________________________________________________________________________<br>

> >> OpenStack Development Mailing List (not for usage questions)<br>

> >> Unsubscribe:<br>

> >> <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

> >> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

> >><br>

> >><br>

> ><br>

><br>

><br>

<br>

--<br>

Steve Gordon,<br>

Principal Product Manager,<br>

Red Hat OpenStack Platform<br>

<br>

</blockquote></div></div></div>