[openstack-dev] [nova] NUMA, huge pages, and scheduling

Steve Gordon sgordon at redhat.com
Thu Jun 9 15:40:07 UTC 2016


----- Original Message -----
> From: "Paul Michali" <pc at michali.net>
> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> Sent: Tuesday, June 7, 2016 11:00:30 AM
> Subject: Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> 
> Anyone have any thoughts on the two questions below? Namely...
> 
> If the huge pages are 2 MB, we are creating a 2 GB VM, and 1945 huge pages
> are free, should the allocation fail (and if so, why)?

Were enough pages (1024) available in a single NUMA node? Which release are you using? There was a bug where node 0 would always be picked (and eventually exhausted) but that was - theoretically - fixed under https://bugs.launchpad.net/nova/+bug/1386236
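
If it helps to narrow that down, here is a minimal diagnostic sketch (just the standard Linux sysfs counters for 2 MB pages, nothing Nova-specific) to see how the free pages are split across the host NUMA nodes:

    import glob

    # Standard sysfs location of the per-NUMA-node 2 MB huge page counters.
    for node_dir in sorted(glob.glob("/sys/devices/system/node/node*")):
        with open(node_dir + "/hugepages/hugepages-2048kB/free_hugepages") as f:
            free = int(f.read())
        print("%s: %d free 2 MB pages (~%.1f GB)" % (node_dir, free, free * 2 / 1024.0))

A 2 GB guest needs all 1024 of its pages to come from a single node, so a healthy-looking host-wide total can still fail.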

> Why do all the 2 GB VMs get created on the same NUMA node, instead of being
> evenly distributed across the two NUMA nodes available on the compute node?
> (As a result, allocation fails once half of the huge pages are used.) I
> found that changing mem_page_size to 2048 resolves the issue, but I don't
> know why.

What was the mem_page_size before it was 2048? I didn't think any smaller value was supported.

> Another thing I was seeing: when the VM create failed due to not enough
> huge pages being available and the VM was in the error state, I could delete
> the VM, but the Neutron port was still there. Is that correct?
> 
> I didn't see any log messages in neutron, requesting to unbind and delete
> the port.
> 
> Thanks!
> 
> PCM
> 
> 
> On Fri, Jun 3, 2016 at 2:03 PM Paul Michali <pc at michali.net> wrote:
> 
> > Thanks for the link Tim!
> >
> > Right now, I have two things I'm unsure about...
> >
> > One is that I had 1945 huge pages left (of size 2048 KB) and tried to
> > create a VM with the small flavor (2 GB), which should need 1024 pages,
> > but Nova indicated that it wasn't able to find a host (and QEMU reported
> > an allocation issue).
> >
> > The other is that VMs are not being evenly distributed across my two NUMA
> > nodes; instead, they are all getting created on one NUMA node. Not sure if
> > that is expected (and whether setting mem_page_size to 2048 is the proper
> > fix).
> >
> > Regards,
> >
> > PCM
> >
> >
> > On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <Tim.Bell at cern.ch> wrote:
> >
> >> The documentation at
> >> http://docs.openstack.org/admin-guide/compute-flavors.html is gradually
> >> improving. Are there areas which were not covered in your clarifications?
> >> If so, we should fix the documentation too, since this is a complex area
> >> to configure and good documentation is a great help.
> >>
> >>
> >>
> >> BTW, there is also an issue around how the RAM for the BIOS is shadowed.
> >> I can't find the page from a quick Google search, but we found an
> >> imbalance when we used 2GB pages, as the RAM for BIOS shadowing was by
> >> default taken from the memory space of only one of the NUMA nodes.
> >>
> >>
> >>
> >> Having a look at the KVM XML can also help a bit if you are debugging.
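
Seconding the KVM XML suggestion - if it is easier to script than to eyeball, a rough sketch using the libvirt Python bindings (the instance name below is a placeholder) pulls out just the elements relevant to huge pages and NUMA placement:

    import libvirt
    import xml.etree.ElementTree as ET

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000001")   # placeholder libvirt domain name
    root = ET.fromstring(dom.XMLDesc(0))

    # <memoryBacking> shows the huge page backing; <numatune>/<cpu> show placement.
    for tag in ("memoryBacking", "numatune", "cpu"):
        elem = root.find(tag)
        if elem is not None:
            print(ET.tostring(elem, encoding="unicode"))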
> >>
> >>
> >>
> >> Tim
> >>
> >>
> >>
> >> *From: *Paul Michali <pc at michali.net>
> >> *Reply-To: *"OpenStack Development Mailing List (not for usage
> >> questions)" <openstack-dev at lists.openstack.org>
> >> *Date: *Friday 3 June 2016 at 15:18
> >> *To: *"Daniel P. Berrange" <berrange at redhat.com>, "OpenStack Development
> >> Mailing List (not for usage questions)" <
> >> openstack-dev at lists.openstack.org>
> >> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> >>
> >>
> >>
> >> See PCM inline...
> >>
> >> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <berrange at redhat.com>
> >> wrote:
> >>
> >> On Fri, Jun 03, 2016 at 12:32:17PM +0000, Paul Michali wrote:
> >> > Hi!
> >> >
> >> > I've been playing with Liberty code a bit and had some questions that
> >> > I'm hoping Nova folks may be able to provide guidance on...
> >> >
> >> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
> >> > (Cirros) VMs with size 1024, will the scheduling use the minimum of the
> >> > number of
> >>
> >> 1024 in what units? 1024 MB, or 1024 huge pages, i.e. 2048 MB?
> >>
> >>
> >>
> >> PCM: I was using the small flavor, which is 2 GB. So that's 2048 MB, and
> >> the page size is 2048 KB, so 1024 pages? Hope I have the units right.
> >>
> >>
> >>
> >>
> >>
> >>
> >> > huge pages available and the size requested for the VM, or will it base
> >> > scheduling only on the number of huge pages?
> >> >
> >> > It seems to be doing the latter, where I had 1945 huge pages free, and
> >> > tried to create another VM (1024) and Nova rejected the request with "no
> >> > hosts available".
> >>
> >> From this I'm guessing you mean 1024 huge pages, i.e. 2 GB, earlier.
> >>
> >> Anyway, when you request huge pages to be used for a flavour, the
> >> entire guest RAM must be able to be allocated from huge pages,
> >> i.e. if you have a guest with 2 GB of RAM, you must have 2 GB worth
> >> of huge pages available. It is not possible for a VM to use
> >> 1.5 GB of huge pages and 500 MB of normal-sized pages.
> >>
> >>
> >>
> >> PCM: Right, so with 2 GB of RAM, I need 1024 huge pages of size 2048 KB.
> >> In this case, there are 1945 huge pages available, so I was wondering why
> >> it failed. Maybe I'm confusing sizes/pages?
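
The per-VM arithmetic looks right; the likely catch (and why I asked about a single NUMA node above) is that the 1945 free pages are a host-wide total, while the whole 1024-page allocation has to come from one node. A toy illustration, with an invented per-node split:

    page_size_kb = 2048
    guest_ram_mb = 2048                                   # "small" flavor, 2 GB
    pages_needed = guest_ram_mb * 1024 // page_size_kb    # -> 1024

    # Hypothetical split of the 1945 free pages across the two host NUMA nodes.
    free_per_node = {0: 950, 1: 995}
    fits = any(free >= pages_needed for free in free_per_node.values())
    print(pages_needed, fits)                             # 1024 False -> "no hosts available"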
> >>
> >>
> >>
> >>
> >>
> >>
> >> > Is this still the same for Mitaka?
> >>
> >> Yep, this use of huge pages has not changed.
> >>
> >> > Where could I look in the code to see how the scheduling is determined?
> >>
> >> Most logic related to huge pages is in nova/virt/hardware.py
> >>
> >> > If I use mem_page_size=large (what I originally had), should it evenly
> >> > assign huge pages from the available NUMA nodes (there are two in my
> >> > case)?
> >> >
> >> > It looks like it was assigning all VMs to the same NUMA node (0) in this
> >> > case. Is changing it to 2048, like I did above, the right way to fix it?
> >>
> >> Nova will always avoid spreading your VM across 2 host NUMA nodes,
> >> since that gives bad performance characteristics. IOW, it will always
> >> allocate huge pages from the NUMA node that the guest will run on. If
> >> you explicitly want your VM to spread across 2 host NUMA nodes, then
> >> you must tell Nova to create 2 *guest* NUMA nodes for the VM. Nova
> >> will then place each guest NUMA node on a separate host NUMA node
> >> and allocate huge pages from each node to match. This is done using
> >> the hw:numa_nodes=2 parameter on the flavour.
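
For reference, the two flavor extra spec keys being discussed, shown here as a plain dict of example values (how you actually set them on the flavor is up to your tooling):

    # Example values only: 2 MB pages plus an explicit two-guest-NUMA-node
    # topology, which Nova then places on separate host NUMA nodes.
    extra_specs = {
        "hw:mem_page_size": "2048",
        "hw:numa_nodes": "2",
    }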
> >>
> >>
> >>
> >> PCM: Gotcha, but that was not the issue I'm seeing. With this small
> >> flavor (2 GB = 1024 pages), I had 13107 huge pages initially. As I created
> >> VMs, they were *all* placed on the same NUMA node (0). As a result, once I
> >> got to more than half of the available pages, Nova failed to allow further
> >> VMs, even though I had 6963 pages available on one compute node, and 5939
> >> on another.
> >>
> >>
> >>
> >> It seems that all the assignments were to node zero. Someone suggested
> >> setting mem_page_size to 2048, and at that point it started assigning to
> >> both NUMA nodes evenly.
> >>
> >>
> >>
> >> Thanks for the help!!!
> >>
> >>
> >>
> >>
> >>
> >> Regards,
> >>
> >>
> >>
> >> PCM
> >>
> >>
> >>
> >>
> >> > Again, has this changed at all in Mitaka?
> >>
> >> Nope. Well aside from random bug fixes.
> >>
> >> Regards,
> >> Daniel
> >> --
> >> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> >> |: http://libvirt.org -o- http://virt-manager.org :|
> >> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> >> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
> >>
> >
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

-- 
Steve Gordon,
Principal Product Manager,
Red Hat OpenStack Platform


