[openstack-dev] [nova] NUMA, huge pages, and scheduling
Paul Michali
pc at michali.net
Fri Jun 3 13:18:31 UTC 2016
See PCM inline...
On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <berrange at redhat.com>
wrote:
> On Fri, Jun 03, 2016 at 12:32:17PM +0000, Paul Michali wrote:
> > Hi!
> >
> > I've been playing with Liberty code a bit and had some questions that I'm
> > hoping Nova folks may be able to provide guidance on...
> >
> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
> (Cirros)
> > VMs with size 1024, will the scheduling use the minimum of the number of
>
> 1024 what units ? 1024 MB, or 1024 huge pages aka 2048 MB ?
>
PCM: I was using the small flavor, which is 2 GB. So that's 2048 MB, and with
a page size of 2048 KB that's 1024 pages? Hope I have the units right.
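Just to double-check my units, here is the back-of-the-envelope arithmetic
I'm assuming (nothing Nova-specific, just the conversion):

    # Units check (assumes 2048 KiB huge pages, i.e. hw:mem_page_size=2048).
    flavor_ram_mb = 2048                               # small flavor: 2 GB of RAM
    page_size_kb = 2048                                # huge page size in KiB
    pages_needed = (flavor_ram_mb * 1024) // page_size_kb
    print(pages_needed)                                # -> 1024 huge pages per VM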
> > huge pages available and the size requested for the VM, or will it base
> > scheduling only on the number of huge pages?
> >
> > It seems to be doing the latter, where I had 1945 huge pages free, and
> > tried to create another VM (1024) and Nova rejected the request with "no
> > hosts available".
>
> From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.
>
> Anyway, when you request huge pages to be used for a flavour, the
> entire guest RAM must be able to be allocated from huge pages.
> ie if you have a guest with 2 GB of RAM, you must have 2 GB worth
> of huge pages available. It is not possible for a VM to use
> 1.5 GB of huge pages and 500 MB of normal sized pages.
>
PCM: Right, so with 2 GB of RAM I need 1024 huge pages of size 2048 KB. In
this case there were 1945 huge pages available, so I was wondering why it
failed. Maybe I'm confusing sizes/pages?
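One thing I've started doing is checking how those free pages are split
across the host NUMA nodes, since all of a VM's pages have to come from a
single node. Here is a small sketch of how I check (it assumes 2048 KiB
pages and the standard Linux sysfs layout):

    # Print free/total 2048 KiB huge pages for each host NUMA node via sysfs.
    import glob
    import os

    for node_dir in sorted(glob.glob('/sys/devices/system/node/node[0-9]*')):
        hp_dir = os.path.join(node_dir, 'hugepages', 'hugepages-2048kB')
        with open(os.path.join(hp_dir, 'nr_hugepages')) as f:
            total = int(f.read())
        with open(os.path.join(hp_dir, 'free_hugepages')) as f:
            free = int(f.read())
        print('%s: %d free / %d total' % (os.path.basename(node_dir), free, total))

If the 1945 free pages are split across two nodes, neither node may have the
full 1024 pages needed for one VM, which could explain the failure.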
>
> > Is this still the same for Mitaka?
>
> Yep, this use of huge pages has not changed.
>
> > Where could I look in the code to see how the scheduling is determined?
>
> Most logic related to huge pages is in nova/virt/hardware.py
>
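PCM: Thanks for the pointer. For anyone following along, my simplified
reading of the behaviour is roughly the check below. This is only an
illustration of the idea, not Nova's actual code, and the names are made up:

    # Illustration only -- not Nova's implementation. The key point: a guest
    # NUMA cell fits on a host NUMA cell only if that single host cell can
    # supply ALL of the guest memory from free pages of the requested size.
    def cell_can_fit(guest_mem_mb, page_size_kb, host_cell_free_pages):
        pages_needed = (guest_mem_mb * 1024) // page_size_kb
        return host_cell_free_pages >= pages_needed

    # A 2 GB guest with 2048 KiB pages needs 1024 free pages on ONE host
    # NUMA node; 1945 free pages spread across two nodes may not be enough.
    print(cell_can_fit(2048, 2048, 973))   # False -- this node has only 973 free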
> > If I use mem_page_size=large (what I originally had), should it evenly
> > assign huge pages from the available NUMA nodes (there are two in my
> case)?
> >
> > It looks like it was assigning all VMs to the same NUMA node (0) in this
> > case. Is the right way to change to 2048, like I did above?
>
> Nova will always avoid spreading your VM across 2 host NUMA nodes,
> since that gives bad performance characteristics. IOW, it will always
> allocate huge pages from the NUMA node that the guest will run on. If
> you explicitly want your VM to spread across 2 host NUMA nodes, then
> you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova
> will then place each guest NUMA node on a separate host NUMA node
> and allocate huge pages from each host node to match. This is done using
> the hw:numa_nodes=2 parameter on the flavour
>
PCM: Gotcha, but that was not the issue I'm seeing. With this small flavor
(2 GB = 1024 pages), I had 13107 huge pages initially. As I created VMs,
they were *all* placed on the same NUMA node (0). As a result, once more than
half the available pages were used, Nova refused to create further VMs, even
though I had 6963 pages available on one compute node and 5939 on another.
It seems that all the assignments were to node zero. Someone suggested that I
set mem_page_size to 2048, and at that point it started assigning to both
NUMA nodes evenly.
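For the record, this is roughly how I'm setting the extra spec now. It's a
sketch using python-novaclient; the credentials, auth URL, and flavor name
below are placeholders:

    # Sketch: set the huge page extra spec on a flavor with python-novaclient.
    # Credentials, auth URL, and flavor name are placeholders.
    from novaclient import client

    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://controller:5000/v2.0')
    flavor = nova.flavors.find(name='m1.small')
    flavor.set_keys({'hw:mem_page_size': '2048'})   # 2048 KiB huge pages
    # Per Daniel's note, adding 'hw:numa_nodes': '2' is the way to spread a
    # single guest across two host NUMA nodes, if that is actually wanted.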
Thanks for the help!!!
Regards,
PCM
>
> > Again, has this changed at all in Mitaka?
>
> Nope. Well aside from random bug fixes.
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|