[nova] hw:numa_nodes question

Sean Mooney smooney at redhat.com
Thu May 11 19:40:08 UTC 2023


On Thu, 2023-05-11 at 08:40 -0500, hai wu wrote:
> Ok. Then I don't understand why 'hw:mem_page_size' is not made the
> default when hw:numa_nodes is set. There is a huge disadvantage
> to not having this one set (all existing VMs with hw:numa_nodes set
> will have to be taken down for resizing in order to get this one
> right).
there is an upgrade impact to changing the default.
it's not impossible to do, but it's complicated if we don't want to break existing deployments.
we would need to record a value for every current instance that was spawned before
this default was changed and had hw:numa_nodes without hw:mem_page_size, so that it keeps the old behavior,
and make sure that value is cleared when the vm is next moved so it can pick up the new default
after a live migration.
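
as a rough illustration of which instances fall into that group, here is a minimal python sketch. it assumes the flavor extra specs and image properties are already available as plain dicts; the function name and the sample data are hypothetical and this is not nova code.

```python
def needs_mem_page_size(extra_specs, image_props=None):
    """Return True when hw:numa_nodes is set but no page size is requested.

    extra_specs: flavor extra specs, e.g. {"hw:numa_nodes": "1"}
    image_props: image properties, e.g. {"hw_mem_page_size": "small"}
    Both arguments are plain dicts (an assumption made for this sketch).
    """
    image_props = image_props or {}
    has_numa = "hw:numa_nodes" in extra_specs
    # The page size can come from the flavor or from the image.
    has_page_size = (
        "hw:mem_page_size" in extra_specs
        or "hw_mem_page_size" in image_props
    )
    return has_numa and not has_page_size


# Hypothetical sample flavors, keyed by flavor name.
flavors = {
    "numa.plain": {"hw:numa_nodes": "1"},
    "numa.small": {"hw:numa_nodes": "1", "hw:mem_page_size": "small"},
    "generic": {},
}
affected = sorted(
    name for name, specs in flavors.items() if needs_mem_page_size(specs)
)
print(affected)
```

only the first sample flavor is reported, since it asks for a numa topology without saying anything about page size.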
> 
> I could not find this point mentioned in any existing OpenStack
> documentation: that we would have to set hw:mem_page_size explicitly
> if hw:numa_nodes is set. Also this slide at
> https://www.linux-kvm.org/images/0/0b/03x03-Openstackpdf.pdf kind of
> indicates that hw:mem_page_size `Default to small pages`.
it defaults to unset.
that results in small pages by default, but it's not the same as hw:mem_page_size=small
or hw:mem_page_size=any.
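
to make the difference concrete, here is a toy model of the two checks described in this thread: with hw:mem_page_size unset only the host's total free memory is considered, while with it set the free memory of the host numa node the vm is pinned to is considered. the function name and data layout are hypothetical; this is a sketch of the idea, not how nova's code is structured.

```python
def fits(host_nodes_free_mb, node, guest_mb, mem_page_size=None):
    """Toy placement check for a guest pinned to one host NUMA node.

    host_nodes_free_mb: free memory per host NUMA node, in MiB
    node: index of the host NUMA node the guest is pinned to
    guest_mb: guest memory request, in MiB
    mem_page_size: None models an unset hw:mem_page_size
    """
    if mem_page_size is None:
        # Unset: only the host-wide total is compared with the request.
        return guest_mb <= sum(host_nodes_free_mb)
    # Set (small, large or any): the pinned node itself must have room.
    return guest_mb <= host_nodes_free_mb[node]


# Hypothetical host: node 0 is nearly full, node 1 has plenty of memory.
host = [1024, 8192]
unset_result = fits(host, 0, 4096)
small_result = fits(host, 0, 4096, "small")
print(unset_result, small_result)
```

in this example the unset case accepts a 4096 MiB guest on node 0 even though that node only has 1024 MiB free, which is the situation that ends with the OOM killer; the "small" case rejects it.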


> 
> Another question: Let's say a VM runs on one host's numa node #0. If
> we live-migrate this VM to another host, and that host's numa node #1
> has more free memory, is it possible for this VM to land on the other
> host's numa node #1?
yes, it is.
on newer releases we prefer to balance the load across numa nodes;
on older releases nova would fill the first numa node and then move to the second.
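
the two behaviours can be sketched roughly as below. this is a simplified illustration that treats "load" as free memory per host numa node, which is an assumption made for the example; the function names are hypothetical and this is not nova's actual fitting code.

```python
def pick_node_pack(free_mb, guest_mb):
    """Older behaviour as described above: take the first node that fits."""
    for index, free in enumerate(free_mb):
        if free >= guest_mb:
            return index
    return None


def pick_node_spread(free_mb, guest_mb):
    """Newer behaviour as described above: prefer the least loaded node.

    "Least loaded" is modelled here as "most free memory" (an assumption).
    """
    candidates = [i for i, free in enumerate(free_mb) if free >= guest_mb]
    if not candidates:
        return None
    return max(candidates, key=lambda i: free_mb[i])


# Hypothetical destination host: free MiB on node 0 and node 1.
dest = [4096, 8192]
packed = pick_node_pack(dest, 2048)
spread = pick_node_spread(dest, 2048)
print(packed, spread)
```

with this sample host the packing strategy picks node 0 and the spreading strategy picks node 1, so a vm that ran on node 0 of the source host can end up on node 1 of the destination.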
> 
> On Thu, May 11, 2023 at 4:25 AM Sean Mooney <smooney at redhat.com> wrote:
> > 
> > On Wed, 2023-05-10 at 15:06 -0500, hai wu wrote:
> > > Is it possible to update something in the Openstack database for the
> > > relevant VMs in order to do the same, and then hard reboot the VM so
> > > that the VM would have this attribute?
> > not really. adding the missing hw:mem_page_size requirement to the flavor changes the
> > requirements for node placement and numa affinity,
> > so you really can only change this by resizing the vm to a new flavor.
> > > 
> > > On Wed, May 10, 2023 at 2:47 PM Sean Mooney <smooney at redhat.com> wrote:
> > > > 
> > > > On Wed, 2023-05-10 at 14:22 -0500, hai wu wrote:
> > > > > So there's no default value assumed/set for hw:mem_page_size for each
> > > > > flavor?
> > > > > 
> > > > correct, this is a known edge case in the current design.
> > > > hw:mem_page_size=any would be a reasonable default, but
> > > > technically, if you just set hw:numa_nodes=1, nova allows memory oversubscription.
> > > > 
> > > > in practice, if you try to do that you will almost always end up with vms
> > > > being killed due to OOM events.
> > > > 
> > > > so from an api point of view it would be a change of behavior for us to default
> > > > to hw:mem_page_size=any, but i think it would be the correct thing to do for operators
> > > > in the long run.
> > > > 
> > > > i could bring this up with the core team again, but in the past we
> > > > decided to be conservative and just warn people to always set
> > > > hw:mem_page_size if using numa affinity.
> > > > 
> > > > >  Yes https://bugs.launchpad.net/nova/+bug/1893121 is critical
> > > > > when using hw:numa_nodes=1.
> > > > > 
> > > > > I did not hit an issue with 'hw:mem_page_size' not set, maybe I am
> > > > > missing some known test cases? It would be very helpful to have a test
> > > > > case where I could reproduce this issue with 'hw:numa_nodes=1' being
> > > > > set, but without 'hw:mem_page_size' being set.
> > > > > 
> > > > > How to ensure this one for existing vms already running with
> > > > > 'hw:numa_nodes=1', but without 'hw:mem_page_size' being set?
> > > > you unfortunately need to resize the instance.
> > > > there are some image properties you can set on an instance via nova-manage,
> > > > but you cannot use nova-manage to update the embedded flavor and set this.
> > > > 
> > > > so you need to define a new flavor and resize.
> > > > 
> > > > this is the main reason we have not changed the default, as it may require you to
> > > > move instances around if their placement is invalid now that per-numa-node memory
> > > > allocations are correctly being accounted for.
> > > > 
> > > > if it was simple to change the default without any end-user or operator impact, we would.
> > > > 
> > > > 
> > > > 
> > > > > 
> > > > > On Wed, May 10, 2023 at 1:47 PM Sean Mooney <smooney at redhat.com> wrote:
> > > > > > 
> > > > > > if you set hw:numa_nodes there are two things you should keep in mind
> > > > > > 
> > > > > > > first, if hw:numa_nodes is set to any value, including hw:numa_nodes=1,
> > > > > > > then hw:mem_page_size should also be defined on the flavor.
> > > > > > > 
> > > > > > > if you don't set hw:mem_page_size then the vm will be pinned to a host numa node
> > > > > > > but the available memory on the host numa node will not be taken into account,
> > > > > > > 
> > > > > > > only the total free memory on the host, so this almost always results in VMs being killed by the OOM reaper
> > > > > > > in the kernel.
> > > > > > > 
> > > > > > > i recommend setting hw:mem_page_size=small, hw:mem_page_size=large or hw:mem_page_size=any.
> > > > > > > small will use your kernel's default page size for guest memory; typically this is 4k pages.
> > > > > > > large will use any page size other than the smallest that is available (i.e. this will use hugepages),
> > > > > > > and any will use small pages but allow the guest to request hugepages via the hw_mem_page_size image property.
> > > > > > > 
> > > > > > > hw:mem_page_size=any is the most flexible as a result, but generally i recommend using hw:mem_page_size=small
> > > > > > > and having a separate flavor for hugepages. it's really up to you.
> > > > > > 
> > > > > > 
> > > > > > > the second thing to keep in mind is that using explicit numa topologies, including hw:numa_nodes=1,
> > > > > > > disables memory oversubscription.
> > > > > > > 
> > > > > > > so you will not be able to oversubscribe the memory on the host.
> > > > > > > 
> > > > > > > in general it's better to avoid memory oversubscription anyway, but just keep that in mind.
> > > > > > > you can't just allocate a bunch of swap space and run vms at a 2:1 or higher memory oversubscription ratio
> > > > > > > if you are using numa affinity.
> > > > > > 
> > > > > > https://that.guru/blog/the-numa-scheduling-story-in-nova/
> > > > > > and
> > > > > > https://that.guru/blog/cpu-resources-redux/
> > > > > > 
> > > > > > are also good to read
> > > > > > 
> > > > > > > i do not think stephen has a dedicated blog on the memory aspect,
> > > > > > > but https://bugs.launchpad.net/nova/+bug/1893121 covers some of the problems that only setting
> > > > > > > hw:numa_nodes=1 will cause.
> > > > > > > 
> > > > > > > if you have vms with hw:numa_nodes=1 set and you do not have hw:mem_page_size set in the flavor or
> > > > > > > hw_mem_page_size set in the image, then that vm is not configured properly.
> > > > > > 
> > > > > > On Wed, 2023-05-10 at 11:52 -0600, Alvaro Soto wrote:
> > > > > > > Another good resource =)
> > > > > > > 
> > > > > > > https://that.guru/blog/cpu-resources/
> > > > > > > 
> > > > > > > On Wed, May 10, 2023 at 11:50 AM Alvaro Soto <alsotoes at gmail.com> wrote:
> > > > > > > 
> > > > > > > > I don't think so.
> > > > > > > > 
> > > > > > > > ~~~
> > > > > > > > The most common case will be that the admin only sets hw:numa_nodes and
> > > > > > > > then the flavor vCPUs and memory will be divided equally across the NUMA
> > > > > > > > nodes. When a NUMA policy is in effect, it is mandatory for the instance's
> > > > > > > > memory allocations to come from the NUMA nodes to which it is bound except
> > > > > > > > > where overridden by hw:numa_mem.NN.
> > > > > > > > ~~~
> > > > > > > > 
> > > > > > > > Here are the implementation documents since Juno release:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > https://opendev.org/openstack/nova-specs/src/branch/master/specs/juno/implemented/virt-driver-numa-placement.rst
> > > > > > > > 
> > > > > > > > https://opendev.org/openstack/nova-specs/commit/45252df4c54674d2ac71cd88154af476c4d510e1
> > > > > > > > ?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Wed, May 10, 2023 at 11:31 AM hai wu <haiwu.us at gmail.com> wrote:
> > > > > > > > 
> > > > > > > > > Is there any concern to enable 'hw:numa_nodes=1' on all flavors, as
> > > > > > > > > long as that flavor can fit into one numa node?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > --
> > > > > > > > 
> > > > > > > > Alvaro Soto
> > > > > > > > 
> > > > > > > > 




More information about the openstack-discuss mailing list