[nova] hw:numa_nodes question

hai wu haiwu.us at gmail.com
Thu May 11 13:40:22 UTC 2023


Ok. Then I don't understand why 'hw:mem_page_size' is not given a
default when hw:numa_nodes is set. Not having it set is a huge
disadvantage (all existing VMs with hw:numa_nodes set will have to be
taken down and resized in order to get this right).

I could not find this point mentioned in any existing OpenStack
documentation: that we have to set hw:mem_page_size explicitly
if hw:numa_nodes is set. Also, this slide at
https://www.linux-kvm.org/images/0/0b/03x03-Openstackpdf.pdf
suggests that hw:mem_page_size will `Default to small pages`.
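
A minimal sketch of the check discussed in this thread, assuming the flavor
extra specs and image properties are already available as plain Python dicts
(how they are fetched is left out). The function name
numa_without_page_size is hypothetical and not part of nova; the keys
hw:numa_nodes, hw:mem_page_size and hw_mem_page_size are the ones named in
this thread.

```python
def numa_without_page_size(flavor_extra_specs, image_properties=None):
    """Return True if a NUMA topology is requested but no page size is set.

    Hypothetical helper: both arguments are plain dicts, not nova objects.
    """
    image_properties = image_properties or {}
    has_numa = "hw:numa_nodes" in flavor_extra_specs
    has_page_size = (
        "hw:mem_page_size" in flavor_extra_specs
        or "hw_mem_page_size" in image_properties
    )
    return has_numa and not has_page_size


# Example: a flavor that only sets hw:numa_nodes=1 is flagged.
flagged = numa_without_page_size({"hw:numa_nodes": "1"})
```

Running this over every flavor in use would list the VMs that, per the
replies below, need a resize to a corrected flavor.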

Another question: Let's say a VM runs on NUMA node #0 of one host. If
we live-migrate this VM to another host, and that host's NUMA node #1
has more free memory, is it possible for this VM to land on the other
host's NUMA node #1?

On Thu, May 11, 2023 at 4:25 AM Sean Mooney <smooney at redhat.com> wrote:
>
> On Wed, 2023-05-10 at 15:06 -0500, hai wu wrote:
> > Is it possible to update something in the OpenStack database for the
> > relevant VMs in order to do the same, and then hard reboot the VM so
> > that the VM would have this attribute?
> not really. adding the missing hw:mem_page_size requirement to the flavor changes the
> requirements for node placement and numa affinity,
> so you really can only change this via resizing the vm to a new flavor.
> >
> > On Wed, May 10, 2023 at 2:47 PM Sean Mooney <smooney at redhat.com> wrote:
> > >
> > > On Wed, 2023-05-10 at 14:22 -0500, hai wu wrote:
> > > > So there's no default value assumed/set for hw:mem_page_size for each
> > > > flavor?
> > > >
> > > correct, this is a known edge case in the current design.
> > > hw:mem_page_size=any would be a reasonable default, but
> > > technically if you just set hw:numa_nodes=1 nova allows memory oversubscription.
> > >
> > > in practice, if you try to do that you will almost always end up with vms
> > > being killed due to OOM events.
> > >
> > > so from an api point of view it would be a change of behavior for us to default
> > > to hw:mem_page_size=any, but i think it would be the correct thing to do for operators
> > > in the long run.
> > >
> > > i could bring this up with the core team again, but in the past we
> > > decided to be conservative and just warn people to always set
> > > hw:mem_page_size if using numa affinity.
> > >
> > > >  Yes https://bugs.launchpad.net/nova/+bug/1893121 is critical
> > > > when using hw:numa_nodes=1.
> > > >
> > > > I did not hit an issue with 'hw:mem_page_size' not set, maybe I am
> > > > missing some known test cases? It would be very helpful to have a test
> > > > case where I could reproduce this issue with 'hw:numa_nodes=1' being
> > > > set, but without 'hw:mem_page_size' being set.
> > > >
> > > > How can I ensure this for existing VMs already running with
> > > > 'hw:numa_nodes=1', but without 'hw:mem_page_size' being set?
> > > you unfortunately need to resize the instance.
> > > there are some image properties you can set on an instance via nova-manage,
> > > but you cannot use nova-manage to update the embedded flavor and set this.
> > >
> > > so you need to define a new flavour and resize.
> > >
> > > this is the main reason we have not changed the default, as it may require you to
> > > move instances around if their placement becomes invalid once per numa node memory
> > > allocations are correctly accounted for.
> > >
> > > if it was simple to change the default without any end-user or operator impact we would.
> > >
> > >
> > >
> > > >
> > > > On Wed, May 10, 2023 at 1:47 PM Sean Mooney <smooney at redhat.com> wrote:
> > > > >
> > > > > if you set hw:numa_nodes there are two things you should keep in mind
> > > > >
> > > > > > first, if hw:numa_nodes is set to any value, including hw:numa_nodes=1,
> > > > > > then hw:mem_page_size should also be defined on the flavor.
> > > > >
> > > > > > if you don't set hw:mem_page_size then the vm will be pinned to a host numa node
> > > > > > but the available memory on that host numa node will not be taken into account,
> > > > > > only the total free memory on the host. this almost always results in VMs being
> > > > > > killed by the OOM reaper in the kernel.
> > > > >
> > > > > > i recommend setting hw:mem_page_size=small, hw:mem_page_size=large or hw:mem_page_size=any.
> > > > > > small will use your kernel's default page size for guest memory, typically this is 4k pages.
> > > > > > large will use any page size other than the smallest that is available (i.e. this will use hugepages),
> > > > > > and any will use small pages but allow the guest to request hugepages via the hw_mem_page_size image property.
> > > > >
> > > > > > hw:mem_page_size=any is the most flexible as a result, but generally i recommend using hw:mem_page_size=small
> > > > > > and having a separate flavor for hugepages. it's really up to you.
> > > > >
> > > > >
> > > > > > the second thing to keep in mind is that using explicit numa topologies, including hw:numa_nodes=1,
> > > > > > disables memory oversubscription.
> > > > >
> > > > > > so you will not be able to oversubscribe the memory on the host.
> > > > >
> > > > > > in general it's better to avoid memory oversubscription anyway, but just keep that in mind.
> > > > > > you can't just allocate a bunch of swap space and run vms at a 2:1 or higher memory oversubscription ratio
> > > > > > if you are using numa affinity.
> > > > >
> > > > > https://that.guru/blog/the-numa-scheduling-story-in-nova/
> > > > > and
> > > > > https://that.guru/blog/cpu-resources-redux/
> > > > >
> > > > > are also good to read
> > > > >
> > > > > > i do not think stephen has a dedicated blog post on the memory aspect,
> > > > > > but https://bugs.launchpad.net/nova/+bug/1893121 covers some of the problems that only setting
> > > > > > hw:numa_nodes=1 will cause.
> > > > >
> > > > > > if you have vms with hw:numa_nodes=1 set and you do not have hw:mem_page_size set in the flavor or
> > > > > > hw_mem_page_size set in the image then that vm is not configured properly.
> > > > >
> > > > > On Wed, 2023-05-10 at 11:52 -0600, Alvaro Soto wrote:
> > > > > > Another good resource =)
> > > > > >
> > > > > > https://that.guru/blog/cpu-resources/
> > > > > >
> > > > > > On Wed, May 10, 2023 at 11:50 AM Alvaro Soto <alsotoes at gmail.com> wrote:
> > > > > >
> > > > > > > I don't think so.
> > > > > > >
> > > > > > > ~~~
> > > > > > > The most common case will be that the admin only sets hw:numa_nodes and
> > > > > > > then the flavor vCPUs and memory will be divided equally across the NUMA
> > > > > > > nodes. When a NUMA policy is in effect, it is mandatory for the instance's
> > > > > > > memory allocations to come from the NUMA nodes to which it is bound except
> > > > > > > > where overridden by hw:numa_mem.NN.
> > > > > > > ~~~
> > > > > > >
> > > > > > > > Here are the implementation documents since the Juno release:
> > > > > > >
> > > > > > >
> > > > > > > https://opendev.org/openstack/nova-specs/src/branch/master/specs/juno/implemented/virt-driver-numa-placement.rst
> > > > > > >
> > > > > > > https://opendev.org/openstack/nova-specs/commit/45252df4c54674d2ac71cd88154af476c4d510e1
> > > > > > > ?
> > > > > > >
> > > > > > >
> > > > > > > On Wed, May 10, 2023 at 11:31 AM hai wu <haiwu.us at gmail.com> wrote:
> > > > > > >
> > > > > > > > > Is there any concern with enabling 'hw:numa_nodes=1' on all flavors, as
> > > > > > > > > long as that flavor can fit into one numa node?
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Alvaro Soto
> > > > > > >
> > > > > > > *Note: My work hours may not be your work hours. Please do not feel the
> > > > > > > need to respond during a time that is not convenient for you.*
> > > > > > > ----------------------------------------------------------
> > > > > > > Great people talk about ideas,
> > > > > > > ordinary people talk about things,
> > > > > > > small people talk... about other people.
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
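
To illustrate the accounting difference described in the quoted thread above
(host-wide free memory versus free memory on the NUMA node the VM is pinned
to), here is a toy model. It is not nova's scheduler code; the function names
and the example numbers are assumptions made up for this sketch.

```python
def admitted_by_host_total(free_mb_per_node, vm_mb):
    """Check only the total free memory on the host (toy model)."""
    return sum(free_mb_per_node) >= vm_mb


def nodes_with_room(free_mb_per_node, vm_mb):
    """Return the indexes of NUMA nodes that can hold the whole VM (toy model)."""
    return [n for n, free in enumerate(free_mb_per_node) if free >= vm_mb]


# Hypothetical host: node 0 has 2 GiB free, node 1 has 30 GiB free.
free_mb = [2048, 30720]
vm_mb = 8192  # hypothetical 8 GiB guest with a single NUMA node

host_ok = admitted_by_host_total(free_mb, vm_mb)  # host-wide check passes
fitting = nodes_with_room(free_mb, vm_mb)         # only node 1 has room
```

In this model the host-wide check passes even though node 0 cannot hold the
guest, which is the situation that ends in OOM kills if the VM is pinned to
node 0 anyway. A per-node check only offers node 1.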



More information about the openstack-discuss mailing list