[nova] hw:numa_nodes question

Maksim Malchuk maksim.malchuk at gmail.com
Mon May 15 20:32:59 UTC 2023


Good news, we have been waiting for this in Xena for almost a year.

On Mon, May 15, 2023 at 11:27 PM Sean Mooney <smooney at redhat.com> wrote:

> On Mon, 2023-05-15 at 23:07 +0300, Maksim Malchuk wrote:
> > This backport has been without review for 6 months since Sean Mooney gave
> > +2. The next rebase, needed to solve a merge conflict, cleared the +2 from
> > the review.
>
> yes, it was blocked on a question regarding whether this conforms to the
> stable backport policy.
>
> we do not backport features, and while this was considered a bugfix on
> master, it was also acknowledged that it is a little feature-ish.
>
> we discussed this last week in the nova team meeting and agreed it could
> proceed.
>
> but as i noted in my last reply, this will have no effect if you just have
> hw:numa_nodes=1
>
> without hw:cpu_policy=dedicated or hw:mem_page_size
>
>
> without enabling cpu pinning or an explicit page size we do not track per
> numa node cpu or memory usage in the host numa topology object for a given
> compute node. as such, without any usage information, there is nothing to
> weigh the numa nodes with.
>
> so packing_host_numa_cells_allocation_strategy=false will not make vms
> that request a numa topology without numa resources be balanced between
> the numa nodes.
>
> you still need to resize the instance to a flavor that actually properly
> requests memory or cpu pinning.
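>
> for example, something along these lines (an untested sketch; the flavor
> and server names are placeholders):
>
>   openstack flavor create --vcpus 4 --ram 8192 --disk 40 m1.numa
>   openstack flavor set --property hw:numa_nodes=1 --property hw:mem_page_size=small m1.numa
>   openstack server resize --flavor m1.numa my-server
>   openstack server resize confirm my-server   # or --confirm on older clients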
>
> >
> > On Mon, May 15, 2023 at 10:53 PM hai wu <haiwu.us at gmail.com> wrote:
> >
> > > This patch was backported:
> > > https://review.opendev.org/c/openstack/nova/+/805649. Once this is in
> > > place, new VMs always get assigned correctly to the numa node with
> > > more free memory. But when existing VMs (created with a flavor with
> > > hw:numa_nodes=1 set) already running on numa node #0 get live migrated,
> > > they always stay stuck on numa node #0 after live migration.
> > >
> > > So it seems we would also need to set hw:mem_page_size=small on the vm
> > > flavor, so that new VMs created from that flavor would be able to land
> > > on a numa node other than node #0 after live migration?
> > >
> > > On Mon, May 15, 2023 at 2:33 PM Sean Mooney <smooney at redhat.com>
> wrote:
> > > >
> > > > On Mon, 2023-05-15 at 13:03 -0500, hai wu wrote:
> > > > > > > Another question: Let's say a VM runs on one host's numa node
> > > > > > > #0. If we live-migrate this VM to another host, and that host's
> > > > > > > numa node #1 has more free memory, is it possible for this VM to
> > > > > > > land on the other host's numa node #1?
> > > > > > yes it is.
> > > > > > on newer releases we will prefer to balance the load across numa
> > > > > > nodes; on older releases nova would fill the first numa node and
> > > > > > then move to the second.
> > > > >
> > > > > About the above point, it seems even with the numa patch backported
> > > > > and in place, the VM would be stuck on its existing numa node. Per my
> > > > > tests, after its live migration, the VM will end up on the other
> > > > > host's numa node #0, even if numa node #1 has more free memory. This
> > > > > is not the case for newly built VMs.
> > > > >
> > > > > Is this a design issue?
> > > > if you are using a release that supports numa live migration (train+)
> > > >
> > > > https://specs.openstack.org/openstack/nova-specs/specs/train/implemented/numa-aware-live-migration.html
> > > >
> > > > then the numa affinity is recalculated on live migration, however numa
> > > > node 0 is preferred.
> > > >
> > > > as of xena, [compute]/packing_host_numa_cells_allocation_strategy has
> > > > been added to control how vms are balanced across numa nodes.
> > > > in zed the default was changed from packing vms per host numa node to
> > > > balancing vms between host numa nodes:
> > > >
> > > > https://docs.openstack.org/releasenotes/nova/zed.html#relnotes-26-0-0-stable-zed-upgrade-notes
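> > > >
> > > > for example, in nova.conf on the compute nodes (a sketch, assuming
> > > > xena or later; the value shown is the zed default, i.e. spread):
> > > >
> > > >   [compute]
> > > >   # True = fill one host numa node before using the next (pre-zed default)
> > > >   # False = balance vms across host numa nodes (default since zed)
> > > >   packing_host_numa_cells_allocation_strategy = False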
> > > >
> > > > even without the enhancements in xena and zed it was possible for the
> > > > scheduler to select a numa node.
> > > >
> > > > if you don't enable memory or cpu aware numa placement with
> > > > hw:mem_page_size or hw:cpu_policy=dedicated then it will always
> > > > select numa 0.
> > > >
> > > > if you do not request cpu pinning or a specific page size the
> > > > scheduler can't properly select the host numa node and will always use
> > > > numa node 0. that is one of the reasons i said that if hw:numa_nodes
> > > > is set then hw:mem_page_size should be set.
> > > >
> > > > from a nova point of view, using numa_nodes without mem_page_size is
> > > > logically incorrect, as you asked for a vm to be affinitized to n host
> > > > numa nodes but did not enable numa aware memory scheduling.
> > > >
> > > > we unfortunately can't prevent this in the nova api without breaking
> > > > upgrades for everyone who has made this mistake.
> > > > we would need to force them to resize all affected instances, which
> > > > means guest downtime.
> > > > the other issue is that multiple numa nodes are supported by Hyper-V,
> > > > but it does not support mem_page_size.
> > > >
> > > > we have tried to document this in the past but never agreed on how,
> > > > because it is subtle and requires a lot of context.
> > > > the tl;dr is: if the instance has a numa topology it should have
> > > > mem_page_size set in the image or flavor, but we never found a good
> > > > place to capture that.
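> > > >
> > > > for example, to set it via the image instead of the flavor (a sketch;
> > > > the image name is a placeholder):
> > > >
> > > >   openstack image set --property hw_mem_page_size=small my-image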
> > > >
> > > > >
> > > > > On Thu, May 11, 2023 at 2:42 PM Sean Mooney <smooney at redhat.com>
> > > wrote:
> > > > > >
> > > > > > On Thu, 2023-05-11 at 08:40 -0500, hai wu wrote:
> > > > > > > Ok. Then I don't understand why 'hw:mem_page_size' is not made
> > > > > > > the default in case 'hw:numa_nodes' is set. There is a huge
> > > > > > > disadvantage in not having this one set (all existing VMs with
> > > > > > > hw:numa_nodes set will have to be taken down for resizing in
> > > > > > > order to get this one right).
> > > > > > there is an upgrade impact to changing the default.
> > > > > > it's not impossible to do, but it's complicated if we don't want to
> > > > > > break existing deployments.
> > > > > > we would need to record a value for every current instance that was
> > > > > > spawned before this default was changed and had hw:numa_nodes
> > > > > > without hw:mem_page_size, so they kept the old behavior, and make
> > > > > > sure that is cleared when the vm is next moved so it can have the
> > > > > > new default after a live migration.
> > > > > > >
> > > > > > > I could not find this point mentioned in any existing OpenStack
> > > > > > > documentation: that we would have to set hw:mem_page_size
> > > > > > > explicitly if hw:numa_nodes is set. Also this slide at
> > > > > > > https://www.linux-kvm.org/images/0/0b/03x03-Openstackpdf.pdf kind
> > > > > > > of indicates that hw:mem_page_size `Default to small pages`.
> > > > > > it defaults to unset.
> > > > > > that results in small pages by default, but it's not the same as
> > > > > > hw:mem_page_size=small or hw:mem_page_size=any.
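> > > > > >
> > > > > > for example, to check what a flavor actually has set (a sketch;
> > > > > > the flavor name is a placeholder):
> > > > > >
> > > > > >   openstack flavor show m1.numa -c properties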
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Another question: Let's say a VM runs on one host's numa node
> > > > > > > #0. If we live-migrate this VM to another host, and that host's
> > > > > > > numa node #1 has more free memory, is it possible for this VM to
> > > > > > > land on the other host's numa node #1?
> > > > > > yes it is.
> > > > > > on newer releases we will prefer to balance the load across numa
> > > > > > nodes; on older releases nova would fill the first numa node and
> > > > > > then move to the second.
> > > > > > >
> > > > > > > On Thu, May 11, 2023 at 4:25 AM Sean Mooney <
> smooney at redhat.com>
> > > wrote:
> > > > > > > >
> > > > > > > > On Wed, 2023-05-10 at 15:06 -0500, hai wu wrote:
> > > > > > > > > Is it possible to update something in the OpenStack database
> > > > > > > > > for the relevant VMs in order to do the same, and then hard
> > > > > > > > > reboot the VM so that the VM would have this attribute?
> > > > > > > > not really. adding the missing hw:mem_page_size requirement to
> > > > > > > > the flavor changes the requirements for node placement and numa
> > > > > > > > affinity, so you really can only change this via resizing the
> > > > > > > > vm to a new flavor.
> > > > > > > > >
> > > > > > > > > On Wed, May 10, 2023 at 2:47 PM Sean Mooney <
> > > smooney at redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, 2023-05-10 at 14:22 -0500, hai wu wrote:
> > > > > > > > > > > So there's no default value assumed/set for
> > > > > > > > > > > hw:mem_page_size for each flavor?
> > > > > > > > > > >
> > > > > > > > > > correct, this is a known edge case in the current design.
> > > > > > > > > > hw:mem_page_size=any would be a reasonable default, but
> > > > > > > > > > technically if you just set hw:numa_nodes=1 nova allows
> > > > > > > > > > memory oversubscription.
> > > > > > > > > >
> > > > > > > > > > in practice if you try to do that you will almost always
> > > > > > > > > > end up with vms being killed due to OOM events.
> > > > > > > > > >
> > > > > > > > > > so from an api point of view it would be a change of
> > > > > > > > > > behavior for us to default to hw:mem_page_size=any, but i
> > > > > > > > > > think it would be the correct thing to do for operators in
> > > > > > > > > > the long run.
> > > > > > > > > >
> > > > > > > > > > i could bring this up with the core team again, but in the
> > > > > > > > > > past we decided to be conservative and just warn people to
> > > > > > > > > > always set hw:mem_page_size if using numa affinity.
> > > > > > > > > >
> > > > > > > > > > > Yes, https://bugs.launchpad.net/nova/+bug/1893121 is
> > > > > > > > > > > critical when using hw:numa_nodes=1.
> > > > > > > > > > >
> > > > > > > > > > > I did not hit an issue with 'hw:mem_page_size' not set,
> > > > > > > > > > > maybe I am missing some known test cases? It would be
> > > > > > > > > > > very helpful to have a test case where I could reproduce
> > > > > > > > > > > this issue with 'hw:numa_nodes=1' being set, but without
> > > > > > > > > > > 'hw:mem_page_size' being set.
> > > > > > > > > > >
> > > > > > > > > > > How can we ensure this for existing vms already running
> > > > > > > > > > > with 'hw:numa_nodes=1', but without 'hw:mem_page_size'
> > > > > > > > > > > being set?
> > > > > > > > > > you unfortunately need to resize the instance.
> > > > > > > > > > there are some image properties you can set on an instance
> > > > > > > > > > via nova-manage, but you cannot use nova-manage to update
> > > > > > > > > > the embedded flavor and set this.
> > > > > > > > > >
> > > > > > > > > > so you need to define a new flavor and resize.
> > > > > > > > > >
> > > > > > > > > > this is the main reason we have not changed the default, as
> > > > > > > > > > it may require you to move instances around if their
> > > > > > > > > > placement is now invalid now that per numa node memory
> > > > > > > > > > allocations are correctly being accounted for.
> > > > > > > > > >
> > > > > > > > > > if it was simple to change the default without any end-user
> > > > > > > > > > or operator impact, we would.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, May 10, 2023 at 1:47 PM Sean Mooney <
> > > smooney at redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > if you set hw:numa_nodes there are two things you
> > > > > > > > > > > > should keep in mind.
> > > > > > > > > > > >
> > > > > > > > > > > > first, if hw:numa_nodes is set to any value, including
> > > > > > > > > > > > hw:numa_nodes=1, then hw:mem_page_size should also be
> > > > > > > > > > > > defined on the flavor.
> > > > > > > > > > > >
> > > > > > > > > > > > if you don't set hw:mem_page_size then the vm will be
> > > > > > > > > > > > pinned to a host numa node, but the available memory on
> > > > > > > > > > > > that host numa node will not be taken into account,
> > > > > > > > > > > >
> > > > > > > > > > > > only the total free memory on the host, so this almost
> > > > > > > > > > > > always results in VMs being killed by the OOM reaper
> > > > > > > > > > > > in the kernel.
> > > > > > > > > > > >
> > > > > > > > > > > > i recommend setting hw:mem_page_size=small,
> > > > > > > > > > > > hw:mem_page_size=large or hw:mem_page_size=any.
> > > > > > > > > > > > small will use your kernel's default page size for
> > > > > > > > > > > > guest memory, typically 4k pages;
> > > > > > > > > > > > large will use any page size other than the smallest
> > > > > > > > > > > > that is available (i.e. this will use hugepages);
> > > > > > > > > > > > and any will use small pages but allow the guest to
> > > > > > > > > > > > request hugepages via the hw_mem_page_size image
> > > > > > > > > > > > property.
> > > > > > > > > > > >
> > > > > > > > > > > > hw:mem_page_size=any is the most flexible as a result,
> > > > > > > > > > > > but generally i recommend using hw:mem_page_size=small
> > > > > > > > > > > > and having a separate flavor for hugepages. it's really
> > > > > > > > > > > > up to you.
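> > > > > > > > > > > >
> > > > > > > > > > > > for example, one possible split (a sketch; the flavor
> > > > > > > > > > > > names are placeholders, and the hugepage flavor assumes
> > > > > > > > > > > > hugepages are configured on the hosts):
> > > > > > > > > > > >
> > > > > > > > > > > >   # general-purpose numa-aware flavor on 4k pages
> > > > > > > > > > > >   openstack flavor set --property hw:numa_nodes=1 m1.numa
> > > > > > > > > > > >   openstack flavor set --property hw:mem_page_size=small m1.numa
> > > > > > > > > > > >   # separate flavor backed by hugepages
> > > > > > > > > > > >   openstack flavor set --property hw:numa_nodes=1 m1.numa.hugepages
> > > > > > > > > > > >   openstack flavor set --property hw:mem_page_size=large m1.numa.hugepages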
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > the second thing to keep in mind is that using explicit
> > > > > > > > > > > > numa topologies, including hw:numa_nodes=1, disables
> > > > > > > > > > > > memory oversubscription.
> > > > > > > > > > > >
> > > > > > > > > > > > so you will not be able to oversubscribe the memory on
> > > > > > > > > > > > the host.
> > > > > > > > > > > >
> > > > > > > > > > > > in general it's better to avoid memory oversubscription
> > > > > > > > > > > > anyway, but just keep that in mind.
> > > > > > > > > > > > you can't just allocate a bunch of swap space and run
> > > > > > > > > > > > vms at a 2:1 or higher memory oversubscription ratio
> > > > > > > > > > > > if you are using numa affinity.
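> > > > > > > > > > > >
> > > > > > > > > > > > to illustrate (the ratio value here is just an
> > > > > > > > > > > > example): even with something like the following in
> > > > > > > > > > > > nova.conf, guests with a numa topology are claimed
> > > > > > > > > > > > against the real per-numa-node memory, not the
> > > > > > > > > > > > oversubscribed total:
> > > > > > > > > > > >
> > > > > > > > > > > >   [DEFAULT]
> > > > > > > > > > > >   ram_allocation_ratio = 2.0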
> > > > > > > > > > > >
> > > > > > > > > > > > https://that.guru/blog/the-numa-scheduling-story-in-nova/
> > > > > > > > > > > > and
> > > > > > > > > > > > https://that.guru/blog/cpu-resources-redux/
> > > > > > > > > > > >
> > > > > > > > > > > > are also good to read
> > > > > > > > > > > >
> > > > > > > > > > > > i do not think stephen has a dedicated blog post on the
> > > > > > > > > > > > memory aspect, but
> > > > > > > > > > > > https://bugs.launchpad.net/nova/+bug/1893121 covers
> > > > > > > > > > > > some of the problems that only setting hw:numa_nodes=1
> > > > > > > > > > > > will cause.
> > > > > > > > > > > >
> > > > > > > > > > > > if you have vms with hw:numa_nodes=1 set and you do not
> > > > > > > > > > > > have hw:mem_page_size set in the flavor or
> > > > > > > > > > > > hw_mem_page_size set in the image, then that vm is not
> > > > > > > > > > > > configured properly.
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 2023-05-10 at 11:52 -0600, Alvaro Soto wrote:
> > > > > > > > > > > > > Another good resource =)
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://that.guru/blog/cpu-resources/
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, May 10, 2023 at 11:50 AM Alvaro Soto <
> > > alsotoes at gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't think so.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ~~~
> > > > > > > > > > > > > > The most common case will be that the admin only
> > > > > > > > > > > > > > sets hw:numa_nodes and then the flavor vCPUs and
> > > > > > > > > > > > > > memory will be divided equally across the NUMA
> > > > > > > > > > > > > > nodes. When a NUMA policy is in effect, it is
> > > > > > > > > > > > > > mandatory for the instance's memory allocations to
> > > > > > > > > > > > > > come from the NUMA nodes to which it is bound,
> > > > > > > > > > > > > > except where overridden by hw:numa_mem.NN.
> > > > > > > > > > > > > > ~~~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Here are the implementation documents since the
> > > > > > > > > > > > > > Juno release:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://opendev.org/openstack/nova-specs/src/branch/master/specs/juno/implemented/virt-driver-numa-placement.rst
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://opendev.org/openstack/nova-specs/commit/45252df4c54674d2ac71cd88154af476c4d510e1
> > > > > > > > > > > > > > ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, May 10, 2023 at 11:31 AM hai wu <
> > > haiwu.us at gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is there any concern with enabling
> > > > > > > > > > > > > > > 'hw:numa_nodes=1' on all flavors, as long as the
> > > > > > > > > > > > > > > flavor can fit into one numa node?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Alvaro Soto
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > *Note: My work hours may not be your work hours.
> > > > > > > > > > > > > > Please do not feel the need to respond during a
> > > > > > > > > > > > > > time that is not convenient for you.*
> > > > > > > > > > > > > > ----------------------------------------------------------
> > > ----------------------------------------------------------
> > > > > > > > > > > > > > Great people talk about ideas,
> > > > > > > > > > > > > > ordinary people talk about things,
> > > > > > > > > > > > > > small people talk... about other people.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>

-- 
Regards,
Maksim Malchuk

