hw:mem_page_size Works on Flavors, hw_mem_page_size Doesn't Work on Image

Sean Mooney smooney at redhat.com
Wed Feb 15 12:28:32 UTC 2023


On Wed, 2023-02-15 at 13:02 +0700, Lazuardi Nasution wrote:
> Hi Sean,
> 
> I can confirm that using only hw:mem_page_size=large on the used flavor is
> working with the 2MB hugepage.
> 
> Anyway, I'm still curious about how Nova selects the hugepage size when there
> are both 2MB and 1GB hugepages on the hypervisor and I only put
> hw:mem_page_size=large on the used flavor. Maybe you can explain more about
> this.
there are 2 aspects to this.

first, nova only selects from hugepage mounts that are available to libvirt.
by default libvirt will only use /dev/hugepages, which is mounted with the
default hugepage size you set on the kernel commandline, or 2MB on x86_64 hosts if you don't
specify one.

so by default, unless you configure hugetlbfs_mount in /etc/libvirt/qemu.conf, you will only
have access to the default page size.

hugetlbfs_mount = ["/dev/hugepages2M", "/dev/hugepages1G"]
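
the mounts themselves are created on the host; for example (the mount points here are
just illustrative, they only need to match what you list in hugetlbfs_mount):

    mkdir -p /dev/hugepages2M /dev/hugepages1G
    mount -t hugetlbfs -o pagesize=2M hugetlbfs /dev/hugepages2M
    mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

note the pages themselves still have to be allocated, e.g. via hugepagesz=/hugepages=
on the kernel commandline.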

this was the original way to restrict which hugepages were used by nova.
later https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_huge_pages
was added to also allow you to reserve huge pages per numa node that nova will not allocate to guests.
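
for example, in the compute node's nova.conf (the counts here are made up):

    [DEFAULT]
    reserved_huge_pages = node:0,size:2048,count:64
    reserved_huge_pages = node:1,size:1GB,count:1

this reserves 64 2MB pages on numa node 0 and one 1GB page on numa node 1 so nova
will not hand them out to guests.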

assuming you have configured the host to mount the two sizes of hugepages and added them to hugetlbfs_mount,
nova will use the info from libvirt to track the available hugepages per numa node.

when we schedule/spawn a vm on a host we calculate an instance numa topology by looking at a number of factors.
with regards to vms with an explicit page size, i.e. hw:mem_page_size defined, the way that works is as follows.

first we determine if the vm should have an implicit numa topology of 1 numa node or if you have explicitly requested
multiple numa nodes using hw:numa_nodes or hw_numa_nodes. the process is more or less the same in either case, so assuming
you have 1 virtual numa node for this explanation, nova will loop over the host numa nodes and try to fulfill the guest
memory request from a single page size on a single host numa node.
to do this we sort the list of mempage pools by page size, largest first.
if you set hw:mem_page_size=small then we will only look at the smallest page size pool in each numa node and ignore the rest.
hw:mem_page_size=large is basically the opposite: we check each page size pool except the smallest.
the first page size pool that has enough memory free to fulfill the request is then used for that vm, and we record
both the numa node and page size in the instance_numa_topology json blob in the db.
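
in rough python, the per-host-numa-node selection described above looks something
like this. this is a simplified sketch of the logic, not the actual nova code, and
the function/variable names are made up for illustration:

    def pick_pagesize(pools, requested, mem_kb):
        # pools: {page_size_kb: free_page_count} for one host numa node
        sizes = sorted(pools, reverse=True)  # largest page size first
        if requested == 'small':
            candidates = sizes[-1:]   # only the smallest page size pool
        elif requested == 'large':
            candidates = sizes[:-1]   # every pool except the smallest
        else:
            candidates = [requested]  # an explicit size in KiB, e.g. 2048
        for size_kb in candidates:
            # the guest memory must be a whole number of pages of this
            # size and the pool must have enough free pages to hold it
            if mem_kb % size_kb == 0 and pools.get(size_kb, 0) >= mem_kb // size_kb:
                return size_kb
        return None  # this host numa node cannot satisfy the request

    # e.g. a 4096MB guest with hw:mem_page_size=large on a node with
    # 2 x 1G and 4096 x 2M pages free: the 1G pool is too small, so
    # the 2M pool is selected
    pick_pagesize({1048576: 2, 2048: 4096, 4: 1000000}, 'large', 4096 * 1024)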

the entrypoint for this logic is the numa_fit_instance_to_host function.
https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L2295

here is where we loop over the host numa nodes
https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L2417
this is where we try to fit an instance numa cell to a host numa cell via _numa_fit_instance_cell
https://github.com/openstack/nova/blob/1330b280778fa834cffbc4ec9a8aa6b04cb2471e/nova/virt/hardware.py#L909
and _numa_cell_supports_pagesize_request https://github.com/openstack/nova/blob/1330b280778fa834cffbc4ec9a8aa6b04cb2471e/nova/virt/hardware.py#L593
is what actually implements the filtering logic.

so to answer your question: hw:mem_page_size=large means we just slice off the smallest page size.

    elif inst_cell.pagesize == MEMPAGES_LARGE:
        return verify_pagesizes(host_cell, inst_cell, avail_pagesize[:-1])

so that we only check the "large" (hugepage) page sizes
https://github.com/openstack/nova/blob/1330b280778fa834cffbc4ec9a8aa6b04cb2471e/nova/objects/numa.py#L144-L161
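
for example, with 4k, 2M and 1G pools on a host numa node (sizes in KiB, sorted largest first):

    avail_pagesize = [1048576, 2048, 4]
    avail_pagesize[:-1]   # large -> [1048576, 2048], the smallest pool is dropped
    avail_pagesize[-1:]   # small -> [4], only the smallest pool
    avail_pagesize        # any   -> all three pools are candidates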

hopefully that helps.

as i said, best practice is to never use explicit page sizes unless you have benchmarked the workload and determined that
using 2MB or 1G pages actually provides a performance improvement.

for most workloads the delta is minimal and using an explicit page size is not beneficial.
there are some workloads that work better on 2MB hugepages, by the way, so while 1G generally works well
for vms, 2MB generally performs about the same. ovs-dpdk used to prefer 1G pages but i think that delta is mostly removed.
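
to make the flavor/image pattern from the earlier mails below concrete, the
properties are set like this (the flavor and image names are just placeholders):

    openstack flavor set --property hw:mem_page_size=large generic.flavor
    openstack image set --property hw_mem_page_size=2MB vnf-2mb-image
    openstack image set --property hw_mem_page_size=1GB vnf-1gb-image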

regards
sean
> 
> Best regards.
> 
> On Mon, Feb 13, 2023, 9:27 PM Sean Mooney <smooney at redhat.com> wrote:
> 
> > On Mon, 2023-02-13 at 21:21 +0700, Lazuardi Nasution wrote:
> > > Hi Sean,
> > > 
> > > Thank you for your explanation. I want to use the 2MB hugepage. So, do you
> > > mean that I have to put hw:mem_page_size=large on the used flavor and
> > > hw_mem_page_size=2MB on the used image? What is the difference from the
> > > current hw:mem_page_size=2MB on the used flavor only?
> > 
> > in general we recommend against using explicit page sizes unless required
> > after extensive performance testing.
> > 
> > so generally i would just suggest using hw:mem_page_size=large in the
> > flavor.
> > 
> > if you need to use an explicit page size for a specific VNF we advise that you
> > set hw_mem_page_size=2MB
> > on the image.
> > 
> > That way if you have 1 flavor with hw:mem_page_size=large and 3 images, one
> > with hw_mem_page_size=2MB,
> > another with hw_mem_page_size=1G, and a third with hw_mem_page_size not set,
> > all 3 images can use the same
> > flavor.
> > 
> > you should prefer to keep the flavors generic by using
> > hw:mem_page_size=large|small|any and only set explicit page
> > sizes in the images that need that granularity.
> > 
> > you can set explicit page sizes in the flavor too but it leads to a
> > combinatorial flavor explosion and is generally not
> > required. in 90% of cases hw:mem_page_size=large in the flavor is all that
> > is required.
> > > 
> > > Best regards.
> > > 
> > > On Mon, Feb 13, 2023 at 8:45 PM Sean Mooney <smooney at redhat.com> wrote:
> > > 
> > > > On Mon, 2023-02-13 at 18:22 +0700, Lazuardi Nasution wrote:
> > > > > Hi,
> > > > > 
> > > > > There is a weird situation if it is not a bug.
> > > > > 
> > > > It's not a bug.
> > > > 
> > > > hw_mem_page_size can be used in the image only if
> > > > hw:mem_page_size=large or hw:mem_page_size=any.
> > > > 
> > > > in the case of hw:mem_page_size=large the image can be used to choose a
> > > > specific hugepage size:
> > > > 
> > > > hw:mem_page_size=large  hw_mem_page_size=1G
> > > > 
> > > > hw:mem_page_size=any will by default be the same as hw:mem_page_size=small
> > > > and use the smallest page size,
> > > > which is typically 4k. the difference between hw:mem_page_size=any and
> > > > hw:mem_page_size=small is that
> > > > hw:mem_page_size=any allows the image to set any value for
> > > > hw_mem_page_size.
> > > > so
> > > > hw:mem_page_size=any and hw_mem_page_size=large is valid, whereas
> > > > hw:mem_page_size=small and hw_mem_page_size=large is not and will raise an
> > > > error.
> > > > 
> > > > when hugepage support was added it was decided that permission from the
> > > > operator was required
> > > > to allow you to request hugepages via the image, which is why the flavor
> > > > must have hw:mem_page_size=large|any
> > > > set.
> > > > 
> > > > >  Hugepage instance launching
> > > > > is working if I put hw:mem_page_size on the used flavor. But, when I try
> > > > > the same launching configuration and change hw:mem_page_size on the used
> > > > > flavor to hw_mem_page_size on the used image, it cannot work as expected.
> > > > > It seems that this issue is like the one in
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1791132, but it still happens
> > > > > on Zed. Is this an old bug? Should I submit a bug report for this?
> > > > no, we closed https://bugzilla.redhat.com/show_bug.cgi?id=1791132 as not
> > > > a bug because they were not aware of this requirement.
> > > > > 
> > > > > Best regards.
> > > > 
> > > > 
> > 
> > 



