Hi Sean,
I can confirm that using only hw:mem_page_size=large on the flavor works with 2MB hugepages.
Anyway, I'm still curious about how Nova selects the hugepage size when there are both 2MB and 1GB hugepages on the hypervisor and I only put hw:mem_page_size=large on the flavor. Maybe you can explain more about this.
On Wed, 2023-02-15 at 13:02 +0700, Lazuardi Nasution wrote:

There are two aspects to this. First, Nova only selects from hugepage mounts that are available to libvirt. By default libvirt will only use /dev/hugepages, which is mounted with the default hugepage size you set on the kernel command line, or 2MB on an x86_64 host if you don't specify one. So by default, unless you configure hugetlbfs_mount in /etc/libvirt/qemu.conf, you will only have access to the default page size:

    hugetlbfs_mount = ["/dev/hugepages2M", "/dev/hugepages1G"]

This was the original way to restrict which hugepages were used by Nova. Later https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res... was added to allow you to also reserve additional hugepages.

Assuming you have configured both sizes of hugepages to be mounted and added them to hugetlbfs_mount, Nova will use the info from libvirt to track the available hugepages per NUMA node. When we schedule/spawn a VM on a host we calculate an instance NUMA topology by looking at a number of factors. With regard to VMs with an explicit page size, i.e. hw:mem_page_size defined, the way that works is as follows.

First we determine if the VM should have an implicit NUMA topology of 1 NUMA node, or if you have explicitly requested multiple NUMA nodes using hw:numa_nodes or hw_numa_nodes. The process is more or less the same in either case, so assuming you have 1 virtual NUMA node for this explanation: Nova will loop over the host NUMA nodes and try to fulfill the guest memory request from a single page size on a single host NUMA node. To do this we sort the list of mempage pools by page size. If you set hw:mem_page_size=small then we will only look at the smallest page size pool in each NUMA node and ignore the rest. hw:mem_page_size=large is basically the opposite: we check each page size pool except the smallest. The first page size pool that has enough free memory to fulfill the request is then used for that VM, and we record both the NUMA node and the page size in the instance_numa_topology JSON blob in the DB.

The entry point for this logic is the numa_fit_instance_to_host function:
https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L2295
Here is where we loop over the host NUMA nodes:
https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L2417
This is where we try to fit an instance NUMA cell to a host NUMA cell via _numa_fit_instance_cell:
https://github.com/openstack/nova/blob/1330b280778fa834cffbc4ec9a8aa6b04cb24...
and _numa_cell_supports_pagesize_request
https://github.com/openstack/nova/blob/1330b280778fa834cffbc4ec9a8aa6b04cb24...
is what actually implements the filtering logic.

So to answer your question: hw:mem_page_size=large means we just splice off the smallest page size,

    elif inst_cell.pagesize == MEMPAGES_LARGE:
        return verify_pagesizes(host_cell, inst_cell, avail_pagesize[:-1])

so that we only check the "large"/hugepage page sizes:
https://github.com/openstack/nova/blob/1330b280778fa834cffbc4ec9a8aa6b04cb24...

Hopefully that helps. As I said, best practice is to never use explicit page sizes unless you have benchmarked the workload and determined that using 2MB or 1G pages actually provides a performance improvement. For most workloads the delta is minimal and using an explicit page size is not beneficial. There are some workloads that work better on 2MB hugepages, by the way; so while 1G generally works better for VMs, 2MB generally performs about the same. OVS-DPDK used to prefer 1G pages but I think that delta has mostly been removed.

regards
sean
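To make the selection concrete, the logic described above can be sketched roughly like this. This is a simplified illustration with made-up names, not the actual Nova code; the real implementation is in _numa_cell_supports_pagesize_request, linked above:

    # Simplified sketch of the per-NUMA-node page size selection described
    # above. Names are illustrative; see _numa_cell_supports_pagesize_request
    # in nova/virt/hardware.py for the real logic.
    def pick_pagesize(pools, request, wanted_kb):
        # pools: list of (page_size_kb, free_kb) tuples for one host NUMA
        # node, e.g. [(4, ...), (2048, ...), (1048576, ...)] for 4k/2MB/1G.
        pools = sorted(pools, reverse=True)  # largest page size first
        if request == "small":
            candidates = pools[-1:]          # only the smallest page size
        elif request == "large":
            candidates = pools[:-1]          # splice off the smallest
        elif request == "any":
            candidates = pools               # every pool is acceptable
        else:
            # an explicit size in KB, e.g. 2048 (2MB) or 1048576 (1G)
            candidates = [p for p in pools if p[0] == request]
        # The first pool that can fulfill the whole request from a single
        # page size wins; Nova records the chosen NUMA node and page size.
        for size_kb, free_kb in candidates:
            if free_kb >= wanted_kb:
                return size_kb
        return None  # this host NUMA node cannot satisfy the request

For example, on a host exposing 4k, 2MB and 1G pools, request "large" considers only the 2MB and 1G pools, while request "small" considers only the 4k pool. If only the default /dev/hugepages mount is visible to libvirt, the 2MB pool is the only large pool Nova can pick from, which matches the behaviour observed above.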
Best regards.
On Mon, Feb 13, 2023, 9:27 PM Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2023-02-13 at 21:21 +0700, Lazuardi Nasution wrote:
Hi Sean,
Thank you for your explanation. I want to use the 2MB hugepage. So, do you mean that I have to put hw:mem_page_size=large on the flavor and hw_mem_page_size=2MB on the image? What is the difference from the current hw:mem_page_size=2MB on the flavor only?
In general we recommend against using an explicit page size unless it is required after extensive performance testing.
So generally I would just suggest using hw:mem_page_size=large in the flavor.
If you need to use an explicit page size for a specific VNF, we advise that you set hw_mem_page_size=2MB on the image.
That way, if you have 1 flavor with hw:mem_page_size=large and 3 images, one with hw_mem_page_size=2MB, another with hw_mem_page_size=1G, and a third with hw_mem_page_size not set, all 3 images can use the same flavor.
You should prefer to keep the flavors generic by using hw:mem_page_size=large|small|any and only set explicit page sizes in the images that need that granularity.
You can set explicit page sizes in the flavor too, but it leads to a combinatorial flavor explosion and is generally not required. In 90% of cases hw:mem_page_size=large in the flavor is all that is required, as in the example below.
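Concretely, that pattern looks something like this (the flavor and image names here are made up for illustration):

    # one generic flavor: the operator permits hugepages without fixing a size
    openstack flavor set --property hw:mem_page_size=large my.generic.flavor

    # only the images that truly need a specific size pin one
    openstack image set --property hw_mem_page_size=2MB vnf-image-2mb
    openstack image set --property hw_mem_page_size=1G vnf-image-1g
    # a third image sets no hw_mem_page_size and takes whatever large
    # page size Nova selects

All three images can then be booted with the same my.generic.flavor.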
Best regards.
On Mon, Feb 13, 2023 at 8:45 PM Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2023-02-13 at 18:22 +0700, Lazuardi Nasution wrote:
Hi,
There is a weird situation here, if it is not a bug.
It's not a bug.
hw_mem_page_size can be used in the image only if hw:mem_page_size=large or hw:mem_page_size=any is set in the flavor.
In the case of hw:mem_page_size=large the image can be used to choose a specific hugepage size, e.g.:

    hw:mem_page_size=large hw_mem_page_size=1G
hw:mem_page_size=any will by default be the same as hw:mem_page_size=small and use the smallest page size, which is typically 4k. The difference between hw:mem_page_size=any and hw:mem_page_size=small is that hw:mem_page_size=any allows the image to set any value for hw_mem_page_size. So hw:mem_page_size=any with hw_mem_page_size=large is valid, whereas hw:mem_page_size=small with hw_mem_page_size=large is not and will raise an error.
When hugepage support was added, it was decided that permission from the operator was required to allow you to request hugepages via the image, which is why the flavor must have hw:mem_page_size=large|any set.
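In other words, the permission check amounts to something like the following simplified sketch (illustrative names only, not the actual Nova implementation, which lives in nova/virt/hardware.py):

    def effective_pagesize(flavor_ps, image_ps):
        # flavor_ps: hw:mem_page_size from the flavor extra specs, or None
        # image_ps:  hw_mem_page_size from the image properties, or None
        if image_ps is None:
            return flavor_ps
        if flavor_ps in ("large", "any"):
            return image_ps  # the flavor grants the image permission
        # flavor unset, small, or an explicit size: the image may not override
        raise ValueError("hw_mem_page_size in the image is forbidden by "
                         "hw:mem_page_size=%s in the flavor" % flavor_ps)

So effective_pagesize("large", "1G") returns "1G" and effective_pagesize("any", "large") returns "large", while effective_pagesize("small", "large") raises an error, matching the rules above.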
Hugepage instance launching works if I put hw:mem_page_size on the flavor. But when I try the same launching configuration and change hw:mem_page_size on the flavor to hw_mem_page_size on the image, it does not work as expected. It seems that this issue is like the one in https://bugzilla.redhat.com/show_bug.cgi?id=1791132, but it still happens on Zed. Is this an old bug? Should I submit a bug report for this?

No, we closed https://bugzilla.redhat.com/show_bug.cgi?id=1791132 as not a bug because the reporters were not aware of this requirement.
Best regards.