Hi Jan,
Just curious, after reading the valuable input from you and others: on which OS are you seeing the performance degradation, Ubuntu 20.04 LTS or Ubuntu 22.04 LTS? I am going to build some compute nodes with NVMe soon and am looking for the right OS/kernel combination for better performance.
On Mon, Aug 21, 2023 at 9:59 AM smooney@redhat.com wrote:
On Mon, 2023-08-21 at 15:06 +0200, Jan Wasilewski wrote:
Hi,
Let me add a few points. Recently, I decided to run a couple of tests on a newer OpenStack platform, Zed (deployed with the kolla-ansible project), with Ubuntu 22.04 LTS on my compute nodes. The results were surprising, particularly because I was able to achieve the desired outcomes.
My compute node was equipped with 2 SSDs and 2 NVMe disks. As a preliminary step, I used the SSD drives for testing. The fio test yielded approximately 90k IOPS for the local SSD drive [1] with IvyBridge-IBRS as the cpu_model parameter. When I switched to Cascadelake-Server, I managed to exceed 100k IOPS [2]. Interestingly, when I ran an identical test with the NVMe drives, the performance was only slightly above 90k IOPS [3]. This suggests that NVMe drives are marginally slower than SSD drives for local storage when used by VMs.
For the final test, I ran fio on the NVMe mount point, achieving around 140k IOPS [4].
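To put the figures quoted above side by side, here is a minimal Python sketch that expresses each in-VM result as a percentage of the NVMe mount-point result. The numbers are the rounded values from the message ("approximately 90k", "exceed 100k", "slightly above 90k", "around 140k"), not the exact fio output in the pastes, and the helper name relative_to is made up for this illustration.

```python
def relative_to(baseline, results):
    """Return each result as a percentage of the baseline, rounded to 1 decimal."""
    return {name: round(100.0 * iops / baseline, 1) for name, iops in results.items()}


# Rounded IOPS figures as quoted in the message (approximations only).
results = {
    "SSD, IvyBridge-IBRS": 90_000,
    "SSD, Cascadelake-Server": 100_000,
    "NVMe, Cascadelake-Server": 90_000,
}
mount_point_iops = 140_000  # fio run on the NVMe mount point

for name, pct in relative_to(mount_point_iops, results).items():
    print(f"{name}: {pct}% of the NVMe mount-point result")
```

Feeding in the exact IOPS values from the fio pastes instead of the rounded ones would give a more precise comparison.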
In summary, it appears that the choice of Ubuntu version as the base for compute nodes has a significant impact on performance (Ubuntu 20.04 LTS vs. Ubuntu 22.04 LTS). In my opinion, a kernel parameter seems to be responsible for constraining the performance within the VM (more precisely, of the "drive file" serving as local storage for the VM). However, I'm uncertain which specific parameter(s) are at play. I intend to dig deeper into this matter, but I'm open to any suggestions you may have.
Thanks for reporting your observation. This may or may not be kernel related if you are using different versions of QEMU between the Ubuntu releases. If it's the same version, then this may indeed be related to a kernel change, but not necessarily to a parameter. Rather, it could be a change to the filesystem that improved performance for VM workloads. It could also be related to enhancements in some of the kernel mitigations that are used, or to a number of other factors. 20.04 to 22.04 is a large leap and there are a lot of changes, even if you are deploying the same version of OpenStack using packages from the cloud archive on 20.04.
If you want the highest possible performance in the guest, you should pass the host CPU through rather than set a virtual CPU model, i.e. use
    [libvirt]
    cpu_mode=host-passthrough
instead of
    [libvirt]
    cpu_mode=custom
    cpu_models=Cascadelake-Server
The downside of using host-passthrough is that you will only be able to live migrate to servers with the exact same CPU model. If all your CPUs are the same, or you can subdivide your cloud into sets of hosts with the same CPU SKU, e.g. via host aggregates and filters/traits, then that's not really an issue.
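One way to check which CPU mode a compute node is actually configured with is to read the [libvirt] section of its nova.conf. The sketch below is only an illustration. It parses the file text with Python's standard configparser rather than oslo.config (which Nova itself uses), the function name libvirt_cpu_settings is made up for this example, and the two sample snippets simply mirror the settings mentioned above.

```python
import configparser


def libvirt_cpu_settings(conf_text):
    """Return (cpu_mode, cpu_models) read from the [libvirt] section of nova.conf text."""
    # strict=False tolerates repeated options; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        return None, []
    mode = parser.get("libvirt", "cpu_mode", fallback=None)
    models = parser.get("libvirt", "cpu_models", fallback="")
    return mode, [m.strip() for m in models.split(",") if m.strip()]


# Sample snippets mirroring the two settings discussed above.
CUSTOM = """
[libvirt]
cpu_mode = custom
cpu_models = Cascadelake-Server
"""

PASSTHROUGH = """
[libvirt]
cpu_mode = host-passthrough
"""

for label, text in (("custom", CUSTOM), ("passthrough", PASSTHROUGH)):
    mode, models = libvirt_cpu_settings(text)
    print(f"{label}: cpu_mode={mode} cpu_models={models}")
```

Running the same check on every compute node shows at a glance which hosts share a CPU mode and model list, which is useful before grouping them for live migration.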
If you do find a kernel parameter that achieves the same performance on 20.04, please let us know, but I suspect it's a combination of things that have changed between the two releases rather than a single thing.
/Jan Wasilewski
References:
[1] fio results for IvyBridge and SSDs: https://paste.openstack.org/show/bUCoXBUbImd9JxplPBbv/
[2] fio results for Cascadelake-Server and SSDs: https://paste.openstack.org/show/bWxDkM5ITcMTlFWe4GiZ/
[3] fio results for Cascadelake-Server and NVMe: https://paste.openstack.org/show/bbINpvkNZcJcY0KP0vPo/
[4] fio results for mounting point of NVMe: https://paste.openstack.org/show/bTchYOYY3zNpSLPfOpQl/
On Thu, 17 Aug 2023 at 12:16, Jan Wasilewski finarffin@gmail.com wrote:
Hi,
First and foremost, I want to thank you for all the valuable insights you've provided. I studied them carefully and ran numerous tests based on your input. I did manage to make some improvements, which I describe further below. For now, let me address your questions.
Regarding the number of concurrent VMs operating on the OpenStack hypervisor:
- Presently, there is a single VM running on this compute node; occasionally there might be two instances. The compute node remains largely underutilized, primarily earmarked for my performance assessments. It's equipped with a 24-core Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz, alongside a MemTotal of 48988528 kB. Thus far, I haven't detected any red flags. Even during the execution of fio tests within my VMs, there is no discernible surge in load.
To @smooney: regarding ide and virtio, I ran a second test, carefully duplicating the attachment method, and the outcomes are similar. Please refer to [1] and [2].
As per your recommendation, I also explored hw_pmu; however, the outcomes remained consistent. Find the results with hw_pmu disabled in [3], [4], and [5], and the contrasting results with hw_pmu enabled in [6], [7], and [8].
Nonetheless, I did see a substantial performance increase, albeit solely for a manually attached disk, i.e. a whole drive rather than a disk attached to the VM as a single file [9]. The only change was setting cpu_model in nova.conf from IvyBridge to Cascadelake-Server-noTSX. Even though I achieved approximately 110k IOPS for the fully attached disk [10], the file-attached disk stayed at around 19k IOPS [11], with comparable performance for the root disk [12]. The latter is also a single file, albeit located on a different drive of the same model. For reference, I've attached all relevant dumpxml data [13]. In summary, it seems that the cpu_model significantly improves performance, though this effect does not carry over to a "file disk." The question thus stands: how can we improve performance for a file disk?
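When comparing a whole-drive attachment with a "file disk", it can help to see how libvirt describes each disk in the dumpxml output. The sketch below is a minimal illustration, not a diagnosis of the difference reported here. It uses Python's standard xml.etree module to list each disk's type, source, target, and the driver attributes (format, cache, io) from a libvirt domain XML. The function name list_disks and the sample XML, including its paths and device names, are hypothetical and not taken from the pastes.

```python
import xml.etree.ElementTree as ET


def list_disks(domain_xml):
    """Return one dict per <disk> element found under <devices> in a domain XML."""
    root = ET.fromstring(domain_xml.strip())
    disks = []
    for disk in root.findall("./devices/disk"):
        driver = disk.find("driver")
        source = disk.find("source")
        target = disk.find("target")
        disks.append({
            "type": disk.get("type"),  # e.g. 'file' or 'block'
            "source": None if source is None
            else (source.get("file") or source.get("dev")),
            "target": None if target is None else target.get("dev"),
            "bus": None if target is None else target.get("bus"),
            "format": None if driver is None else driver.get("type"),
            "cache": None if driver is None else driver.get("cache"),
            "io": None if driver is None else driver.get("io"),
        })
    return disks


# Hypothetical sample: one file-backed disk and one whole block device.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/EXAMPLE/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/nvme0n1'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

for entry in list_disks(SAMPLE_XML):
    print(entry)
```

Running this against the real output of virsh dumpxml for the instance makes it easy to spot whether the slow and fast disks differ in image format, cache mode, or I/O mode.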
Would you be willing to share the fio benchmark results from your local storage configuration? I'm curious whether our results align, or whether there's an optimization path I have yet to uncover. I sincerely appreciate all the assistance you've provided thus far.
/Jan Wasilewski
References:
[1] virtio connected via virsh attach-volume to OpenStack instance (<80k IOPS): https://paste.openstack.org/show/bHqZZWdAwWVYh1rHaIgC/
[2] virtio connected via virsh attach-volume to OpenStack instance, dumpxml: https://paste.openstack.org/show/bvEsKiwBd8lL4AUPSOxj/
[3] hw_pmu: False: fio - root disk: https://paste.openstack.org/show/bAZXQOUrkmVBsJ7yBEql/
[4] hw_pmu: False: fio - attached nvme disk: https://paste.openstack.org/show/bF1P0qsVG24duuY8F6HV/
[5] hw_pmu: False: dumpxml: https://paste.openstack.org/show/b8Yxf5DmPmAxxA070DL1/
[6] hw_pmu: True: fio - root disk: https://paste.openstack.org/show/b7jJ7gR2e9VAAXm1e9PP/
[7] hw_pmu: True: fio - attached nvme disk (82.5k IOPS): https://paste.openstack.org/show/bCrdOnwxrJS6hENxTMK5/
[8] hw_pmu: True: dumpxml: https://paste.openstack.org/show/b8Yxf5DmPmAxxA070DL1/
[9] Instructions for adding a "file disk" to a KVM instance: https://www.cyberciti.biz/faq/how-to-add-disk-image-to-kvm-virtual-machine-w...
[10] cpu_model: Cascadelake-Server-noTSX fio - attached nvme disk (almost 110k IOPS): https://paste.openstack.org/show/bdKQIgNIH0dy8PLhAIKq/
[11] cpu_model: Cascadelake-Server-noTSX fio - "file disk": https://paste.openstack.org/show/bjBmPBXi35jWdyJ1cjQt/
[12] cpu_model: Cascadelake-Server-noTSX fio - root disk: https://paste.openstack.org/show/br49T918vNU5NJXfXYGm/
[13] cpu_model: Cascadelake-Server-noTSX dumpxml: https://paste.openstack.org/show/bns2rWIHCHIWbrR9LUD0/