[nova] Slow nvme performance for local storage instances

Jan Wasilewski finarffin at gmail.com
Mon Aug 21 13:06:24 UTC 2023


Hi,

Let me add a few points. Recently, I decided to run a couple of tests
on a newer OpenStack release, Zed (deployed with the kolla-ansible
project). On this platform, my compute nodes run Ubuntu 22.04 LTS.
The results were surprising, particularly because I was able to achieve the
desired outcomes.

My compute node was equipped with 2 SSDs and 2 NVMe disks. As a preliminary
step, I used SSD drives for testing. The fio test yielded a result of
approximately 90k IOPS for the local SSD drive [1], employing
IvyBridge-IBRS as the cpu_model parameter. When I transitioned to
Cascadelake-Server, I managed to exceed 100k IOPS [2]. Interestingly, when
I conducted an identical test with NVMe drives, the performance was only
slightly above 90k IOPS [3]. This suggests that NVMe drives are marginally
slower than SSD drives for local storage when used by VMs.
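
To compare runs like these consistently, the IOPS figure can be pulled out of fio's JSON report rather than read off the text output. The following is a minimal sketch, assuming fio was run with --output-format=json; the function name and the trimmed sample report are hypothetical and are not taken from the pastes referenced here.

```python
import json


def total_iops(fio_json_text):
    """Sum read and write IOPS over all jobs in one fio JSON report."""
    report = json.loads(fio_json_text)
    total = 0.0
    for job in report.get("jobs", []):
        # Each job entry carries per-direction statistics, including "iops".
        for direction in ("read", "write"):
            total += job.get(direction, {}).get("iops", 0.0)
    return total


# Hypothetical, heavily trimmed report, for illustration only.
sample = json.dumps(
    {
        "jobs": [
            {
                "jobname": "randread",
                "read": {"iops": 90000.0},
                "write": {"iops": 0.0},
            }
        ]
    }
)

if __name__ == "__main__":
    print(total_iops(sample))
```

Summing both directions keeps the helper usable for read-only, write-only, and mixed workloads alike.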

For the final test, I ran the fio test on the NVMe mount point,
achieving around 140k IOPS [4].
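
To put the four figures side by side, each can be expressed as a share of the mount point result. This is a rough sketch using the rounded values quoted in this message ("approximately 90k", "exceed 100k", "slightly above 90k", "around 140k"), so the percentages are only indicative; the labels and the helper name are made up for illustration.

```python
# Approximate IOPS figures quoted in this message, rounded; not exact measurements.
results = {
    "VM, SSD, IvyBridge-IBRS": 90_000,
    "VM, SSD, Cascadelake-Server": 100_000,
    "VM, NVMe, Cascadelake-Server": 90_000,
    "NVMe mount point": 140_000,
}


def percent_of(value, baseline):
    """Return value as a percentage of baseline."""
    return 100.0 * value / baseline


if __name__ == "__main__":
    baseline = results["NVMe mount point"]
    for name, iops in results.items():
        share = percent_of(iops, baseline)
        print(f"{name}: {iops} IOPS, {share:.0f}% of the mount point result")
```

With these rounded inputs, the VM results land at roughly two thirds to three quarters of the mount point figure.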

In summary, it appears that the choice of Ubuntu version as the base for
compute nodes has a significant impact on performance (Ubuntu 20.04 LTS vs.
Ubuntu 22.04 LTS). In my opinion, a kernel parameter seems to be
responsible for constraining the performance within the VM (more precisely,
the "drive file" serving as local storage for the VM). However, I'm
uncertain about which specific parameter(s) are at play. I intend to delve
deeper into this matter, but I'm open to any suggestions you may have.
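
One possible starting point for that investigation is to dump the block-layer queue settings on an Ubuntu 20.04 host and an Ubuntu 22.04 host and diff them. The sketch below reads a handful of files that Linux exposes under /sys/block/<dev>/queue/. The choice of parameters is an assumption, since nothing in the tests so far identifies the responsible setting, and "nvme0n1" is a hypothetical device name.

```python
from pathlib import Path

# Block-layer tunables exposed by Linux under /sys/block/<dev>/queue/.
# Assumption: which of these (if any) explains the gap is unknown.
QUEUE_PARAMS = (
    "scheduler",
    "nr_requests",
    "read_ahead_kb",
    "max_sectors_kb",
    "nomerges",
    "rq_affinity",
    "wbt_lat_usec",
)


def read_queue_params(device, sys_block="/sys/block"):
    """Return {parameter: value} for one block device; None if unreadable."""
    queue_dir = Path(sys_block) / device / "queue"
    values = {}
    for name in QUEUE_PARAMS:
        try:
            values[name] = (queue_dir / name).read_text().strip()
        except OSError:
            # File missing on this kernel, or not readable by this user.
            values[name] = None
    return values


def diff_params(a, b):
    """Return only the parameters whose values differ between two hosts."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # "nvme0n1" is a hypothetical device name; adjust it to the local disk.
    for name, value in read_queue_params("nvme0n1").items():
        print(f"{name} = {value}")
```

Running it on both hosts and passing the two dictionaries to diff_params narrows the comparison to the settings that actually differ. It covers only the block queue, not other kernel or QEMU settings.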

/Jan Wasilewski
References:
[1] fio results for IvyBridge and SSDs:
https://paste.openstack.org/show/bUCoXBUbImd9JxplPBbv/
[2] fio results for Cascadelake-Server and SSDs:
https://paste.openstack.org/show/bWxDkM5ITcMTlFWe4GiZ/
[3] fio results for Cascadelake-Server and NVMe:
https://paste.openstack.org/show/bbINpvkNZcJcY0KP0vPo/
[4] fio results for the NVMe mount point:
https://paste.openstack.org/show/bTchYOYY3zNpSLPfOpQl/

On Thu, 17 Aug 2023 at 12:16, Jan Wasilewski <finarffin at gmail.com> wrote:

> Hi,
>
> First and foremost, I want to express my heartfelt gratitude for all the
> invaluable insights you've provided. I meticulously studied and conducted
> numerous tests based on your inputs. While I've managed to implement
> certain enhancements, I'd like to delve into those improvements in an
> upcoming section. For now, let me address your queries.
>
> Regarding the number of concurrent VMs operating on the OpenStack
> hypervisor:
>
>    - Presently, a single VM runs on this compute node; occasionally
>    there are two. The compute node remains largely underutilized and is
>    reserved mainly for my performance tests. It's equipped with a
>    24-core Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz and a MemTotal of
>    48988528 kB. So far, I haven't seen any red flags. Even while fio
>    tests run within my VMs, there is no discernible increase in load.
>
> To @smooney: Regarding ide and virtio, I ran a second test, carefully
> replicating the attachment method, and the results are similar. Please
> refer to [1] and [2].
>
> As you recommended, I also tested hw_pmu, but the results did not
> change. Find the results with hw_pmu disabled in [3], [4], and [5], and
> the corresponding results with hw_pmu enabled in [6], [7], and [8].
>
> Nonetheless, I did see a substantial performance increase, albeit only
> for a manually attached disk, that is, a whole drive rather than a disk
> attached to the VM as a single file [9]. The only change was setting
> cpu_model in nova.conf from IvyBridge to Cascadelake-Server-noTSX.
> Although I reached approximately 110k IOPS for the fully attached disk
> [10], the file-attached disk stayed at around 19k IOPS [11], with
> comparable performance for the root disk [12]. The latter is also a
> single file, albeit located on a different drive of the same model. For
> reference, I've included all relevant dumpxml data [13]. In summary, the
> cpu_model appears to significantly improve performance, though the
> effect does not carry over to a "file disk." The question thus stands:
> how can we improve performance for a file disk?
>
> Might you be willing to share the fio benchmark outcomes from your local
> storage configuration? I'm curious to ascertain whether our results align,
> or if there's a concealed optimization path I have yet to uncover. I
> sincerely appreciate all the assistance you've extended thus far.
> /Jan Wasilewski
>
> References:
> [1] virtio connected via virsh attach-volume to OpenStack instance (<80k
> IOPS): https://paste.openstack.org/show/bHqZZWdAwWVYh1rHaIgC/
> [2] virtio connected via virsh attach-volume to OpenStack instance
> dumpxml: https://paste.openstack.org/show/bvEsKiwBd8lL4AUPSOxj/
> [3] hw_pmu: False: fio - root disk:
> https://paste.openstack.org/show/bAZXQOUrkmVBsJ7yBEql/
> [4] hw_pmu: False: fio - attached nvme disk:
> https://paste.openstack.org/show/bF1P0qsVG24duuY8F6HV/
> [5] hw_pmu: False: dumpxml:
> https://paste.openstack.org/show/b8Yxf5DmPmAxxA070DL1/
> [6] hw_pmu: True: fio - root disk:
> https://paste.openstack.org/show/b7jJ7gR2e9VAAXm1e9PP/
> [7] hw_pmu: True: fio - attached nvme disk (82.5k IOPS):
> https://paste.openstack.org/show/bCrdOnwxrJS6hENxTMK5/
> [8] hw_pmu: True: dumpxml:
> https://paste.openstack.org/show/b8Yxf5DmPmAxxA070DL1/
> [9] Instructions on how to add a "file disk" to a kvm instance:
> https://www.cyberciti.biz/faq/how-to-add-disk-image-to-kvm-virtual-machine-with-virsh-command/
> [10] cpu_model: Cascadelake-Server-noTSX fio - attached nvme disk (almost
> 110k IOPS): https://paste.openstack.org/show/bdKQIgNIH0dy8PLhAIKq/
> [11] cpu_model: Cascadelake-Server-noTSX fio - "file disk":
> https://paste.openstack.org/show/bjBmPBXi35jWdyJ1cjQt/
> [12] cpu_model: Cascadelake-Server-noTSX fio - root disk:
> https://paste.openstack.org/show/br49T918vNU5NJXfXYGm/
> [13] cpu_model: Cascadelake-Server-noTSX dumpxml:
> https://paste.openstack.org/show/bns2rWIHCHIWbrR9LUD0/
>