[nova] Slow nvme performance for local storage instances

Satish Patel satish.txt at gmail.com
Mon Aug 21 21:18:00 UTC 2023


Hi Jan,

Just curious, after reading the valuable input from you and others: on which
OS are you seeing the performance degradation, Ubuntu 20.04 LTS or Ubuntu
22.04 LTS? I am soon going to build some compute nodes using NVMe drives and
am looking for the right OS/kernel combination for the best performance.

On Mon, Aug 21, 2023 at 9:59 AM <smooney at redhat.com> wrote:

> On Mon, 2023-08-21 at 15:06 +0200, Jan Wasilewski wrote:
> > Hi,
> >
> > Let me add a few points. Recently, I decided to conduct a couple of tests
> > with the newer OpenStack platform - Zed (built by the kolla-ansible
> > project). This platform runs Ubuntu 22.04 LTS on my compute nodes. The
> > results were surprising, particularly because I was able to achieve the
> > desired outcomes.
> >
> > My compute node was equipped with 2 SSDs and 2 NVMe disks. As a
> > preliminary step, I used the SSD drives for testing. The fio test yielded
> > a result of approximately 90k IOPS for the local SSD drive [1], employing
> > IvyBridge-IBRS as the cpu_model parameter. When I transitioned to
> > Cascadelake-Server, I managed to exceed 100k IOPS [2]. Interestingly,
> > when I conducted an identical test with NVMe drives, the performance was
> > only slightly above 90k IOPS [3]. This suggests that NVMe drives are
> > marginally slower than SSD drives for local storage when used by VMs.
> >
> > For the final test, I executed the fio test directly on the NVMe mount
> > point, achieving around 140k IOPS [4].
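> >
> > For reference, a typical fio 4k random-read job for this kind of
> > comparison looks roughly like the command below; the exact parameters
> > behind the numbers above are in the pastes [1]-[4], so treat this only as
> > an illustrative sketch (the --filename target is a placeholder):
> >
> > # 4k random reads, direct I/O, queue depth 32
> > fio --name=randread --filename=/dev/vdb --rw=randread --bs=4k \
> >     --ioengine=libaio --iodepth=32 --direct=1 --numjobs=1 \
> >     --runtime=60 --time_based --group_reporting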
> >
> > In summary, it appears that the choice of Ubuntu version as the base for
> > compute nodes has a significant impact on performance (Ubuntu 20.04 LTS
> > vs. Ubuntu 22.04 LTS). In my opinion, a kernel parameter seems to be
> > responsible for constraining the performance within the VM (more
> > precisely, the "drive file" serving as local storage for the VM).
> > However, I'm uncertain about which specific parameter(s) are at play. I
> > intend to delve deeper into this matter, but I'm open to any suggestions
> > you may have.
> Thanks for reporting your observation.
> This may or may not be kernel related: if you are using a different
> version of QEMU between each Ubuntu release, that alone could account for
> the difference. If it is the same version, then this may indeed be related
> to a kernel change, but it may not be a parameter; rather, it could be a
> change to the filesystem that improved performance for VM workloads. It
> could also be related to enhancements to some of the kernel mitigations
> that are used, or a number of other factors. 20.04 to 22.04 is a large
> leap and there are a lot of changes, even if you are deploying the same
> version of OpenStack using packages from the cloud archive on 20.04.
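>
> As a first step in narrowing it down, it is worth comparing a few things
> between the 20.04 and 22.04 hosts. A rough sketch (the nvme0n1 device name
> and the instances path are just examples; substitute whatever backs your
> instance storage):
>
> # QEMU and kernel versions on each host
> qemu-system-x86_64 --version
> uname -r
>
> # active CPU vulnerability mitigations (these changed between releases)
> grep . /sys/devices/system/cpu/vulnerabilities/*
>
> # I/O scheduler and filesystem for the device backing the instance storage
> cat /sys/block/nvme0n1/queue/scheduler
> findmnt -T /var/lib/nova/instances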
>
> If you want to get the highest possible performance in the guest, you
> should set
> [libvirt]
> cpu_mode=host-passthrough
>
> instead of
> [libvirt]
> cpu_mode=custom
> cpu_models=Cascadelake-Server
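>
> As a quick sanity check of which mode actually took effect on the compute
> node (the instance name below is a placeholder):
>
> virsh dumpxml instance-0000002a | grep -A2 '<cpu'
>
> With host-passthrough the domain XML should contain a
> <cpu mode='host-passthrough' .../> element rather than a named model such
> as Cascadelake-Server.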
>
> The downside to using host-passthrough is that you will only be able to
> live migrate to servers with the exact same model of CPU. If all your CPUs
> are the same, or you can subdivide your cloud into sets of hosts with the
> same CPU SKU, e.g. via host aggregates and filters/traits, then that's not
> really an issue.
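>
> A rough sketch of grouping hosts by CPU SKU with aggregates, assuming the
> AggregateInstanceExtraSpecsFilter is enabled (the aggregate, host, flavor
> and property names below are just examples):
>
> openstack aggregate create --property cpu_sku=cascadelake agg-cascadelake
> openstack aggregate add host agg-cascadelake compute-01
> openstack aggregate add host agg-cascadelake compute-02
> openstack flavor set m1.large \
>   --property aggregate_instance_extra_specs:cpu_sku=cascadelake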
>
>
> If you do find a kernel parameter that achieves the same performance on
> 20.04, please let us know, but I suspect it is a combination of things
> that have changed between the two releases rather than a single thing.
>
> >
> > /Jan Wasilewski
> > References:
> > [1] fio results for IvyBridge and SSDs:
> > https://paste.openstack.org/show/bUCoXBUbImd9JxplPBbv/
> > [2] fio results for Cascadelake-Server and SSDs:
> > https://paste.openstack.org/show/bWxDkM5ITcMTlFWe4GiZ/
> > [3] fio results for Cascadelake-Server and NVMe:
> > https://paste.openstack.org/show/bbINpvkNZcJcY0KP0vPo/
> > [4] fio results for the mount point of NVMe:
> > https://paste.openstack.org/show/bTchYOYY3zNpSLPfOpQl/
> >
> > On Thu, 17 Aug 2023 at 12:16, Jan Wasilewski <finarffin at gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > First and foremost, I want to express my gratitude for all the
> > > invaluable insights you've provided. I carefully studied them and
> > > conducted numerous tests based on your input. While I've managed to
> > > implement certain enhancements, I'd like to go into those improvements
> > > further below. For now, let me address your queries.
> > >
> > > Regarding the number of concurrent VMs operating on the OpenStack
> > > hypervisor:
> > >
> > >    - Presently, there is a single VM running on this compute node;
> > >    occasionally there might be two instances. The compute node remains
> > >    largely underutilized and is primarily earmarked for my performance
> > >    assessments. It's equipped with a 24-core Intel(R) Xeon(R) Silver
> > >    4214 CPU @ 2.20GHz, alongside a MemTotal of 48988528 kB. So far, I
> > >    haven't detected any red flags. Even during the execution of fio
> > >    tests within my VMs, there is no discernible surge in load.
> > >
> > > To @smooney: In relation to ide and virtio, I undertook a second test,
> > > carefully duplicating the attachment methodology, and the outcomes are
> > > similar. Please refer to [1] and [2].
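> > >
> > > For anyone reproducing this: a minimal way to attach a block device
> > > with an explicit virtio bus via virsh looks roughly like the command
> > > below; the domain and device names are placeholders, and the XML that
> > > was actually used is in [2]:
> > >
> > > virsh attach-disk instance-0000002a /dev/nvme0n1 vdb \
> > >   --targetbus virtio --cache none --live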
> > >
> > > Nevertheless, as per your recommendation, I explored hw_pmu; however,
> > > the outcomes remained consistent. Find the results with hw_pmu disabled
> > > in [3], [4], and [5], and contrasting results with hw_pmu enabled in
> > > [6], [7], and [8].
> > >
> > > Nonetheless, I did experience a substantial performance increase,
> > > albeit solely for a manually attached disk - a whole drive, not the
> > > disk associated with the VM as a single file [9]. The only alteration
> > > involved changing my cpu_model in nova.conf from IvyBridge to
> > > Cascadelake-Server-noTSX. Even though I achieved approximately 110k
> > > IOPS for the fully attached disk [10], the file-attached disk remained
> > > at around 19k IOPS [11], with comparable performance evident for the
> > > root disk [12]. The latter is also a single file, albeit located on a
> > > distinct drive of the same model. For your reference, I've appended all
> > > relevant dumpxml data [13]. In summary, it seems that the cpu_model
> > > significantly influences performance, though this effect is not
> > > replicated for a "file disk". The question thus stands: how can we
> > > improve performance for a file disk?
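> > >
> > > One place worth looking for the file-backed disk is the driver
> > > cache/io attributes in the generated disk XML; this is only a guess at
> > > where to look, not a confirmed fix. A file-backed virtio disk with
> > > direct, native async I/O would look roughly like the snippet below
> > > (the qcow2 type and nova instances path are just the usual defaults;
> > > compare with the actual dumpxml in [13]):
> > >
> > > <disk type='file' device='disk'>
> > >   <driver name='qemu' type='qcow2' cache='none' io='native'/>
> > >   <source file='/var/lib/nova/instances/<uuid>/disk'/>
> > >   <target dev='vda' bus='virtio'/>
> > > </disk>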
> > >
> > > Would you be willing to share the fio benchmark results from your
> > > local storage configuration? I'm curious whether our results align, or
> > > if there's an optimization path I have yet to uncover. I sincerely
> > > appreciate all the assistance you've extended thus far.
> > > /Jan Wasilewski
> > >
> > > References:
> > > [1] virtio connected via virsh attach-volume to OpenStack instance
> > > (<80k IOPS): https://paste.openstack.org/show/bHqZZWdAwWVYh1rHaIgC/
> > > [2] virtio connected via virsh attach-volume to OpenStack instance,
> > > dumpxml: https://paste.openstack.org/show/bvEsKiwBd8lL4AUPSOxj/
> > > [3] hw_pmu: False: fio - root disk:
> > > https://paste.openstack.org/show/bAZXQOUrkmVBsJ7yBEql/
> > > [4] hw_pmu: False: fio - attached nvme disk:
> > > https://paste.openstack.org/show/bF1P0qsVG24duuY8F6HV/
> > > [5] hw_pmu: False: dumpxml:
> > > https://paste.openstack.org/show/b8Yxf5DmPmAxxA070DL1/
> > > [6] hw_pmu: True: fio - root disk:
> > > https://paste.openstack.org/show/b7jJ7gR2e9VAAXm1e9PP/
> > > [7] hw_pmu: True: fio - attached nvme disk (82.5k IOPS):
> > > https://paste.openstack.org/show/bCrdOnwxrJS6hENxTMK5/
> > > [8] hw_pmu: True: dumpxml:
> > > https://paste.openstack.org/show/b8Yxf5DmPmAxxA070DL1/
> > > [9] Instructions on how to add a "file disk" to a KVM instance:
> > > https://www.cyberciti.biz/faq/how-to-add-disk-image-to-kvm-virtual-machine-with-virsh-command/
> > > [10] cpu_model: Cascadelake-Server-noTSX, fio - attached nvme disk
> > > (almost 110k IOPS): https://paste.openstack.org/show/bdKQIgNIH0dy8PLhAIKq/
> > > [11] cpu_model: Cascadelake-Server-noTSX, fio - "file disk":
> > > https://paste.openstack.org/show/bjBmPBXi35jWdyJ1cjQt/
> > > [12] cpu_model: Cascadelake-Server-noTSX, fio - root disk:
> > > https://paste.openstack.org/show/br49T918vNU5NJXfXYGm/
> > > [13] cpu_model: Cascadelake-Server-noTSX, dumpxml:
> > > https://paste.openstack.org/show/bns2rWIHCHIWbrR9LUD0/
> > >
>
>
>