Before digging into your settings, have you tried using raw disk images instead of qcow2, just to understand what overhead qcow2 is adding? My guess is that part of the issue is not preallocating the qcow2 space, but if you could check the performance with raw images, that would eliminate it as a factor. The next step would be to look at the image properties and the disk cache mode. You mentioned following the Ceph recommendations, which would use virtio-scsi instead of virtio-blk and should help, but tweaking the cache mode to none would also help. A rough sketch of the settings I have in mind is included below the quoted message.

On Wed, 2023-08-09 at 10:02 +0200, Jan Wasilewski wrote:
Hi,
I am reaching out to inquire about the performance of our local storage setup. Currently, I am conducting tests using NVMe disks; however, the results appear to be underwhelming.
In terms of my setup, I have recently incorporated two NVMe disks into my compute node. These disks have been configured as RAID1 under md127 and subsequently mounted at /var/lib/nova/instances [1]. During benchmarking using the fio tool within this directory, I am achieving approximately 160,000 IOPS [2]. This figure serves as a satisfactory baseline and reference point for upcoming VM tests.
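For context, the exact fio command and full results are in [2]; a run along the following lines illustrates the kind of baseline test performed (the job name, queue depth, and runtime here are illustrative values, not necessarily the ones used):

    fio --name=nvme-baseline \
        --directory=/var/lib/nova/instances \
        --rw=randread --bs=4k --direct=1 \
        --ioengine=libaio --iodepth=32 --numjobs=4 \
        --size=4G --runtime=60 --time_based --group_reporting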
As the next phase, I have established a flavor that employs a root disk for my virtual machine [3]. Regrettably, the resulting performance is around 18,000 IOPS, which is nearly ten times worse than the compute node results [4]. While I expected some degradation, a tenfold decrease seems excessive; realistically, I anticipated no more than a twofold reduction compared to the compute node's performance. Hence, I am led to ask: what should be configured to enhance performance?
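The flavor itself is unremarkable; the full definition is in [3], but it was created roughly as follows (the name and sizes are illustrative):

    openstack flavor create --vcpus 4 --ram 8192 --disk 40 local-nvme-test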
I have already experimented with the settings recommended on the Ceph page for image properties [5]; however, these changes did not yield the desired improvements. In addition, I attempted to modify the CPU architecture within the nova.conf file, switching to Cascade Lake architecture, yet this endeavor also proved ineffective. For your convenience, I have included a link to my current dumpxml results [6].
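Concretely, the properties from [5] were applied to the image along these lines, and the CPU model was changed in the [libvirt] section of nova.conf (the image name below is a placeholder, and depending on the Nova release the option may be cpu_model rather than cpu_models):

    openstack image set \
        --property hw_scsi_model=virtio-scsi \
        --property hw_disk_bus=scsi \
        --property hw_qemu_guest_agent=yes \
        --property os_require_quiesce=yes \
        <image-name>

    # nova.conf on the compute node
    [libvirt]
    cpu_mode = custom
    cpu_models = Cascadelake-Server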
Your insights and guidance would be greatly appreciated. I am confident that there is a solution to this performance disparity that I may have overlooked. Thank you in advance for your help. /Jan Wasilewski
References:
[1] nvme allocation and raid configuration: https://paste.openstack.org/show/bMMgGqu5I6LWuoQWV7TV/
[2] fio performance inside compute node: https://paste.openstack.org/show/bcMi4zG7QZwuJZX8nyct/
[3] Flavor configuration: https://paste.openstack.org/show/b7o9hCKilmJI3qyXsP5u/
[4] fio performance inside VM: https://paste.openstack.org/show/bUjqxfU4nEtSFqTlU8oH/
[5] image properties: https://docs.ceph.com/en/pacific/rbd/rbd-openstack/#image-properties
[6] dumpxml of vm: https://paste.openstack.org/show/bRECcaSMqa8TlrPp0xrT/
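To make the suggestions at the top a bit more concrete, here is a rough sketch of the knobs I have in mind; the values are examples to adapt to your environment, not drop-in config:

    # nova.conf on the compute node: switch local disks to raw to take
    # qcow2 allocation out of the picture (only affects newly built instances)
    [libvirt]
    images_type = raw
    # if you stay on qcow2, cache mode "none" avoids double caching on the host
    disk_cachemodes = file=none

    # alternatively, to test a preallocated qcow2 image by hand:
    qemu-img create -f qcow2 -o preallocation=falloc test.qcow2 20G

The virtio-scsi part should already be covered by the image properties from the Ceph page, so the main things to compare are raw vs. qcow2 and the cache mode.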