I would suggest the following (a quick sketch of these checks is below):

- Make sure the "none" I/O scheduler is used inside the VM (check /sys/block/sda/queue/scheduler). I assume a reasonably recent kernel; on older kernels the equivalent is "noop".
- Make sure the host has CPU C-states deeper than C1 disabled: check the value of every /sys/devices/system/cpu/*/cpuidle/state*/disable whose corresponding .../name is something other than "POLL", "C1", or "C1E", or use a tool that disables them for you.
- Use raw images instead of qcow2: in nova.conf set force_raw_images=True (in the [DEFAULT] section) and images_type=flat (in the [libvirt] section), then recreate the instance.

Is the difference still that big when you lower the I/O depth (for example to 1) or increase the block size (for example to 64k)?
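Roughly, the checks look like this (untested sketch; adjust the disk device name, which may be sda or vda in the guest, and double-check the nova.conf section names against your release):

Inside the VM:

    # show the I/O schedulers for the root disk (the active one is in brackets)
    cat /sys/block/sda/queue/scheduler
    # switch to "none" if it is not already selected
    echo none | sudo tee /sys/block/sda/queue/scheduler

On the host:

    # disable every C-state deeper than POLL/C1/C1E on all CPUs
    for s in /sys/devices/system/cpu/cpu*/cpuidle/state*; do
        name=$(cat "$s/name")
        case "$name" in
            POLL|C1|C1E) ;;                        # keep the shallow states enabled
            *) echo 1 | sudo tee "$s/disable" > /dev/null ;;
        esac
    done

In nova.conf (recreate the instance afterwards so its disk is re-created as raw):

    [DEFAULT]
    force_raw_images = True

    [libvirt]
    images_type = flat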
On 09/08/2023 10:02, Jan Wasilewski wrote:

Hi,
I am reaching out to inquire about the performance of our local storage setup. Currently, I am conducting tests using NVMe disks; however, the results appear to be underwhelming.
In terms of my setup, I have recently incorporated two NVMe disks into my compute node. These disks have been configured as RAID1 under md127 and subsequently mounted at /var/lib/nova/instances [1]. During benchmarking using the fio tool within this directory, I am achieving approximately 160,000 IOPS [2]. This figure serves as a satisfactory baseline and reference point for upcoming VM tests.
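The exact command line is in [2]; a typical small-block random-read run of this kind looks roughly like this (illustrative parameters only, not necessarily the ones used):

    fio --name=randread --directory=/var/lib/nova/instances \
        --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
        --ioengine=libaio --direct=1 --size=4G --runtime=60 \
        --time_based --group_reporting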
As the next phase, I have established a flavor that employs a root disk for my virtual machine [3]. Regrettably, the resulting performance yields around 18,000 IOPS, which is nearly ten times poorer than the compute node results [4]. While I expected some degradation, a tenfold decrease seems excessive. Realistically, I anticipated no more than a twofold reduction compared to the compute node's performance. Hence, I am led to ask: what should be configured to enhance performance?
I have already experimented with the settings recommended on the Ceph page for image properties [5]; however, these changes did not yield the desired improvements. In addition, I attempted to modify the CPU architecture within the nova.conf file, switching to Cascade Lake architecture, yet this endeavor also proved ineffective. For your convenience, I have included a link to my current dumpxml results [6].
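Concretely, the image properties from [5] and the CPU model change were applied roughly as follows (the exact property set is listed in [5]; the nova.conf option may be spelled cpu_model on older releases):

    openstack image set \
        --property hw_scsi_model=virtio-scsi \
        --property hw_disk_bus=scsi \
        --property hw_qemu_guest_agent=yes \
        --property os_require_quiesce=yes \
        <image-uuid>

    # nova.conf on the compute node
    [libvirt]
    cpu_mode = custom
    cpu_models = Cascadelake-Server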
Your insights and guidance would be greatly appreciated. I am confident that there is a solution to this performance disparity that I may have overlooked. Thank you in advance for your help.
/Jan Wasilewski
References:
[1] NVMe allocation and RAID configuration: https://paste.openstack.org/show/bMMgGqu5I6LWuoQWV7TV/
[2] fio performance inside compute node: https://paste.openstack.org/show/bcMi4zG7QZwuJZX8nyct/
[3] Flavor configuration: https://paste.openstack.org/show/b7o9hCKilmJI3qyXsP5u/
[4] fio performance inside VM: https://paste.openstack.org/show/bUjqxfU4nEtSFqTlU8oH/
[5] Image properties: https://docs.ceph.com/en/pacific/rbd/rbd-openstack/#image-properties
[6] dumpxml of VM: https://paste.openstack.org/show/bRECcaSMqa8TlrPp0xrT/
-- Damian Pietras