Before digging into your settings, have you tried using raw disk images instead of qcow2, just to understand what overhead qcow2 is adding? My guess is that part of the issue is not preallocating the qcow2 space, but if you could check the performance with raw images, that would eliminate it as a factor. The next step would be to look at the image properties and the disk cache mode. You mentioned following the Ceph recommendations, which would use virtio-scsi instead of virtio-blk and should help, but tweaking the cache mode to none would also help. A rough sketch of the settings I have in mind is included below the quoted message.

On Wed, 2023-08-09 at 10:02 +0200, Jan Wasilewski wrote:
Hi,
I am reaching out to inquire about the performance of our local storage setup. Currently, I am conducting tests using NVMe disks; however, the results appear to be underwhelming.
In terms of my setup, I have recently incorporated two NVMe disks into my compute node. These disks have been configured as RAID1 under md127 and subsequently mounted at /var/lib/nova/instances [1]. During benchmarking using the fio tool within this directory, I am achieving approximately 160,000 IOPS [2]. This figure serves as a satisfactory baseline and reference point for upcoming VM tests.
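For context, the exact fio command and full results are in [2]; a run along the following lines illustrates the kind of baseline test performed (the job name, queue depth, and runtime here are illustrative values, not necessarily the ones used):

    fio --name=nvme-baseline \
        --directory=/var/lib/nova/instances \
        --rw=randread --bs=4k --direct=1 \
        --ioengine=libaio --iodepth=32 --numjobs=4 \
        --size=4G --runtime=60 --time_based --group_reporting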
As the next phase, I have established a flavor that employs a root disk for my virtual machine [3]. Regrettably, the resulting performance is around 18,000 IOPS, which is nearly ten times worse than the compute node results [4]. While I expected some degradation, a tenfold decrease seems excessive; realistically, I anticipated no more than a twofold reduction compared to the compute node's performance. Hence, I am led to ask: what should be configured to enhance performance?
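The flavor itself is unremarkable; the full definition is in [3], but it was created roughly as follows (the name and sizes are illustrative):

    openstack flavor create --vcpus 4 --ram 8192 --disk 40 local-nvme-test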
I have already experimented with the settings recommended on the Ceph page for image properties [5]; however, these changes did not yield the desired improvements. In addition, I attempted to modify the CPU architecture within the nova.conf file, switching to Cascade Lake architecture, yet this endeavor also proved ineffective. For your convenience, I have included a link to my current dumpxml results [6].
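Concretely, the properties from [5] were applied to the image along these lines, and the CPU model was changed in the [libvirt] section of nova.conf (the image name below is a placeholder, and depending on the Nova release the option may be cpu_model rather than cpu_models):

    openstack image set \
        --property hw_scsi_model=virtio-scsi \
        --property hw_disk_bus=scsi \
        --property hw_qemu_guest_agent=yes \
        --property os_require_quiesce=yes \
        <image-name>

    # nova.conf on the compute node
    [libvirt]
    cpu_mode = custom
    cpu_models = Cascadelake-Server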
Your insights and guidance would be greatly appreciated. I am confident that there is a solution to this performance disparity that I may have overlooked. Thank you in advance for your help. /Jan Wasilewski
References:
[1] nvme allocation and raid configuration: https://paste.openstack.org/show/bMMgGqu5I6LWuoQWV7TV/
[2] fio performance inside compute node: https://paste.openstack.org/show/bcMi4zG7QZwuJZX8nyct/
[3] Flavor configuration: https://paste.openstack.org/show/b7o9hCKilmJI3qyXsP5u/
[4] fio performance inside VM: https://paste.openstack.org/show/bUjqxfU4nEtSFqTlU8oH/
[5] image properties: https://docs.ceph.com/en/pacific/rbd/rbd-openstack/#image-properties
[6] dumpxml of vm: https://paste.openstack.org/show/bRECcaSMqa8TlrPp0xrT/
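To make the suggestions at the top a bit more concrete, here is a rough sketch of the knobs I have in mind; the values are examples to adapt to your environment, not drop-in config:

    # nova.conf on the compute node: switch local disks to raw to take
    # qcow2 allocation out of the picture (only affects newly built instances)
    [libvirt]
    images_type = raw
    # if you stay on qcow2, cache mode "none" avoids double caching on the host
    disk_cachemodes = file=none

    # alternatively, to test a preallocated qcow2 image by hand:
    qemu-img create -f qcow2 -o preallocation=falloc test.qcow2 20G

The virtio-scsi part should already be covered by the image properties from the Ceph page, so the main things to compare are raw vs. qcow2 and the cache mode.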