[nova] Slow NVMe performance for local storage instances
Hi,

I am reaching out to inquire about the performance of our local storage setup. Currently, I am conducting tests using NVMe disks; however, the results appear to be underwhelming.

In terms of my setup, I have recently incorporated two NVMe disks into my compute node. These disks have been configured as RAID1 under md127 and subsequently mounted at /var/lib/nova/instances [1]. During benchmarking using the fio tool within this directory, I am achieving approximately 160,000 IOPS [2]. This figure serves as a satisfactory baseline and reference point for upcoming VM tests.

As the next phase, I have established a flavor that employs a root disk for my virtual machine [3]. Regrettably, the resulting performance yields around 18,000 IOPS, which is nearly ten times poorer than the compute node results [4]. While I expected some degradation, a tenfold decrease seems excessive. Realistically, I anticipated no more than a twofold reduction compared to the compute node's performance. Hence, I am led to ask: what should be configured to enhance performance?

I have already experimented with the settings recommended on the Ceph page for image properties [5]; however, these changes did not yield the desired improvements. In addition, I attempted to modify the CPU architecture within the nova.conf file, switching to Cascade Lake architecture, yet this endeavor also proved ineffective. For your convenience, I have included a link to my current dumpxml results [6].

Your insights and guidance would be greatly appreciated. I am confident that there is a solution to this performance disparity that I may have overlooked. Thank you in advance for your help.

/Jan Wasilewski

References:
[1] nvme allocation and raid configuration: https://paste.openstack.org/show/bMMgGqu5I6LWuoQWV7TV/
[2] fio performance inside compute node: https://paste.openstack.org/show/bcMi4zG7QZwuJZX8nyct/
[3] Flavor configuration: https://paste.openstack.org/show/b7o9hCKilmJI3qyXsP5u/
[4] fio performance inside VM: https://paste.openstack.org/show/bUjqxfU4nEtSFqTlU8oH/
[5] image properties: https://docs.ceph.com/en/pacific/rbd/rbd-openstack/#image-properties
[6] dumpxml of vm: https://paste.openstack.org/show/bRECcaSMqa8TlrPp0xrT/
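(For readers without the paste links at hand: a representative fio invocation for this kind of 4k random-read IOPS test is sketched below. The exact job Jan ran is only in [2], so the block size, queue depth and job count here are assumptions.)

  # assumed parameters; the real job file is in [2]
  fio --name=randread --directory=/var/lib/nova/instances \
      --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
      --ioengine=libaio --direct=1 --size=10G \
      --runtime=60 --time_based --group_reporting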
Before digging into your settings: have you tried using raw disk images instead of qcow2, just to understand what overhead qcow2 is adding? My guess is that part of the issue is not preallocating the qcow2 space, but if you could check the performance with raw images, that would eliminate it as a factor. The next step would be to look at the I/O tuning properties and the disk cache mode. You mentioned following the Ceph recommendation, which would use virtio-scsi instead of virtio-blk; that should help, but tweaking the cache mode to none would also help.

On Wed, 2023-08-09 at 10:02 +0200, Jan Wasilewski wrote:
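(A quick way to check the qcow2 question, plus the cache-mode knob mentioned above, might look like the sketch below. The instance path is a placeholder, and the disk_cachemodes values should be verified against your Nova release's configuration reference.)

  # is the instance disk qcow2, and is any space preallocated?
  qemu-img info /var/lib/nova/instances/<instance-uuid>/disk

  # nova.conf sketch: ask the libvirt driver for cache=none on file- and block-backed disks
  [libvirt]
  disk_cachemodes = file=none,block=none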
Hi, I can't cover everything here, because performance is a huge topic, but here are some questions I didn't find the answers to: Which NVMe is this? Is it a consumer device, by chance? Which OpenStack release are you running, which hypervisor OS, and which guest OS and kernel versions? Which deployment method do you use?
[5] image properties: https://docs.ceph.com/en/pacific/rbd/rbd-openstack/#image-properties
At least the Ceph recommendations for virtio-scsi are somewhat outdated, as virtio-blk is - depending on the benchmark you look at - faster and also supports discard (it's a well-hidden secret, but it's true). I'd test with virtio-blk first, make sure the deadline scheduler is used inside the VM, and caching should be none. HTH -- Sven Kieske Senior Cloud Engineer Mail: kieske@osism.tech Web: https://osism.tech OSISM GmbH Teckstraße 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139
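(Checking the scheduler inside the guest is straightforward; the device name vda below assumes a virtio-blk root disk.)

  # the active scheduler is shown in brackets
  cat /sys/block/vda/queue/scheduler
  # on recent multi-queue kernels the choices are typically "none" or "mq-deadline"
  echo mq-deadline > /sys/block/vda/queue/scheduler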
On Wed, 2023-08-09 at 13:59 +0200, Sven Kieske wrote:
Hi,
I can't cover everything here, because performance is a huge topic, but here are some questions which I didn't find the answer to:
which nvme is this? is this a consumer device by chance?
which openstack release are you running, which hypervisor os and which guest os and kernel versions?
which deployment method do you use?
*[5] image properties: https://docs.ceph.com/en/pacific/rbd/rbd-openstack/#image-properties <https://docs.ceph.com/en/pacific/rbd/rbd-openstack/#image-properties
at least the ceph recommendations for virtio-scsi are somewhat outdated as virtio-blk is, depending on the benchmark you look at - faster and also supports discard (it's a well hidden secret, but it's true). Discard support depends on your distro and machine type:
on RHEL 9 / CentOS 9 Stream this is only true when using the q35 machine type: the only pc-i440fx machine type in CentOS 9 is from CentOS 7, and that predates the feature landing in QEMU, so if you use the pc/pc-i440fx machine type, virtio-blk does not support discard. On Debian-based distros this is not an issue, since they ship the upstream QEMU machine types; it will work with pc too. In general, the reason to use virtio-scsi for Ceph is that it allows more volumes to be attached, as each one does not consume a PCI device, unlike virtio-blk. So if you need many disks, use virtio-scsi; if you need fewer than 30 or so, use virtio-blk (see the image-property sketch after this message). Nova does not currently support multiqueue or iothreads, which are likely required to fully utilise NVMe SSDs. We did start working on a proposal for that, but the person working on it moved to a different role, so that is likely something we will look at again in the next development cycle.
I'd test with virtio-blk first and make sure deadline scheduler is used inside the vm and caching should be none.
HTH
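(If you do want to test virtio-blk with discard on such a host, the machine type can be requested per image, as noted above. A sketch with a placeholder image name; hw_machine_type and hw_disk_bus are standard image metadata properties:)

  openstack image set --property hw_machine_type=q35 --property hw_disk_bus=virtio ubuntu-22.04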
I would suggest to:
- make sure that the "none" I/O scheduler is used inside the VM (e.g. /sys/block/sda/queue/scheduler). I assume a quite recent kernel, otherwise "noop".
- make sure that the host has CPU C-states above C1 disabled (check the values of all /sys/devices/system/cpu/*/cpuidle/state*/disable for which [..]/name is different than "POLL", C1, C1E), or use some tool that disables them.
- use raw images instead of qcow2: in the [libvirt] section of nova.conf set force_raw_images=True and images_type=flat, and recreate the instance (see the sketch below).

Is the difference as big when you lower the I/O depth (for example to 1) or increase the block size (for example to 64k)?

On 09/08/2023 10:02, Jan Wasilewski wrote:
-- Damian Pietras
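(A sketch of the nova.conf change suggested above. Note that, depending on the release, force_raw_images may live in [DEFAULT] rather than [libvirt], so verify against your configuration reference.)

  [DEFAULT]
  force_raw_images = True

  [libvirt]
  images_type = flat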
Hi,

I wanted to express my sincere gratitude for all the help and advice you've given me. I followed your suggestions and carried out a bunch of tests, but unfortunately, the performance boost I was hoping for hasn't materialized. Let me break down the configurations I've tried and the results I've got.

Just to give you some context, all my tests were done using two INTEL SSDPE2MD400G4 NVMe disks and Ubuntu 20.04 LTS as the OS on the compute node. You can find all the nitty-gritty details in [1] and [2]. Additionally, I've shared the results of the fio tests executed directly on the RAID directory within the compute node in [3]. Then, I expanded my testing to instances, and here's what I found:

1. I tested things out with the default settings and an Ubuntu 22.04 LTS image. The IOPS results were hovering around 18-18.5k. Check out [4] and [5] for the specifics.
2. I tweaked the nova.conf file with two changes: force_raw_images = true and images_type = flat. Unfortunately, this only brought the IOPS down a bit, to just under 18k. You can see more in [6] and [7].
3. I made an extra change in nova.conf by switching the cpu_model from SandyBridge to IvyBridge. This change dropped the IOPS further, to around 17k. Details are in [8] and [9].
4. Lastly, I played around with image properties, setting hw_scsi_model=virtio-scsi and hw_disk_bus=scsi. However, this also resulted in around 17k IOPS. You can find out more in [10] and [11].

It's a bit disheartening that none of these changes seemed to have the impact I was aiming for. So, I'm starting to think there might be a crucial piece of the puzzle that I'm missing here. If you have any ideas or insights, I'd be incredibly grateful for your input. Thanks once more for all your help and support.

/Jan Wasilewski

References:
[1] Disk details and raid details: https://paste.openstack.org/show/bRyLPZ6TDHpIEKadLC7z/
[2] Compute node and nova details: https://paste.openstack.org/show/bcGw3Glm6U0r1kUsg8nU/
[3] fio results executed in raid directory inside compute node: https://paste.openstack.org/show/bN0EkBjoAP2Ig5PSSfy3/
[4] dumpxml of instance from test 1: https://paste.openstack.org/show/bVSq8tz1bSMdiYXcF3IP/
[5] fio results from test 1: https://paste.openstack.org/show/bKlxom8Yl7NtHO8kO53a/
[6] dumpxml of instance from test 2: https://paste.openstack.org/show/bN2JN9DXT4DGKNZnzkJ8/
[7] fio results from test 2: https://paste.openstack.org/show/b7GXIVI2Cv0qkVLQaAF3/
[8] dumpxml of instance from test 3: https://paste.openstack.org/show/b0821V4IUq8N7YPb73sg/
[9] fio results from test 3: https://paste.openstack.org/show/bT1Erfxq4XTj0ubTTgdj/
[10] dumpxml of instance from test 4: https://paste.openstack.org/show/bjTXM0do1xgzmVZO02Q7/
[11] fio results from test 4: https://paste.openstack.org/show/bpbVJntkR5aNke3trtRd/

On Wed, 9 Aug 2023 at 19:56, Damian Pietras <damian.pietras@hardit.pl> wrote:
Hi,

You wrote "/sys/devices/system/cpu/*/cpuidle/state*/disable output is 0 for all cpus". That means all C-states (power saving states) are _enabled_. This may cause lower and inconsistent results. I would repeat the test with deeper C-states disabled. I think the simplest way to do that is to boot the system (nova compute node) with "intel_idle.max_cstate=1" added to the kernel command line parameters. I had similar issues with I/O performance inside VMs (but with a lower disk queue depth), and power saving / frequency scaling had the greatest influence on the results and also caused variations in the results between test runs.

If you are out of ideas, you could also rule out the influence of the disk / filesystem / RAID configuration by temporarily mounting tmpfs on /var/lib/nova/instances so the instances will have RAM-backed volumes. You need enough RAM for that, of course (see the sketch below for both suggestions).

On 10/08/2023 13:35, Jan Wasilewski wrote:
-- Damian Pietras
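(A sketch of both suggestions on an Ubuntu compute node; the tmpfs size is an arbitrary example and the tmpfs mount is for benchmarking only.)

  # /etc/default/grub - append to your existing parameters, then run update-grub and reboot
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=1"

  # temporary RAM-backed instance store, for comparison only
  mount -t tmpfs -o size=32G tmpfs /var/lib/nova/instances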
Hi,

Thank you once again for your valuable suggestions. I conducted another round of tests with C-states disabled: I checked the BIOS settings and added the suggested line to the grub configuration. After rebooting my compute node, I observed an improvement in performance, reaching around 20,000 IOPS. Although there was a modest performance boost, it wasn't as substantial as I had anticipated.

Additionally, I configured a ramdisk to establish a baseline for comparison. The results were quite significant, with the ramdisk achieving approximately 72,000 IOPS [1] [2]. However, I had initially expected even higher figures. Regardless, such outcomes would be highly beneficial for my NVMe virtual machines.

Nonetheless, I'm at a loss regarding potential further optimizations. I've explored some resources, such as https://docs.openstack.org/nova/rocky/user/flavors.html, which outline IO limits. However, I am under the impression that these limits might only restrict performance rather than enhance it. Could you kindly confirm whether my understanding is accurate? I extend my gratitude in advance for any forthcoming suggestions. It's possible that I might be searching in the wrong places for solutions.

/Jan Wasilewski

References:
[1] dumpxml config for ramdisk vm: https://paste.openstack.org/show/b7AgTZBjvSpWMioJzmoA/
[2] fio results of vm where ramdisk is a main disk: https://paste.openstack.org/show/bdII56cavmVwNAIq3axQ/

On Thu, 10 Aug 2023 at 14:05, Damian Pietras <damian.pietras@hardit.pl> wrote:
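(For reference, the flavor I/O limits mentioned above are set as quota:* extra specs and map to libvirt iotune throttling, so they act as caps rather than accelerators. A sketch with a placeholder flavor name:)

  openstack flavor set \
    --property quota:disk_read_iops_sec=20000 \
    --property quota:disk_write_iops_sec=20000 \
    nvme.flavor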
As a last resort: what kernel is that Ubuntu 20.04 running? I'd advise using at least the HWE kernel, maybe even testing the latest kernel.org LTS release. HTH -- Sven Kieske Senior Cloud Engineer Mail: kieske@osism.tech Web: https://osism.tech OSISM GmbH Teckstraße 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139
Hi Sven,

maybe you missed it, but the kernel is given in the link here [1]. In short: 5.4.0-155-generic. If anything additional is needed, just let me know.

/Jan Wasilewski

[1] https://paste.openstack.org/show/bcGw3Glm6U0r1kUsg8nU/

On Fri, 11 Aug 2023 at 10:48, Sven Kieske <kieske@osism.tech> wrote:
Hi,

I was conducting another round of tests, which is not a complete solution for the OpenStack platform itself; however, it served as a clever method to assess real NVMe performance within a virtual machine. I decided to attach a full NVMe disk to a VM as "vdb" and assess the performance there. Interestingly, I managed to achieve approximately 80,000 IOPS, which is a significant improvement. Nevertheless, it's worth noting that this approach may not be directly applicable to my solution, as the root disk configured in the flavor must be labeled as "vda". Regardless, I wanted to present this as a reference to demonstrate that achieving higher IOPS is indeed possible. With collaborative efforts, perhaps similar results can be attained for "vda" disks in fully OpenStack-managed VMs. The disk was added using the following virsh command: "virsh attach-disk instance-000034ba /dev/nvme1n1p1 vdb". Additional results, as well as the dumpxml output for this VM, are presented in references [1] and [2].

While achieving 80,000 IOPS is satisfactory for me, I also conducted separate tests with a VM that was entirely managed by libvirt, without involving OpenStack. The VM was set up using the following command: "virt-install --virt-type=kvm --name=local-ubuntu --vcpus=2 --memory=4096 --disk path=/var/lib/nova/instances/test/disk,format=qcow2 --import --network default --graphics none"

In this case, the OS image used was identical to the one employed in my full OpenStack test, and the procedure for attaching a "vdb" drive was replicated exactly as it was for my OpenStack VM. The outcome of these tests is quite surprising: I was able to achieve around 130,000 IOPS, despite the configuration being nearly identical. This discrepancy is perplexing and suggests that there might be an issue with the Nova component itself. Although this may be a bold assertion, it's a hypothesis I'm considering until further clarification is obtained. The configuration details for this specific VM, along with the results from the fio tests, can be found in references [3] and [4].

If anyone has insights into how to achieve around 80,000 IOPS within a fully OpenStack-operated environment, I'm eager to receive such suggestions. My objective here is to bridge this gap, and I would greatly appreciate any guidance in this regard.

/Jan Wasilewski

References:
[1] dumpxml of OpenStack managed instance with "vdb" attached: https://paste.openstack.org/show/bQvGUIM3FSHIyA9JoThY/
[2] fio results of OpenStack managed instance with "vdb" attached: https://paste.openstack.org/show/bViUpJTf7UYpsRyGCAt9/
[3] dumpxml of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bGv8dT1l2QaTiAybYrJi/
[4] fio results of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bOzYXkbco0oDfgaD0co8/
[5] xml configuration of vdb drive: https://paste.openstack.org/show/bAJ9MyEWEGOteeJnH5D8/

On Fri, 11 Aug 2023 at 11:27, Jan Wasilewski <finarffin@gmail.com> wrote:
The difference between "sda" and "vdb" is the disk controller:

In the case of the first disk: <target dev='sda' bus='scsi'/>
Disk added via virsh: <target dev='vdb' bus='virtio'/>

You can set OS image properties to achieve this setup and then re-create the VM: set hw_disk_bus=virtio and remove the property "hw_scsi_model".

How to update properties of existing images: https://docs.openstack.org/glance/latest/admin/manage-images.html
You can read more about image properties here: https://docs.openstack.org/glance/latest/admin/useful-image-properties.html

On 14.08.2023 14:37, Jan Wasilewski wrote:
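(With the openstack CLI, that suggestion translates to something like the following; the image name is a placeholder, and existing instances need to be recreated to pick the change up.)

  openstack image set --property hw_disk_bus=virtio ubuntu-22.04
  openstack image unset --property hw_scsi_model ubuntu-22.04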
Hi,

On Monday, 14.08.2023 at 14:37 +0200, Jan Wasilewski wrote:
[2] fio results of OpenStack managed instance with "vdb" attached: https://paste.openstack.org/show/bViUpJTf7UYpsRyGCAt9/
[3] dumpxml of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bGv8dT1l2QaTiAybYrJi/
[4] fio results of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bOzYXkbco0oDfgaD0co8/
[5] xml configuration of vdb drive: https://paste.openstack.org/show/bAJ9MyEWEGOteeJnH5D8/
One difference I can see in the fio results is that the OpenStack-provided VM does a lot more context switches and has a different CPU usage profile in general:

OpenStack instance:
cpu : usr=27.16%, sys=62.24%, ctx=3246653, majf=0, minf=14

Plain libvirt instance:
cpu : usr=15.75%, sys=56.31%, ctx=2860657, majf=0, minf=15

This indicates that some other workload is running there, or that work is at least scheduled in a different way than on the plain libvirt machine. One example to check might be the IRQ balancing across cores, but I can't remember atm whether this is already fixed in this kernel release (iirc in the past you used to run the irqbalance daemon, which became obsolete after kernel 4.19 according to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926967).

How many other VMs are running on that OpenStack hypervisor? I hope the hypervisor is not oversubscribed? You can easily see this in a modern variant of "top", which reports stolen CPU cycles; if you get CPU steal, your CPU is oversubscribed. Depending on the deployment, you will of course also incur additional overhead from other OpenStack services - beginning with nova - which might account for the additional context switches on the hypervisor. In general, 3 million context switches is not that much and should not impact performance by much, but it's still a noticeable difference between the two systems.

Are the CPU models on the hypervisors exactly the same? I can't tell from the libvirt dumps, but I notice that certain CPU flags are explicitly set for the libvirt-managed instance, which might affect the end result. What's more bothersome is that the libvirt-provided VM has a total CPU usage of roughly 70%, whereas the OpenStack-provided one is closer to 90%. This leads me to believe that one of the following is true:

- the hypervisor CPUs differ in a meaningful way, performance-wise.
- the hypervisor is somehow oversubscribed / has more work to do for the OpenStack-deployed server, which results in worse benchmarks / more CPU being burnt by constantly evicting the task from the lower-level L1/L2 CPU caches.
- the context switches eat up significant CPU performance on the OpenStack instance (least likely imho).

What would be interesting to know is whether mq-deadline and multi-queue are enabled on the plain libvirt machine (are the libvirt and qemu versions the same as in the OpenStack deployment?). You can check this as described here: https://bugzilla.redhat.com/show_bug.cgi?id=1827722 - but I don't see "num_queues" or "queues" mentioned anywhere, so I assume it's turned off. Enabling it could also boost your performance by a lot.

Another thing to check - especially since I noticed the CPU differences - would be the NUMA layout of the hypervisor and how the VM is affected by it.

-- Sven Kieske Senior Cloud Engineer Mail: kieske@osism.tech Web: https://osism.tech OSISM GmbH Teckstraße 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139
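(Two quick checks for the oversubscription and multi-queue points above, both run inside the guest; the device name vdb is an assumption.)

  # a non-zero "st" (steal) column means the host is oversubscribed
  vmstat 1 5

  # number of blk-mq hardware queues the virtio disk actually got
  ls /sys/block/vdb/mq/ | wc -l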
On Mon, 2023-08-14 at 17:29 +0200, Sven Kieske wrote:
Hi,
Am Montag, dem 14.08.2023 um 14:37 +0200 schrieb Jan Wasilewski:
*[2] fio results of OpenStack managed instance with "vdb" attached: https://paste.openstack.org/show/bViUpJTf7UYpsRyGCAt9/ <https://paste.openstack.org/show/bViUpJTf7UYpsRyGCAt9/>* *[3] dumpxml of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bGv8dT1l2QaTiAybYrJi/ <https://paste.openstack.org/show/bGv8dT1l2QaTiAybYrJi/>*
Looking at this XML, you attach the qcow2 file via IDE and pass the NVMe device through directly via virtio-blk:

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native' discard='unmap'/>
  <source file='/var/lib/nova/instances/test/disk' index='1'/>
  <backingStore type='file' index='2'>
    <format type='raw'/>
    <source file='/var/lib/nova/instances/_base/78f03ab8f57b6e53f615f89f7ca212c729cb2f29'/>
    <backingStore/>
  </backingStore>
  <target dev='hda' bus='ide'/>
  <alias name='ide0-0-0'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/nvme1n1p1' index='4'/>
  <backingStore/>
  <target dev='vdb' bus='virtio'/>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>

That is not a fair comparison, as IDE will also bottleneck the performance; you should use the same bus for both.
*[4] fio results of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bOzYXkbco0oDfgaD0co8/ <https://paste.openstack.org/show/bOzYXkbco0oDfgaD0co8/>* *[5] xml configuration of vdb drive: https://paste.openstack.org/show/bAJ9MyEWEGOteeJnH5D8/ <https://paste.openstack.org/show/bAJ9MyEWEGOteeJnH5D8/>*
one difference I can see in the fio results, is that the openstack provided vm does a lot more context switches and has a different cpu usage profile in general:
Openstack Instance:
cpu : usr=27.16%, sys=62.24%, ctx=3246653, majf=0, minf=14
plain libvirt instance:
cpu : usr=15.75%, sys=56.31%, ctx=2860657, majf=0, minf=15
One thing this might be related to is that the libvirt-created VM does not have the virtual performance monitoring unit (vPMU) enabled. I added the ability to turn that off a few releases ago (https://specs.openstack.org/openstack/nova-specs/specs/train/implemented/lib...) via a boolean image metadata key, hw_pmu=True|False, and a corresponding flavor extra spec, hw:pmu=True|False, so you could try disabling that and see if it helps with the context switching.
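(A minimal sketch of toggling that; image and flavor names are placeholders, and the instance has to be recreated or resized for the change to take effect.)

  openstack image set --property hw_pmu=false ubuntu-22.04
  # or on the flavor
  openstack flavor set --property hw:pmu=false nvme.flavor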
this indicates, that some other workload is running there or work is scheduled at least in a different way then on the plain libvirt machine, one example to check might be the irq balancing on different cores, but I can't remember atm, if this is fixed already on this kernel release (iirc in the past you used to run the irq-balance daemon which got obsolete after kernel 4.19 according to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926967 )
how many other vms are running on that openstack hypervisor?
I hope the hypervisor is not oversubscribed? You can easily see this in a modern variant of "top" which reports stolen cpu cycles, if you got cpu steal your cpu is oversubscribed.
depending on the deployment, you will of course also incur additional overhead from other openstack services - beginning with nova, which might account for the additional context switches on the hypervisor.
In general 3 million context switches is not that much and should not impact performance by much, but it's still a noticeable difference between the two systems.
are the cpu models on the hypervisors exactly the same? I can't tell it from the libvirt dumps, but I notice that certain cpu flags are explicitly set for the libvirt managed instance, which might affect the end result.
What's more bothering is, that the libvirt provided VM has a total cpu usage of roundabout 70% whereas the openstack provided one is closer to 90%.
this leads me to believe that either one of the following is true:
- the hypervisor cpus differ in a meaningful way, performance wise. - the hypervisor is somehow oversubscribed / has more work to do for the openstack deployed server, which results in worse benchmarks/more cpu being burnt by constantly evicting the task from the lower level l1/l2 cpu caches. - the context switches eat up significant cpu performance on the openstack instance (least likely imho).
what would be interesting to know would be if mq-deadline and multi queue are enabled in the plain libvirt machine (are libvirt and qemu versions the same as in the openstack deploment?).
you can check this like it is described here:
https://bugzilla.redhat.com/show_bug.cgi?id=1827722
But I don't see "num_queues" or "queues" mentioned anywhere, so I assume it's turned off. Enabling it could also boost your performance by a lot.
We do not support multi-queue for virtio-blk or virtio-scsi in Nova; it's on our todo list but not available in any current release (https://review.opendev.org/c/openstack/nova-specs/+/878066). The person that was proposing this is no longer working on OpenStack, so if people are interested, feel free to get involved. Otherwise it will likely get enabled in a release or two, when we find time to work on it.
Another thing to check - especially since I noticed the cpu differences - would be the numa layout of the hypervisor and how the VM is affected by it.
Hi,

First and foremost, I want to express my heartfelt gratitude for all the invaluable insights you've provided. I meticulously studied them and conducted numerous tests based on your inputs. While I've managed to implement certain enhancements, I'd like to delve into those improvements further below. For now, let me address your queries.

Regarding the number of concurrent VMs operating on the OpenStack hypervisor: presently, there is a sole VM running on this compute node; occasionally there might be two instances. The compute node remains largely underutilized, primarily earmarked for my performance assessments. It's equipped with a 24-core Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz, alongside a MemTotal of 48988528 kB. Thus far, I haven't detected any red flags. Even during the execution of fio tests within my VMs, there is no discernible surge in load.

To @smooney: In relation to IDE and virtio, I undertook a secondary test, meticulously duplicating the attachment methodology, and the outcomes are similar. Please refer to [1] and [2]. Nevertheless, as per your recommendation, I explored hw_pmu; however, the outcomes remained consistent. Find the results with hw_pmu disabled in [3], [4], and [5], and contrasting results with hw_pmu enabled in [6], [7], and [8].

Nonetheless, I did experience a substantial performance increase, albeit solely for a manually attached disk: a whole drive, not a disk associated with the VM as a single file [9]. The only alteration involved changing my cpu_model in nova.conf from IvyBridge to Cascadelake-Server-noTSX. Even though I achieved approximately 110k IOPS for the fully attached disk [10], the file-attached disk retained around 19k IOPS [11], with comparable performance evident for the root disk [12]. The latter is also a single file, albeit located on a distinct drive of the same model. For your perusal, I've appended all relevant dumpxml data [13].

In summation, it seems that the cpu_model significantly influences performance, though this effect is not replicated for a "file disk". The question thus stands: how can we raise performance for a file disk? Might you be willing to share the fio benchmark outcomes from your local storage configuration? I'm curious to ascertain whether our results align, or if there's a concealed optimization path I have yet to uncover. I sincerely appreciate all the assistance you've extended thus far.
/Jan Wasilewski

References:
[1] virtio connected via virsh attach-volume to OpenStack instance (<80k IOPS): https://paste.openstack.org/show/bHqZZWdAwWVYh1rHaIgC/
[2] virtio connected via virsh attach-volume to OpenStack instance, dumpxml: https://paste.openstack.org/show/bvEsKiwBd8lL4AUPSOxj/
[3] hw_pmu: False: fio - root disk: https://paste.openstack.org/show/bAZXQOUrkmVBsJ7yBEql/
[4] hw_pmu: False: fio - attached nvme disk: https://paste.openstack.org/show/bF1P0qsVG24duuY8F6HV/
[5] hw_pmu: False: dumpxml: https://paste.openstack.org/show/b8Yxf5DmPmAxxA070DL1/
[6] hw_pmu: True: fio - root disk: https://paste.openstack.org/show/b7jJ7gR2e9VAAXm1e9PP/
[7] hw_pmu: True: fio - attached nvme disk (82.5k IOPS): https://paste.openstack.org/show/bCrdOnwxrJS6hENxTMK5/
[8] hw_pmu: True: dumpxml: https://paste.openstack.org/show/b8Yxf5DmPmAxxA070DL1/
[9] Instruction how to add a "file disk" to a KVM instance: https://www.cyberciti.biz/faq/how-to-add-disk-image-to-kvm-virtual-machine-with-virsh-command/
[10] cpu_model: Cascadelake-Server-noTSX fio - attached nvme disk (almost 110k IOPS): https://paste.openstack.org/show/bdKQIgNIH0dy8PLhAIKq/
[11] cpu_model: Cascadelake-Server-noTSX fio - "file disk": https://paste.openstack.org/show/bjBmPBXi35jWdyJ1cjQt/
[12] cpu_model: Cascadelake-Server-noTSX fio - root disk: https://paste.openstack.org/show/br49T918vNU5NJXfXYGm/
[13] cpu_model: Cascadelake-Server-noTSX dumpxml: https://paste.openstack.org/show/bns2rWIHCHIWbrR9LUD0/
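(For completeness, the cpu_model change described above corresponds to the [libvirt] CPU settings in nova.conf; depending on the release the option is the older cpu_model or the newer list-valued cpu_models, so adjust the sketch accordingly.)

  [libvirt]
  cpu_mode = custom
  cpu_models = Cascadelake-Server-noTSX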
Hi,

Let me add a few points. Lately I decided to run a couple of tests on a newer OpenStack platform, Zed (built with the kolla-ansible project), which runs Ubuntu 22.04 LTS on my compute nodes. The results were surprising, mainly because I was able to reach the numbers I was hoping for.

The compute node has 2 SSDs and 2 NVMe disks. As a first step I tested with the SSD drives. With IvyBridge-IBRS as the cpu_model, fio reported roughly 90k IOPS for the local SSD drive [1]; after switching to Cascadelake-Server I got above 100k IOPS [2]. Interestingly, the same test with the NVMe drives gave only slightly above 90k IOPS [3], which suggests that, as local storage for VMs, the NVMe drives are marginally slower than the SSDs here.

As a final test, I ran fio directly on the NVMe mount point and got around 140k IOPS [4].

In summary, the Ubuntu version used as the base for the compute nodes (Ubuntu 20.04 LTS vs. Ubuntu 22.04 LTS) appears to have a significant impact on performance. My suspicion is that a kernel parameter is constraining the performance inside the VM (more precisely, of the "drive file" that serves as the VM's local storage), but I don't know which parameter(s) are involved. I intend to dig deeper into this, and I'm open to any suggestions you may have.

/Jan Wasilewski

References:
[1] fio results for IvyBridge and SSDs: https://paste.openstack.org/show/bUCoXBUbImd9JxplPBbv/
[2] fio results for Cascadelake-Server and SSDs: https://paste.openstack.org/show/bWxDkM5ITcMTlFWe4GiZ/
[3] fio results for Cascadelake-Server and NVMe: https://paste.openstack.org/show/bbINpvkNZcJcY0KP0vPo/
[4] fio results for the NVMe mount point: https://paste.openstack.org/show/bTchYOYY3zNpSLPfOpQl/
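For readers following along, the cpu_model comparison above corresponds to a change along these lines in the [libvirt] section of nova.conf on the compute node (a sketch; the exact options and values used in this deployment are only visible in the dumpxml pastes):

    [libvirt]
    cpu_mode = custom
    # first run:
    #cpu_models = IvyBridge-IBRS
    # second run:
    cpu_models = Cascadelake-Server

nova-compute has to be restarted for the change to take effect, and already-running guests generally keep their old CPU model until they are hard rebooted or rebuilt.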
Thanks for reporting your observation. This may or may not be kernel related: if you are using different versions of QEMU between the two Ubuntu releases, that alone could explain it. If the QEMU version is the same, then it may indeed be related to a kernel change, but not necessarily a parameter; it could be a filesystem change that improved performance for VM workloads, enhancements to some of the kernel mitigations in use, or a number of other factors. 20.04 to 22.04 is a large leap and there are a lot of changes, even if you are deploying the same version of OpenStack using packages from the cloud archive on 20.04.

If you want the highest possible performance in the guest, instead of setting a virtual CPU model you should set

[libvirt]
cpu_mode=host-passthrough

instead of

[libvirt]
cpu_mode=custom
cpu_models=Cascadelake-Server

The downside to using host-passthrough is that you will only be able to live migrate to servers with the exact same model of CPU. If all your CPUs are the same, or you can subdivide your cloud into sets of hosts with the same CPU SKU (e.g. via host aggregates and filters/traits), then that's not really an issue.

If you do find a kernel parameter that achieves the same performance on 20.04, please let us know, but I suspect it's a combination of things that changed between the two releases rather than a single thing.
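If anyone takes the host-aggregate route mentioned above, a minimal sketch could look like the following; the aggregate, host and flavor names and the cpu_sku key are placeholders, and it assumes the AggregateInstanceExtraSpecsFilter is enabled in the scheduler:

    # group hosts that share the same CPU SKU (names are placeholders)
    openstack aggregate create cascadelake-hosts
    openstack aggregate add host cascadelake-hosts compute-01
    openstack aggregate add host cascadelake-hosts compute-02
    openstack aggregate set --property cpu_sku=cascadelake cascadelake-hosts
    # flavors that should land only on those hosts carry a matching extra spec
    openstack flavor set --property aggregate_instance_extra_specs:cpu_sku=cascadelake my-flavor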
Hi Jan,

Just curious, after reading the valuable input from you and the others: on which OS are you seeing the performance degradation, Ubuntu 20.04 LTS or Ubuntu 22.04 LTS? I am about to build some compute nodes using NVMe and am looking for the right OS/kernel combination for the best performance.
Hi,

My bad: I asked twice about your kernel versions but failed to state _why_, and I think we can now see why I asked. Newer kernels in general have improved performance, especially on relatively new hardware such as NVMe SSDs. Your Ubuntu 22.04 should be on a 5.19-based kernel, which saw a lot of improvements in the nvme and block areas [0]. This is why I asked you to run the latest stable kernel.org LTS kernel: to rule out kernel bottlenecks, which can often be solved simply by upgrading.

[0]: https://lore.kernel.org/linux-block/83ce22b4-bae1-9c97-1ad5-10835d6c5424@ker...

HTH

--
Sven Kieske
Senior Cloud Engineer

Mail: kieske@osism.tech
Web: https://osism.tech

OSISM GmbH
Teckstraße 62 / 70190 Stuttgart / Germany
Managing Director: Christian Berendt
Registered office: Stuttgart
Commercial register: Amtsgericht Stuttgart, HRB 756139
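To separate the kernel variable from the QEMU variable when comparing the 20.04 and 22.04 compute nodes, a quick check on each host might look like this (a sketch; the NVMe device name is a placeholder for whatever backs /var/lib/nova/instances):

    # kernel actually booted on the compute node
    uname -r
    # hypervisor component versions
    qemu-system-x86_64 --version
    libvirtd --version
    # I/O scheduler in use for the NVMe device
    cat /sys/block/nvme0n1/queue/scheduler

If the QEMU versions differ between the two hosts, that alone may account for part of the gap, as noted earlier in the thread.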
participants (5)
- Damian Pietras
- Jan Wasilewski
- Satish Patel
- smooney@redhat.com
- Sven Kieske