[cinder][nova] Local storage in compute node
Hi,

I'm researching methods to get around high storage latency for some applications where redundancy does not matter, so using local NVMe drives in compute nodes seems to be the practical choice. However, there does not appear to be a good solution from what I have read. For example, the BlockDeviceDriver has been deprecated/removed, LVM is only supported via iSCSI (which is slow) and there is no way to localize LVM volumes onto the same compute node as their VMs, and other methods (PCI pass-through, etc.) would require direct access to the local drives, where device cleansing would need to occur after a device was removed from a VM, and I don't believe there is a hook for this.

Ephemeral storage appears to be an option, but I believe it has the same issue as PCI pass-through, in that there is no ability to automatically cleanse a device after it has been used. In our default configuration, ephemeral storage is redirected to Ceph, which solves the cleansing issue, but isn't suitable due to its high latency. Also, ephemeral storage appears as a second device, not the root disk, which complicates a few configurations we have.

Is there any other way to write an operating system image onto a local drive and boot from it? Or, preferably, to assign an LVM /dev/mapper path as a device in libvirt (no iSCSI) after configuring a logical volume? Or am I missing something?

Thanks!

Eric
In case this is the answer, I found that in nova.conf, under the [libvirt] stanza, images_type can be set to "lvm". This looks like it may do the trick - using the compute node's LVM to provision and mount a logical volume, for either persistent or ephemeral storage defined in the flavor. Can anyone validate that this is the right approach for our needs?

Also, I have read about the LVM device filters - which are important to prevent the host's LVM from seeing the guests' volumes - in case anyone else finds this message.

Thanks!

Eric
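For anyone searching later, the stanza being described would look roughly like this (untested sketch; the volume group name is just whatever VG you create on the local NVMe drives):

  [libvirt]
  images_type = lvm
  images_volume_group = nova_vg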
On 05-08-20 05:03:29, Eric K. Miller wrote:
In case this is the answer, I found that in nova.conf, under the [libvirt] stanza, images_type can be set to "lvm". This looks like it may do the trick - using the compute node's LVM to provision and mount a logical volume, for either persistent or ephemeral storage defined in the flavor.
Can anyone validate that this is the right approach according to our needs?
I'm not sure it is, given your initial requirements. Do you need full host block devices to be provided to the instance? The LVM imagebackend will just provision LVs on top of the provided VG, so there's no direct mapping to a full host block device with this approach. That said, there's no real alternative available at the moment.
Also, I have read about the LVM device filters - which is important to avoid the host's LVM from seeing the guest's volumes, in case anyone else finds this message.
Yeah, that's a common pitfall when using LVM-based ephemeral disks that contain additional LVM PVs/VGs/LVs etc. You need to ensure that the host is configured not to scan these LVs in order for their PVs/VGs/LVs etc. to remain hidden from the host: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/htm...

--
Lee Yarwood
A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76
On Wed, 2020-08-05 at 12:19 +0100, Lee Yarwood wrote:

That said, there's no real alternative available at the moment.

Well, one alternative to Nova providing local LVM storage is to use the Cinder LVM driver, but install it on all compute nodes and then use the Cinder InstanceLocalityFilter to ensure the volume is allocated from the host the VM is on. https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter One drawback to this is that if the VM is moved, I think you would need to also migrate the Cinder volume separately afterwards.
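A rough sketch of what that could look like (untested; backend and VG names are placeholders - the backend stanza goes in cinder.conf on each compute node running cinder-volume, and InstanceLocalityFilter is added to the scheduler's filter list). I believe the filter also needs credentials to query the Nova API; the linked page covers that.

  # cinder.conf on the scheduler node(s)
  [DEFAULT]
  scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter

  # cinder.conf on each compute node running cinder-volume
  [DEFAULT]
  enabled_backends = lvm-local

  [lvm-local]
  volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
  volume_group = cinder-local-vg
  volume_backend_name = lvm-local
  target_helper = lioadm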
On Wed, 2020-08-05 at 12:40 +0100, Sean Mooney wrote:

One drawback to this is that if the VM is moved, I think you would need to also migrate the Cinder volume separately afterwards.

By the way, if you were to take this approach, I think there is an NVMe-oF driver, so you could use NVMe over RDMA instead of iSCSI.
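If that target is used, the target side of the backend stanza above would change to something along these lines (untested, and the option names may vary by release - check the Cinder docs for your version):

  [lvm-local]
  volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
  volume_group = cinder-local-vg
  target_helper = nvmet
  target_protocol = nvmet_rdma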
I have used local NVMe to drive the CI workload for the OpenStack community for the last year or so, and it seems to work pretty well. I just created a filesystem (XFS) and mounted it at /var/lib/nova/instances. I also moved Glance to using my Swift backend, and it really made the download of images much faster.

It depends on whether the workload is going to handle HA or you are expecting to migrate machines. If the workload is ephemeral, or HA can be handled in the app, I think local storage is still a very viable option. Simpler is better IMO.

On Wed, Aug 5, 2020 at 7:48 AM Sean Mooney <smooney@redhat.com> wrote:

Well, one alternative to Nova providing local LVM storage is to use the Cinder LVM driver, but install it on all compute nodes and then use the Cinder InstanceLocalityFilter to ensure the volume is allocated from the host the VM is on.

--
~/DonnyD
C: 805 814 6800
"No mission too difficult. No sacrifice too great. Duty First"
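A minimal sketch of that kind of setup, assuming the local NVMe device (or an md array built on the NVMe drives) is /dev/nvme0n1 - device names and labels are just examples:

  mkfs.xfs -L nova /dev/nvme0n1
  mkdir -p /var/lib/nova/instances
  mount /dev/nvme0n1 /var/lib/nova/instances
  # add an /etc/fstab entry so it persists across reboots, e.g.
  #   LABEL=nova  /var/lib/nova/instances  xfs  defaults,noatime  0 0
  # and restore whatever ownership/SELinux context nova expects on the directory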
On Wed, 2020-08-05 at 08:36 -0400, Donny Davis wrote:

I just created a filesystem (XFS) and mounted it at /var/lib/nova/instances. ... Simpler is better IMO.

Yes, that works well with the default flat/qcow file format; I assume there was a reason that was not the starting point. The Nova LVM backend, I think, does not support thin provisioning, so if you did the same thing, creating the volume group on the NVMe device, you would technically get better write performance after the VM is booted, but VM spawn is slower, since we can't take advantage of thin provisioning and each root disk needs to be copied from the cached image.

So just mounting the Nova data directory on an NVMe drive, or a RAID of NVMe drives, works well and is simple to do. I take a slightly more complex approach for my home cluster, where I put the Nova data directory on a bcache block device, which puts an NVMe PCIe SSD as a cache in front of my RAID 10 of HDDs to accelerate it. From Nova's point of view there is nothing special about this setup; it just works.

The drawback to this is that you can't change the storage available to a VM without creating a new flavor. Exposing the NVMe devices, or a subsection of them, via Cinder has the advantage of allowing you to use the volume API to tailor the amount of storage per VM rather than creating a bunch of different flavors, but with the overhead of needing to connect to the storage over a network protocol.

So there are trade-offs with both approaches. Generally I recommend using local storage, e.g. the VM root disk or ephemeral disk, for fast scratchpad space to work on data, but persisting all relevant data permanently via Cinder volumes. That requires you to understand which block devices are local and which are remote, but it gives you the best of both worlds.
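For anyone curious, the bcache arrangement described above looks roughly like this (device names are examples; requires bcache-tools and a kernel with bcache support):

  # /dev/md0 is the RAID 10 of HDDs (backing device), /dev/nvme0n1 is the cache
  make-bcache -C /dev/nvme0n1 -B /dev/md0
  # the cache is attached automatically when both are created in one command
  mkfs.xfs /dev/bcache0
  mount /dev/bcache0 /var/lib/nova/instances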
On Wed, Aug 5, 2020 at 9:01 AM Sean Mooney <smooney@redhat.com> wrote:

So there are trade-offs with both approaches. Generally I recommend using local storage, e.g. the VM root disk or ephemeral disk, for fast scratchpad space to work on data, but persisting all relevant data permanently via Cinder volumes.

I have been through just about every possible NVMe backend option for Nova, and the one that has turned out to be the most reliable and predictable has been simple defaults so far. Right now I am giving an NVMe + NFS backend a spin. It doesn't perform badly, but it is not a local NVMe.

One of the things I have found with NVMe is that the mdadm RAID driver is just not fast enough to keep up if you use anything other than RAID 0/1 (10). I have a RAID 5 array I have got working pretty well, but it's still limited. I don't have any VROC-capable equipment, so maybe that will make a difference if implemented. I also have an all-NVMe Ceph cluster I plan to test using CephFS (I know RBD is an option, but where is the fun in that).

From my experience over the last two years working with NVMe-only setups, nothing comes close to matching the performance of a couple of local NVMe drives in RAID 0. NVMe is so fast that the rest of my (old) equipment just can't keep up; it really does push things to the limits of what is possible. The all-NVMe Ceph cluster does push my 40G network to its limits, but I had to create multiple OSDs per NVMe to get there - for my gear (Intel DC P3600s) I ended up at 3 OSDs per NVMe. It seems to be limited by network performance.

If you have any other questions I am happy to help where I can - I have been working with all-NVMe setups for the last couple of years and have had something in prod for about a year (maybe a little longer). From what I can tell, getting max performance from NVMe for an instance is a non-trivial task, because it's just so much faster than the rest of the stack, and careful consideration must be taken to get the most out of it.

I am curious to see where you take this, Eric.

--
~/DonnyD
C: 805 814 6800
"No mission too difficult. No sacrifice too great. Duty First"
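For reference, the kind of md RAID 0 being discussed is just (device names and drive count are examples):

  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
  mkfs.xfs /dev/md0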
From: Donny Davis [mailto:donny@fortnebula.com] Sent: Wednesday, August 05, 2020 8:23 AM
If you have any other questions I am happy to help where I can - I have been working with all nvme stuff for the last couple years and have gotten something into prod for about 1 year with it (maybe a little longer). From what I can tell, getting max performance from nvme for an instance is a non-trivial task because it's just so much faster than the rest of the stack and careful considerations must be taken to get the most out of it. I am curious to see where you take this Eric
Thanks for the response! We also use Ceph with NVMe SSDs, with many NVMe namespaces with one OSD per namespace, to fully utilize the SSDs. You are right - they are so fast that they are literally faster than any application can use. They are great for multi-tenant environments, though, where it's usually better to have more hardware than people can utilize. My first test is to try using the Libvirt "images_type=lvm" method to see how well it works. I will report back... Eric
From: Sean Mooney [mailto:smooney@redhat.com]
Sent: Wednesday, August 05, 2020 8:01 AM

Yes, that works well with the default flat/qcow file format; I assume there was a reason that was not the starting point. The Nova LVM backend, I think, does not support thin provisioning, so if you did the same thing, creating the volume group on the NVMe device, you would technically get better write performance after the VM is booted, but VM spawn is slower, since we can't take advantage of thin provisioning and each root disk needs to be copied from the cached image.
I wasn't aware that the Nova LVM backend ([libvirt]/images_type = lvm) didn't support thin-provisioned LVs. However, I do see that the "sparse_logical_volumes" parameter indicates it has been deprecated: https://docs.openstack.org/nova/rocky/configuration/config.html#libvirt.spar... That would definitely be a downer.
So just mounting the Nova data directory on an NVMe drive, or a RAID of NVMe drives, works well and is simple to do.
Maybe we should consider doing this instead. I'll test with the Nova LVM backend first.
So there are trade-offs with both approaches. Generally I recommend using local storage, e.g. the VM root disk or ephemeral disk, for fast scratchpad space to work on data, but persisting all relevant data permanently via Cinder volumes. That requires you to understand which block devices are local and which are remote, but it gives you the best of both worlds.
Our use case simply requires high-speed, non-redundant storage for self-replicating applications like Couchbase, Cassandra, MongoDB, etc., or very inexpensive VMs that are backed up often and can withstand the downtime when restoring from backup.

One more requirement (or rather a very nice-to-have) is to be able to create images (backups) of the local storage onto object storage, so hopefully "openstack server backup create" works like it does with rbd-backed, Nova-managed persistent storage.

I will let you know what I find out! Thanks everyone!

Eric
On Wed, 2020-08-05 at 23:53 -0500, Eric K. Miller wrote:

One more requirement (or rather a very nice-to-have) is to be able to create images (backups) of the local storage onto object storage, so hopefully "openstack server backup create" works like it does with rbd-backed, Nova-managed persistent storage.

It will snapshot the root disk. If you use additional ephemeral disks, I do not think they are included, but if you create the VMs with a single root disk that is big enough for your needs and use Swift as your Glance backend, then yes, it will store the backups in object storage and rotate up to N backups per instance.
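For example, something along these lines (names are placeholders):

  openstack server backup create --name my-vm-backup --type daily --rotate 7 my-vm

With a Swift-backed Glance, that should land the backup image in object storage and keep the last 7 backups of that type for the instance.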
It will snapshot the root disk. If you use additional ephemeral disks, I do not think they are included, but if you create the VMs with a single root disk that is big enough for your needs and use Swift as your Glance backend, then yes, it will store the backups in object storage and rotate up to N backups per instance.

Thanks Sean! I tested a VM with a single root disk (no ephemeral disks) and it worked exactly as you described.
That said, there's no real alternative available at the moment.

Well, one alternative to Nova providing local LVM storage is to use the Cinder LVM driver, but install it on all compute nodes and then use the Cinder InstanceLocalityFilter to ensure the volume is allocated from the host the VM is on. https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter One drawback to this is that if the VM is moved, I think you would need to also migrate the Cinder volume separately afterwards.
I wasn't aware of the InstanceLocalityFilter, so thank you for mentioning it! Eric
Do you need full host block devices to be provided to the instance?
No - a thin-provisioned LV in LVM would be best.
The LVM imagebackend will just provision LVs on top of the provided VG so there's no direct mapping to a full host block device with this approach.
That's perfect!
Yeah that's a common pitfall when using LVM based ephemeral disks that contain additional LVM PVs/VGs/LVs etc. You need to ensure that the host is configured to not scan these LVs in order for their PVs/VGs/LVs etc to remain hidden from the host:
Thanks for the link! I will let everyone know how testing goes. Eric
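For anyone else who hits this, the kind of filter the linked RHEL document describes would be something like the following in the host's /etc/lvm/lvm.conf - the accepted device is just an example (whatever PV the host itself actually uses, e.g. the md array backing the Nova VG), and everything else is rejected so guest LVs are never scanned:

  devices {
      global_filter = [ "a|^/dev/md0$|", "r|.*|" ]
  }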
No - a thin-provisioned LV in LVM would be best.
From testing, it looks like thick-provisioned is the only choice at this stage. That's fine.
I will let everyone know how testing goes.
So far, everything is working perfectly with Nova using LVM. It was a quick configuration and it did exactly what I expected, which is always nice. :)

As far as performance goes, it is decent, but not stellar. Of course, I'm comparing crazy-fast native NVMe storage in RAID 0 across 4 x Micron 9300 SSDs (using md as the underlying physical volume in LVM) to virtualized storage.

Some numbers from fio, just to get an idea of how good/bad the IOPS will be.

Configuration:
- 32-core EPYC 7502P with 512GiB of RAM - CentOS 7 latest updates - Kolla Ansible (Stein) deployment
- 32 vCPU VM with 64GiB of RAM
- 32 x 10GiB test files (I'm using file tests, not raw device tests, so not optimal, but easiest when the VM root disk is the test disk)
- iodepth=10
- numjobs=32
- time=30 (seconds)

The VM was deployed using a qcow2 image, then deployed as a raw image, to see the difference in performance. There was none, which makes sense, since I'm pretty sure the qcow2 image was decompressed and stored in the LVM logical volume - so both tests were measuring the same thing.

Bare metal (random 4KiB reads): 8066MiB/sec, 154.34 microsecond avg latency, 2.065 million IOPS
VM qcow2 (random 4KiB reads): 589MiB/sec, 2122.10 microsecond avg latency, 151k IOPS

Bare metal (random 4KiB writes): 4940MiB/sec, 252.44 microsecond avg latency, 1.265 million IOPS
VM qcow2 (random 4KiB writes): 589MiB/sec, 2119.16 microsecond avg latency, 151k IOPS

Since the read and write VM results are nearly identical, my assumption is that the emulation layer is the bottleneck. CPUs in the VM were all at 55% utilization (all kernel usage). The qemu process on the bare metal machine indicated 1600% (or so) CPU utilization.

Below are runs with sequential 1MiB block tests.

Bare metal (sequential 1MiB reads): 13.3GiB/sec, 23446.43 microsecond avg latency, 13.7k IOPS
VM qcow2 (sequential 1MiB reads): 8378MiB/sec, 38164.52 microsecond avg latency, 8377 IOPS

Bare metal (sequential 1MiB writes): 8098MiB/sec, 39488.00 microsecond avg latency, 8097 IOPS
VM qcow2 (sequential 1MiB writes): 8087MiB/sec, 39534.96 microsecond avg latency, 8087 IOPS

Amazing that a VM can move 8GiB/sec to/from storage. :) However, the IOPS limits are a bit disappointing when compared to bare metal (but this is relative, since 151k IOPS is quite a bit!). Not sure if additional "iothreads" in QEMU would help, but that is not set in the Libvirt XML file, and I don't see any way to use Nova to set it.

The Libvirt XML for the disk appears as:

  <disk type='block' device='disk'>
    <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
    <source dev='/dev/nova_vg/4cc7dfa4-c57f-4e73-a6fa-0da283244a4b_disk'/>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
  </disk>

Any suggestions for improvement?

I "think" that the "images_type = flat" option in nova.conf indicates that images are stored in the /var/lib/nova/instances/* directories? If so, that might be an option, but since we're using Kolla, that directory (or rather /var/lib/nova) is currently a Docker volume. So, it might be necessary to mount the NVMe storage at its respective /var/lib/docker/volumes/nova_compute/_data/instances directory. Not sure if the "flat" option will be any faster, especially since Docker would be another layer to go through. Any opinions?

Thanks!

Eric
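For anyone wanting to reproduce this, the random-read runs were roughly of this form (reconstructed from the parameters above, not the exact command line; the test directory and use of direct I/O are assumptions):

  fio --name=randread --directory=/mnt/test --rw=randread --bs=4k \
      --ioengine=libaio --direct=1 --iodepth=10 --numjobs=32 \
      --size=10G --runtime=30 --time_based --group_reporting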