On Wed, 2020-08-05 at 08:36 -0400, Donny Davis wrote:
> I have used local nvme to drive the CI workload for the openstack community for
> the last year or so. It seems to work pretty well. I just created a
> filesystem (xfs) and mounted it to /var/lib/nova/instances.
> I moved glance to using my swift backend and it really made the download of
> the images much faster.
>
> It depends on if the workload is going to handle HA or you are expecting to
> migrate machines. If the workload is ephemeral or HA can be handled in the
> app I think local storage is still a very viable option.
>
> Simpler is better IMO
Yes, that works well with the default flat/qcow file format.
I assume there was a reason this was not the starting point.
The nova lvm backend, I think, does not support thin provisioning,
so if you did the same thing by creating the volume group on the nvme device
you would technically get better write performance after the vm is booted, but
the vm spawn is slower since we can't take advantage of thin provisioning and
each root disk needs to be copied from the cached image.
So just mounting the nova data directory on an nvme drive or a raid of nvme drives
works well and is simple to do.
I take a slightly more complex approach on my home cluster, where I put the
nova data directory on a bcache block device, which puts an nvme pci ssd as a cache
in front of my raid 10 of HDDs to accelerate it. From nova's point of view there is nothing special
about this setup; it just works.
The drawback to this is you can't change the storage available to a vm without creating a new flavor.
Exposing the nvme devices or subsections of them via cinder has the advantage of allowing you to use
the volume api to tailor the amount of storage per vm rather than creating a bunch of different flavors,
but with the overhead of needing to connect to the storage over a network protocol.
So there are trade-offs with both approaches.
Generally I recommend using local storage, e.g. the vm root disk or ephemeral disk, for fast scratchpad space
to work on data, but persisting all relevant data permanently via cinder volumes. That requires you to understand which block
devices are local and which are remote, but it gives you the best of both worlds.
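
As a rough illustration of the two nova-side options discussed above, here is a minimal Python sketch that renders the [libvirt] fragment of nova.conf for either the default file-based backend or the lvm backend. images_type and images_volume_group are existing nova options; the helper name libvirt_section and the volume group name nova-nvme are made up for the example, and creating the volume group on the nvme device is assumed to have been done separately.

```python
import configparser
import io


def libvirt_section(images_type, volume_group=None):
    """Render a [libvirt] nova.conf fragment for the given image backend.

    Hypothetical helper for illustration only; it is not part of nova.
    """
    cfg = configparser.ConfigParser()
    cfg["libvirt"] = {"images_type": images_type}
    if images_type == "lvm":
        # The lvm backend carves logical volumes out of an existing
        # volume group, so the group name has to be supplied.
        if not volume_group:
            raise ValueError("the lvm backend needs a volume group name")
        cfg["libvirt"]["images_volume_group"] = volume_group
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# File-based backend: the nova data directory just lives on the nvme filesystem.
print(libvirt_section("qcow2"))

# lvm backend: "nova-nvme" is an assumed volume group name.
print(libvirt_section("lvm", volume_group="nova-nvme"))
```

Either fragment would go into nova.conf on the compute node; with the file-based backend nothing beyond mounting the nvme filesystem on the nova data directory is needed.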
>
>
>
> On Wed, Aug 5, 2020 at 7:48 AM Sean Mooney <smooney@redhat.com> wrote:
>
> > On Wed, 2020-08-05 at 12:40 +0100, Sean Mooney wrote:
> > > On Wed, 2020-08-05 at 12:19 +0100, Lee Yarwood wrote:
> > > > On 05-08-20 05:03:29, Eric K. Miller wrote:
> > > > > In case this is the answer, I found that in nova.conf, under the
> > > > > [libvirt] stanza, images_type can be set to "lvm". This looks like it
> > > > > may do the trick - using the compute node's LVM to provision and mount a
> > > > > logical volume, for either persistent or ephemeral storage defined in
> > > > > the flavor.
> > > > >
> > > > > Can anyone validate that this is the right approach according to our
> > > > > needs?
> > > >
> > > > I'm not sure if it is given your initial requirements.
> > > >
> > > > Do you need full host block devices to be provided to the instance?
> > > >
> > > > The LVM imagebackend will just provision LVs on top of the provided VG
> > > > so there's no direct mapping to a full host block device with this
> > > > approach.
> > > >
> > > > That said there's no real alternative available at the moment.
> > >
> > > Well, one alternative to nova providing local lvm storage is to use
> > > the cinder lvm driver, but install it on all compute nodes, then
> > > use the cinder InstanceLocalityFilter to ensure the volume is allocated
> > > from the host the vm is on.
> > >
> > > https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter
> > >
> > > One drawback to this is that if the vm is moved, I think you would
> > > need to also migrate the cinder volume separately afterwards.
> >
> > By the way, if you were to take this approach, I think there is an nvmeof
> > driver, so you can use nvme over rdma
> > instead of iscsi.
> > >
> > > >
> > > > > Also, I have read about the LVM device filters - which are important to
> > > > > prevent the host's LVM from seeing the guest's volumes, in case anyone
> > > > > else finds this message.
> > > >
> > > >
> > > > Yeah that's a common pitfall when using LVM based ephemeral disks that
> > > > contain additional LVM PVs/VGs/LVs etc. You need to ensure that the host
> > > > is configured to not scan these LVs in order for their PVs/VGs/LVs etc
> > > > to remain hidden from the host:
> > > >
> > > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_filters
>
I have been through just about every possible nvme backend option for nova, and the one that has turned out to be the most reliable and predictable so far has been the simple defaults. Right now I am giving an nvme + nfs backend a spin. It doesn't perform badly, but it is not a local nvme. One of the things I have found with nvme is that the mdadm raid driver is just not fast enough to keep up if you use anything other than raid0/1 (10) - I have a raid5 array I have got working pretty well - but it's still limited. I don't have any vroc capable equipment, so maybe that will make a difference if implemented.
I also have an all nvme ceph cluster I plan to test using cephfs (I know rbd is an option, but where is the fun in that). From my experience over the last two years of working with nvme-only setups, it seems that nothing comes close to matching the performance of what a couple of local nvme drives in raid0 can do. Nvme is so fast that the rest of my (old) equipment just can't keep up; it really does push things to the limits of what is possible.
The all nvme ceph cluster does push my 40G network to its limits, but I had to create multiple OSDs per nvme to get there - for my gear (intel DC p3600s) I ended up at 3 OSDs per nvme. It seems to me to be limited by network performance.
If you have any other questions I am happy to help where I can - I have been working with all nvme stuff for the last couple of years and have had something in prod with it for about 1 year (maybe a little longer).
From what I can tell, getting max performance from nvme for an instance is a non-trivial task because it's just so much faster than the rest of the stack, and careful consideration must be taken to get the most out of it.
I am curious to see where you take this, Eric.
--
~/DonnyD
C: 805 814 6800
"No mission too difficult. No sacrifice too great. Duty First"