[cinder][nova] Local storage in compute node

Donny Davis donny at fortnebula.com
Wed Aug 5 13:22:58 UTC 2020


On Wed, Aug 5, 2020 at 9:01 AM Sean Mooney <smooney at redhat.com> wrote:

> On Wed, 2020-08-05 at 08:36 -0400, Donny Davis wrote:
> > I have used local NVMe to drive the CI workload for the OpenStack
> > community for the last year or so. It seems to work pretty well. I just
> > created an XFS filesystem and mounted it at /var/lib/nova/instances.
> > I moved Glance to my Swift backend and it really made the download of
> > the images much faster.
> >
> > It depends on whether the workload is going to handle HA or you are
> > expecting to migrate machines. If the workload is ephemeral or HA can be
> > handled in the app, I think local storage is still a very viable option.
> >
> > Simpler is better IMO
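
For anyone wanting to replicate that simple setup, it boils down to something
like this (the device name /dev/nvme0n1 and the nova user/group are
assumptions that will vary per deployment):

  # format the local NVMe and use it as the nova instances directory
  mkfs.xfs /dev/nvme0n1
  mkdir -p /var/lib/nova/instances
  mount -o noatime /dev/nvme0n1 /var/lib/nova/instances
  chown nova:nova /var/lib/nova/instances

  # persist the mount, e.g. in /etc/fstab:
  # /dev/nvme0n1  /var/lib/nova/instances  xfs  defaults,noatime  0 0
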
> Yes, that works well with the default flat/qcow file format;
> I assume there was a reason this was not the starting point.
> The nova LVM backend, I think, does not support thin provisioning,
> so if you did the same thing, creating the volume group on the NVMe
> device, you would technically get better write performance after the VM
> is booted, but the VM spawn is slower, since we can't take advantage of
> thin provisioning and each root disk needs to be copied from the cached
> image.
>
> So just mounting the nova data directory on an NVMe drive, or a RAID of
> NVMe drives, works well and is simple to do.
>
> I take a slightly more complex approach for my home cluster, where I put
> the nova data directory on a bcache block device, which puts an NVMe PCIe
> SSD as a cache in front of my RAID 10 of HDDs to accelerate it. From
> nova's point of view there is nothing special about this setup; it just
> works.
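
A rough sketch of that kind of bcache layout, assuming the HDD RAID 10 is an
md array at /dev/md0 and the NVMe cache is /dev/nvme0n1 (both device names
are assumptions):

  # register the backing device (HDD array) and the cache device (NVMe)
  make-bcache -B /dev/md0
  make-bcache -C /dev/nvme0n1

  # attach the cache set to the backing device
  # (cache-set UUID comes from: bcache-super-show /dev/nvme0n1)
  echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

  # then /dev/bcache0 is used like any other disk
  mkfs.xfs /dev/bcache0
  mount /dev/bcache0 /var/lib/nova/instances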
>
> The drawback to this is that you can't change the storage available to a
> VM without creating a new flavor. Exposing the NVMe devices, or a
> subsection of them, via cinder has the advantage of allowing you to use
> the volume API to tailor the amount of storage per VM rather than
> creating a bunch of different flavors, but with the overhead of needing
> to connect to the storage over a network protocol.
>
> So there are trade-offs with both approaches.
> Generally I recommend using local storage, e.g. the VM root disk or
> ephemeral disk, as fast scratchpad space to work on data, but persisting
> all relevant data permanently via cinder volumes. That requires you to
> understand which block devices are local and which are remote, but it
> gives you the best of both worlds.
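
From the API side that pattern is just something like this (the volume name,
size and server name here are placeholders):

  # keep the root/ephemeral disk as scratch space, put the data you care
  # about on a cinder volume and attach it to the instance
  openstack volume create --size 100 data-vol
  openstack server add volume my-vm data-vol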
>
> >
> >
> >
> > On Wed, Aug 5, 2020 at 7:48 AM Sean Mooney <smooney at redhat.com> wrote:
> >
> > > On Wed, 2020-08-05 at 12:40 +0100, Sean Mooney wrote:
> > > > On Wed, 2020-08-05 at 12:19 +0100, Lee Yarwood wrote:
> > > > > On 05-08-20 05:03:29, Eric K. Miller wrote:
> > > > > > In case this is the answer, I found that in nova.conf, under the
> > > > > > [libvirt] stanza, images_type can be set to "lvm". This looks
> > > > > > like it may do the trick - using the compute node's LVM to
> > > > > > provision and mount a logical volume, for either persistent or
> > > > > > ephemeral storage defined in the flavor.
> > > > > >
> > > > > > Can anyone validate that this is the right approach according
> > > > > > to our needs?
> > > > >
> > > > > I'm not sure if it is, given your initial requirements.
> > > > >
> > > > > Do you need full host block devices to be provided to the instance?
> > > > >
> > > > > The LVM imagebackend will just provision LVs on top of the
> > > > > provided VG, so there's no direct mapping to a full host block
> > > > > device with this approach.
> > > > >
> > > > > That said there's no real alternative available at the moment.
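
For reference, the images_type=lvm setup Eric describes looks roughly like
this on the compute node (the VG name nova-vg and the device are
assumptions):

  # build a volume group on the local NVMe for nova to carve LVs out of
  pvcreate /dev/nvme0n1
  vgcreate nova-vg /dev/nvme0n1

  # nova.conf on that compute node:
  # [libvirt]
  # images_type = lvm
  # images_volume_group = nova-vg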
> > > >
> > > > Well, one alternative to nova providing local LVM storage is to use
> > > > the cinder LVM driver but install it on all compute nodes, then use
> > > > the cinder InstanceLocalityFilter to ensure the volume is allocated
> > > > from the host the VM is on:
> > > >
> > > > https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter
> > > > One drawback to this is that if the VM is moved, I think you would
> > > > need to also migrate the cinder volume separately afterwards.
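
If I read the docs right, that roughly translates to the following; the
backend name, VG name and filter list are assumptions, so check the link
above for the authoritative details:

  # cinder.conf on each compute node running cinder-volume:
  # [DEFAULT]
  # enabled_backends = lvm-local
  #
  # [lvm-local]
  # volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
  # volume_group = cinder-volumes
  # volume_backend_name = lvm-local

  # cinder.conf on the scheduler, adding the locality filter:
  # scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter

  # then create volumes with the local_to_instance hint so they land on
  # the host the VM is running on:
  openstack volume create --size 50 --hint local_to_instance=<instance-uuid> scratch-vol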
> > >
> > > By the way, if you were to take this approach, I think there is an
> > > NVMe-oF driver, so you can use NVMe over RDMA instead of iSCSI.
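
From memory the LVM driver can be pointed at an NVMe-oF target instead of
iSCSI with something along these lines - treat the exact values as
assumptions and double-check the cinder driver docs:

  # in the LVM backend section of cinder.conf:
  # target_helper = nvmet
  # target_protocol = nvmet_rdma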
> > > >
> > > > >
> > > > > > Also, I have read about the LVM device filters - which is
> > > > > > important to prevent the host's LVM from seeing the guest's
> > > > > > volumes, in case anyone else finds this message.
> > > > >
> > > > >
> > > > > Yeah, that's a common pitfall when using LVM-based ephemeral disks
> > > > > that contain additional LVM PVs/VGs/LVs etc. You need to ensure
> > > > > that the host is configured to not scan these LVs in order for
> > > > > their PVs/VGs/LVs etc. to remain hidden from the host:
> > > > >
> > > > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_filters
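
Concretely that means an lvm.conf filter that only accepts the host's own
devices, e.g. (the accepted device patterns below are assumptions and depend
on what the host itself uses):

  # /etc/lvm/lvm.conf on the compute host: scan only the host's own PVs
  # and reject everything else, so guest PVs/VGs/LVs inside the ephemeral
  # disks stay hidden from the host
  devices {
      global_filter = [ "a|^/dev/md0$|", "a|^/dev/nvme0n1$|", "r|.*|" ]
  }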
I have been through just about every possible NVMe backend option for nova,
and the one that has turned out to be the most reliable and predictable so
far is simple defaults. Right now I am giving an NVMe + NFS backend a spin.
It doesn't perform badly, but it is not local NVMe. One of the things I have
found with NVMe is that the mdadm RAID driver is just not fast enough to
keep up if you use anything other than RAID 0/1 (10) - I have a RAID 5 array
I have got working pretty well, but it's still limited. I don't have any
VROC-capable equipment, so maybe that would make a difference if
implemented.
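
A plain mdadm RAID 0 of the kind I mean is just something like this (device
names are assumptions):

  # stripe two NVMe drives and use the result for the instances directory
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
  mkfs.xfs /dev/md0
  mount -o noatime /dev/md0 /var/lib/nova/instances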

I also have an all-NVMe Ceph cluster I plan to test using CephFS (I know
RBD is an option, but where is the fun in that). From my experience over
the last two years of working with NVMe-only setups, it seems that nothing
comes close to matching the performance of what a couple of local NVMe
drives in RAID 0 can do. NVMe is so fast that the rest of my (old) equipment
just can't keep up; it really does push things to the limits of what is
possible.
The all-NVMe Ceph cluster does push my 40G network to its limits, but I had
to create multiple OSDs per NVMe to get there - for my gear (Intel DC
P3600s) I ended up at 3 OSDs per NVMe. It seems to me to be limited by
network performance.
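
That kind of multiple-OSDs-per-drive split can be done with ceph-volume,
roughly like this (device names are assumptions):

  # carve each NVMe into 3 OSDs so a single OSD daemon isn't the bottleneck
  ceph-volume lvm batch --osds-per-device 3 /dev/nvme0n1 /dev/nvme1n1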

If you have any other questions I am happy to help where I can - I have been
working with all-NVMe gear for the last couple of years and have had it in
production for about a year (maybe a little longer).
From what I can tell, getting maximum performance from NVMe for an instance
is a non-trivial task, because it's just so much faster than the rest of the
stack and careful consideration is needed to get the most out of it.
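
A quick way to sanity-check each layer is to run the same small fio job on
the host and again inside a guest and compare, e.g. (path and sizes are
placeholders):

  # 4k random writes with direct I/O against the instances filesystem
  fio --name=randwrite --filename=/var/lib/nova/instances/fio.test \
      --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
      --iodepth=32 --numjobs=4 --size=8G --runtime=60 --time_based \
      --group_reporting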

I am curious to see where you take this, Eric.

-- 
~/DonnyD
C: 805 814 6800
"No mission too difficult. No sacrifice too great. Duty First"