[nova][cinder] Providing ephemeral storage to instances - Cinder or Nova

Rajat Dhasmana rdhasman at redhat.com
Tue Apr 4 08:39:01 UTC 2023


On Sat, Mar 25, 2023 at 12:27 AM Sean Mooney <smooney at redhat.com> wrote:

> I responded inline, but just a warning: this is a use case we have heard
> before.
> There is no simple option, I'm afraid, and there are many, many sharp edges
> and several little-known features/limitations that your question puts you
> right in the
> middle of.
>
> On Fri, 2023-03-24 at 16:28 +0100, Christian Rohmann wrote:
> > Hello OpenStack-discuss,
> >
> > I am currently looking into how one can provide fast ephemeral storage
> > (backed by local NVME drives) to instances.
> >
> >
> > There seem to be two approaches and I would love to double-check my
> > thoughts and assumptions.
> >
> > 1) *Via Nova* instance storage and the configurable "ephemeral" volume
> > for a flavor
> >
> > a) We currently use Ceph RBD as image_type
> > (
> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type),
>
> > so instance images are stored in Ceph, not locally on disk. I believe
> > this setting will also cause ephemeral volumes (destination_local) to be
> > placed in RBD and not under /var/lib/nova/instances?
> It should be in Ceph, yes. We do not support having the root/swap/ephemeral
> disks use different storage locations.
> > Or is there a setting to set a different backend for local block devices
> > providing "ephemeral" storage? So RBD for the root disk and a local LVM
> > VG for ephemeral?
> No, that would be a new feature, and not a trivial one, as you would have
> to make sure it works for live migration and cold migration.
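
To illustrate the current behaviour, a minimal nova.conf sketch (the pool name
is just an example): with the RBD image backend, the root, swap and ephemeral
disks all land in the same Ceph pool, roughly

    [libvirt]
    images_type = rbd
    images_rbd_pool = vms                      # root, swap and ephemeral all go here
    images_rbd_ceph_conf = /etc/ceph/ceph.conf

and there is no per-disk-type override.
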
>
> >
> > b) Will an ephemeral volume also be migrated when the instance is
> > shut off, as it is with live-migration?
> It should be. It's not included in snapshots, so it's not preserved
> when shelving. That means cross-cell cold migration will not preserve the
> disk.
>
> But for a normal cold migration it should be scp'd or rsynced along with the
> root disk
> if you are using the raw/qcow/flat images_type, if I remember correctly.
> With RBD or other shared storage like NFS it really should be preserved.
>
> One other thing to note: ironic, and only ironic, supports the
> preserve_ephemeral option in the rebuild API.
>
> libvirt will wipe the ephemeral disk if you rebuild or evacuate.
> > Or will there be a new volume created on the target host? I am asking
> > because I want to avoid syncing 500G or 1T when it's only "ephemeral"
> > and the instance will not expect any data on it on the next boot.
> I would personally consider it a bug if it was not transferred.
> That does not mean that could not change in the future.
> This is a very virt-driver-specific behaviour, by the way, and not one that
> is particularly well documented.
> The ephemeral disk should mostly exist for the lifetime of an instance, not
> the lifetime of a VM.
>
> For example, it should not get recreated via a simple reboot or live
> migration, and it should not get recreated for cold migration or resize,
> but it will get wiped for shelve_offload, cross-cell resize and evacuate.
> >
> > c) Is the size of the ephemeral storage for flavors a fixed size or just
> > the upper bound for users? So if I limit this to 1T, will such a flavor
> > always provision a block device with this size?
> flavor.ephemeral_gb is an upper bound, and end users can divide that
> between multiple ephemeral disks
> on the same instance. So if it's 100G you can ask for two 50G ephemeral
> disks.
>
> You specify the topology of the ephemeral disks using the
> block_device_mapping_v2 parameter on the server
> create.
> This has been automated in recent versions of the openstack client,
>
> so you can do
>
> openstack server create --ephemeral size=50,format=ext4 --ephemeral
> size=50,format=vfat ...
>
>
> https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-ephemeral
> This is limited by
>
> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_local_block_devices
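
For reference, the --ephemeral options above just translate into
block_device_mapping_v2 entries in the server create API request. A rough
sketch of the two-disk example at the API level (field names from memory, so
double-check against the API reference):

    "block_device_mapping_v2": [
        {"source_type": "blank", "destination_type": "local",
         "guest_format": "ext4", "volume_size": 50, "boot_index": -1},
        {"source_type": "blank", "destination_type": "local",
         "guest_format": "vfat", "volume_size": 50, "boot_index": -1}
    ]

with the root disk coming from imageRef as usual. The total of the blank/local
entries has to fit within flavor.ephemeral_gb, and the number of entries is
capped by max_local_block_devices.
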
>
> >
> > I suppose using LVM this will be thin provisioned anyways?
> To use the LVM backend with libvirt you set
>
> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_volume_group
> to identify which LVM VG to use.
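
A minimal sketch of what that looks like in nova.conf (the VG name is just an
example and has to exist on the compute node already):

    [libvirt]
    images_type = lvm
    images_volume_group = nova-ephemeral   # pre-created VG, e.g. on the local NVMe
    # sparse_logical_volumes = true        # deprecated, see the warning below

Note that images_volume_group only takes effect when images_type is set to lvm.
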
>
>
> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.sparse_logical_volumes
> might enable thin provisioning, or it might
> work without it, but see the note:
>
> """
> Warning
>
> This option is deprecated for removal since 18.0.0. Its value may be
> silently ignored in the future.
>
> Reason
>
>     Sparse logical volumes is a feature that is not tested hence not
> supported. LVM logical volumes are preallocated by default. If you want thin
> provisioning, use Cinder thin-provisioned volumes.
> """
>
> The nova LVM support has been in maintenance mode for many years.
>
> I'm not opposed to improving it, just calling out that it has bugs and no
> one has really
> worked on addressing them in 4 or 5 years, which is sad because it
> outperforms raw for local
> storage performance, and if thin provisioning still works it should
> outperform qcow too for a similar use case.
>
> You are well into undefined-behaviour land at this point, however;
>
> we do not test it, so we assume until told otherwise that it's broken.
>
>
> >
> >
> > 2) *Via Cinder*, running cinder-volume on each compute node to provide a
> > volume type "ephemeral", using e.g. the LVM driver
> >
> > a) While not really "ephemeral" and bound to the instance lifecycle,
> > this would allow users to provision ephemeral volumes just as they need
> > them.
> > I suppose I could use backend specific quotas
> > (
> https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas)
>
> > to
> > limit the number or size of such volumes?
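
Cinder quotas can be scoped per volume type rather than per backend, so
limiting an "ephemeral" volume type would look roughly like this (syntax from
memory, using the cinder client, so please double-check):

    # allow at most 10 volumes / 1 TiB of the "ephemeral" type per project
    cinder quota-update --volumes 10 --gigabytes 1024 \
        --volume-type ephemeral <project-id>
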
> >
> > b) Do I need to use the instance locality filter
> > (
> https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html)
>
> > then?
>
> That is an option, but not ideal, since it still means connecting to the
> volume via iSCSI or NVMe-oF, even if it's effectively via localhost,
> so you still have the network-layer overhead.
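
For completeness, the setup described here would look roughly like the
following untested sketch (backend, VG and volume names are made up):

    # cinder.conf on each compute node running cinder-volume
    [DEFAULT]
    enabled_backends = lvm-local

    [lvm-local]
    volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
    volume_group = cinder-ephemeral
    target_helper = lioadm
    volume_backend_name = lvm-local

    # on the scheduler, enable the locality filter
    [DEFAULT]
    scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter

Users then have to ask for locality explicitly with a scheduler hint when
creating the volume, e.g.

    openstack volume create --type ephemeral --size 500 \
        --hint local_to_instance=<instance-uuid> scratch-disk

and, as noted above, the data path still goes through iSCSI/NVMe-oF even when
the volume ends up on the same host.
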
>
>
I haven't tried it, so I'm not 100% sure it works, but we do support local
attach with the RBD connector.
While creating the connector object, we can pass "do_local_attach"=True[1][2]
and that should do a local attach when we call
connect_volume for the RBD connector.
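
Roughly what that would look like with os-brick directly (an untested sketch;
the connection properties below are only illustrative and in a real deployment
come from Cinder's initialize_connection API):

    from os_brick.initiator import connector

    # Build an RBD connector that maps the image on the local host
    # instead of leaving the attach to a remote hypervisor/iSCSI target.
    conn = connector.InitiatorConnector.factory(
        'rbd', root_helper='sudo', do_local_attach=True)

    # Illustrative values only; normally returned by Cinder for the volume.
    connection_properties = {
        'name': 'volumes/volume-<uuid>',   # <pool>/<image>
        'hosts': ['192.0.2.10'],
        'ports': ['6789'],
        'auth_enabled': True,
        'auth_username': 'cinder',
        'secret_type': 'ceph',
        'secret_uuid': '<libvirt-secret-uuid>',
    }

    # With do_local_attach=True this should 'rbd map' the image and return
    # a local block device path in device_info['path'].
    device_info = conn.connect_volume(connection_properties)
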
From a quick search, I can see all the consumers of this code are:
1) cinderlib[3]
2) nova hyperv driver[4]
3) python-brick-cinderclient-ext[5]
4) freezer[6]
5) zun[7]

It's interesting to see a project called compute-hyperv[8] (similar to
nova's hyperv driver) using it as well. Not sure why it was created
separately, though.

[1]
https://opendev.org/openstack/os-brick/src/commit/28ffcdbfa138859859beca2f80627c076269be56/os_brick/initiator/connectors/rbd.py
[2]
https://opendev.org/openstack/os-brick/src/commit/28ffcdbfa138859859beca2f80627c076269be56/os_brick/initiator/connectors/rbd.py#L263-L267
[3]
https://opendev.org/openstack/cinderlib/src/commit/9c37686f358e9228446cd85e19db26a56b2f9cbe/cinderlib/objects.py#L779
[4]
https://opendev.org/openstack/nova/src/commit/29de62bf3b3bf5eda8986bc94babf1c94d67bd4e/nova/virt/hyperv/volumeops.py#L378
[5]
https://opendev.org/openstack/python-brick-cinderclient-ext/src/branch/master/brick_cinderclient_ext/client.py
[6]
https://opendev.org/openstack/freezer/src/commit/5effc1382833ad111249bcd279b11fbe95e10a6b/freezer/engine/osbrick/client.py#L78
[7]
https://opendev.org/openstack/zun/src/commit/0288a4517846d07ee5724f86ebe34e364dc2bbe9/zun/volume/cinder_workflow.py#L60-L61
[8]
https://opendev.org/openstack/compute-hyperv/src/commit/4393891fa8356aa31b13bd57cf96cb5109acc7c3/compute_hyperv/nova/volumeops.py#L780

> When I last brought up this topic in a different context, the alternative to
> cinder and nova was to add an LVM cyborg driver
> so that it could partition local NVMe devices and expose them to a guest,
> but I never wrote that and I don't think anyone else has.
> If you had a slightly different use case, such as providing an entire NVMe or
> SATA device to a guest, then cyborg would be how you would do
> that. Nova PCI passthrough is not an option, as it is not multi-tenant safe.
> It's exclusively for stateless devices, not disks, so we do not
> have a way to erase the data when done. Cyborg, with its driver model, can
> fulfil the multi-tenancy requirement.
> We have previously rejected adding this capability into nova, so I don't
> expect us to add it any time in the near to medium term.
>
> We are trying to keep nova device management to stateless devices only.
> That said, we added Intel PMEM/NVDIMM support to nova and did handle both
> optional data transfer and multi-tenancy, but that was a non-trivial amount
> of work.
>
>
> >
> > c) Since a volume will always be bound to a certain host, I suppose
> > this will cause side-effects for instance scheduling?
> > With the volume remaining after an instance has been destroyed (defeating
> > the purpose of it being "ephemeral"), I suppose any other instance
> > attaching this volume will
> > be scheduled on this very machine?
> >
> No, nova would have no knowledge of the volume's locality out of the box.
> >  Is there any way around this? Maybe
> > a driver setting to have such volumes "self-destroy" if they are not
> > attached anymore?
> We hate those kinds of config options. Nova would not know that the volume is
> bound to the host at the scheduler level, and
> we would not really want to add orchestration logic along the lines of "it's
> OK to delete our tenants' data".
> By default today, if you cold/live migrated, the VM would move but the
> volume would not, and you would end up accessing it remotely.
>
> You would then have to do a volume migration separately in cinder, I think.
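
Yes, the volume would have to be moved explicitly with cinder's volume
migration, something like (the destination host/backend name is just an
example, reusing the hypothetical backend from the sketch above):

    openstack volume migrate --host <dest-compute>@lvm-local#lvm-local <volume-uuid>

which for the LVM driver means a full copy of the data to the new host.
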
> >
> > d) Same question as with Nova: What happens when an instance is
> > live-migrated?
> >
> I think I answered this above?
> >
> >
> > Maybe others also have this use case and you can share your solution(s)?
> Adding a cyborg driver for LVM storage and integrating that with nova
> would likely be the simplest option.
>
> You could extend nova, but as I said, we have rejected that in the past.
> That said, the generic resource table we added for PMEM was made generic so
> that future resources like local block
> devices could be tracked there without DB changes.
>
> Supporting different image_type backends for root, swap and ephemeral would
> be possible.
> It's an invasive change, but might be more natural than the resource table
> approach.
> You could reuse more of the code and inherit much of the existing
> functionality, but making sure you don't break
> anything in the process would take a lot of testing.
>
> > Thanks and with regards
> >
> >
> > Christian
> >
> >
>
>
>