Re: [dev][cinder] Bubbling up maintenance status on a volume from a third-party backend

9 Oct 2024

Hello Rajat,

On Tue, Oct 8, 2024 at 9:50 PM Rajat Dhasmana <rdhasman@redhat.com> wrote:
...
Hi David,
On Tue, Oct 8, 2024 at 9:28 PM David Pineau <david.pineau@shadow.tech> wrote:
...
Hello OpenStack and Cinder team,
I'm currently working on the storage management layer of our company's
OpenStack platform. As we manage the storage ourselves, we may run
background operations, akin to maintenance, on individual volumes or
on parts of the infrastructure.
As a bit of context, this infrastructure will be exposed both to
external customers and to internal departments. As such, we are
looking for a refined user experience where an error is not a simple
"an error occurred in service X", but something potentially
actionable by whoever encounters it.
At the moment, judging from the Cinder documentation (and what I could
find in the discuss archive), there seems to be no way for a Cinder
driver and its backing services to:
 - tell Cinder that a specific volume is undergoing backend
maintenance and should be considered unavailable
 - raise a relevant error to Cinder about ongoing operations affecting
a volume within its backend, which Cinder could properly react to
Thanks for mentioning the use cases; however, the problem statement
is still not very clear to me given the limited information.
1. Which backend storage are you using for Cinder and with which transport protocol?
For a bit more context, we have an existing infrastructure that we'll
switch over to OpenStack as soon as we can. It is made up of a few
hundred storage servers spread over a few datacenters.

Given the limitations of Cinder, which expects backends to be declared
in a configuration file, and to avoid reloading the configuration for
each addition/removal of a server, we chose the vendor-driver design
approach, writing our own custom volume driver (not contributed
upstream, as it's 100% custom). This will help us in that endeavor.

As for the transport protocol, we currently use iSCSI, but are working
on supporting NVMe-oF; both will rely on the existing Cinder
components that provide support for the relevant protocol.
...
2. Why would we want to set a particular volume in maintenance state and
what would be the maintenance being performed on the volume/LUN?
I'm asking because we have not encountered a case (at least to my knowledge) where
a particular volume goes under maintenance instead of the whole backend.
Our hardware maintenance implies moving data off the storage server
before we hand it over to our datacenter operator, for historical and
practical reasons (risk management, essentially). This means that for
every maintenance, we might want to "migrate" a volume within the same
Cinder backend (but between actual hardware hosts), thus "behind the
scenes" from Cinder's POV. As these operations may make the volume
unusable for a few moments, we wanted to check whether there was a way
to properly bubble up this information to Cinder. We hoped there was,
as many vendor drivers exist that are probably able to do exactly
that.
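
To make the idea more concrete, here is a minimal sketch of the kind of
per-volume guard a custom driver could consult before acting on a
volume. Everything in it is hypothetical: VolumeUnderMaintenance and
MaintenanceRegistry are illustrative names, not existing Cinder
classes, and nothing here tells Cinder itself about the maintenance,
which is precisely the missing piece.

```python
import time


class VolumeUnderMaintenance(Exception):
    """Hypothetical error: the volume is temporarily unusable, retry later."""

    def __init__(self, volume_id, retry_after_s):
        super().__init__(
            f"volume {volume_id} is under backend maintenance; "
            f"retry in about {retry_after_s:.0f}s"
        )
        self.volume_id = volume_id
        self.retry_after_s = retry_after_s


class MaintenanceRegistry:
    """Hypothetical in-memory record of volumes being moved between hosts."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._until = {}  # volume_id -> time at which the move should be done

    def begin(self, volume_id, expected_duration_s):
        """Mark a volume as being moved for roughly expected_duration_s."""
        self._until[volume_id] = self._clock() + expected_duration_s

    def end(self, volume_id):
        """Mark the move as finished; unknown volumes are ignored."""
        self._until.pop(volume_id, None)

    def check(self, volume_id):
        """Raise VolumeUnderMaintenance if the volume is still being moved."""
        deadline = self._until.get(volume_id)
        if deadline is None:
            return
        remaining = deadline - self._clock()
        if remaining <= 0:
            # Assumption: once the expected window has elapsed, the volume
            # is treated as usable again even if end() was never called.
            del self._until[volume_id]
            return
        raise VolumeUnderMaintenance(volume_id, remaining)
```

A driver method would call check(volume_id) first and let the
exception carry an actionable message and a retry hint, instead of a
generic failure.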
...
We provide APIs to disable/freeze[1] a backend and also enable/thaw a backend[2] to
avoid resources being created on it, but not on a per-volume basis.
I did see this approach in various threads/issues, but then we'd
make the whole backend unusable while only one or two hardware servers
behind it are undergoing maintenance. This is not ideal in our book,
as we strive to provide the best user experience we can, with the
fewest interruptions possible.
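
For reference, the granularity issue shows up in the request itself.
The sketch below only builds (and does not send) a freeze or thaw
request, assuming the Block Storage v3 paths
/v3/{project_id}/os-services/freeze and .../thaw with a {"host": ...}
body, as I understand the API reference linked in [1] and [2]; please
check it against that reference. The helper name
build_backend_request and the example values are made up for
illustration.

```python
import json
import urllib.request


def build_backend_request(endpoint, project_id, action, host, token):
    """Build (without sending) a freeze or thaw request for a backend host."""
    if action not in ("freeze", "thaw"):
        raise ValueError(f"unsupported action: {action}")
    # Assumed path layout; verify against the Block Storage API reference.
    url = f"{endpoint.rstrip('/')}/v3/{project_id}/os-services/{action}"
    # The only target is a backend host: there is no volume identifier here.
    body = json.dumps({"host": host}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    req = build_backend_request(
        "https://cinder.example.com", "my-project", "freeze",
        "storage-host@custom-backend", "placeholder-token",
    )
    print(req.get_method(), req.full_url)
```

The body can only name a host, so every volume behind that backend is
affected at once, which is what we would like to avoid.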
...
It would also be good to bring this topic to the upcoming Epoxy virtual PTG where we
can properly discuss this idea. You can add your topic in the Cinder PTG etherpad[3].
We'd gladly bring our use-case and needs to the discussion if need be.
...
[1] https://docs.openstack.org/api-ref/block-storage/v3/#freeze-a-cinder-backend...
[2] https://docs.openstack.org/api-ref/block-storage/v3/#thaw-a-cinder-backend-h...
[3] https://etherpad.opendev.org/p/epoxy-ptg-cinder
Thanks
Rajat Dhasmana
...
As a first step, I wanted to check if you, the community, had ever
considered this issue (or at least we consider it to be one). We'd be
very happy if you had recipes or pieces of advice to share with us on
how to handle this.
If nothing is available or known, how could we help in bringing such
an improvement to Cinder?
Kind regards,
--
David Pineau (joa)
Thanks