Hello Rajat,

On Tue, Oct 8, 2024 at 9:50 PM Rajat Dhasmana <rdhasman@redhat.com> wrote:
Hi David,
On Tue, Oct 8, 2024 at 9:28 PM David Pineau <david.pineau@shadow.tech> wrote:
Hello Openstack and Cinder team,
I'm currently working on the storage management layer of our company's OpenStack platform. As we manage the storage ourselves, we may run background operations, akin to maintenance, on individual volumes or on parts of the infrastructure.
As a bit of context, this infrastructure will be exposed both to external customers and to internal departments. As such, we are looking for a refined user experience where an error is not a simple "an error occurred in service X", but something potentially actionable by whoever encounters it.
At the moment, in the Cinder documentation (and in what I could find in the discuss archive), there seems to be no way for a Cinder driver and its backing services to:
- tell Cinder that a specific volume is undergoing backend maintenance and should be considered unavailable
- raise a relevant error to Cinder about ongoing operations affecting a volume within its backend, one that Cinder could properly react to
Thanks for mentioning the use cases; however, the problem statement is still not very clear to me with the limited information.
1. Which backend storage are you using for Cinder and with which transport protocol?
For a bit more context, we have an existing infrastructure that we'll switch over to OpenStack as soon as we can. It is made up of a few hundred storage servers spread over a few datacenters. Given the limitations of Cinder, which expects backends to be declared in a configuration file, and to avoid reloading the configuration for each addition or removal of a server, we chose the vendor-driver design approach and are writing our own custom volume driver (not contributed upstream, as it's 100% custom). This will help us with the switch-over. As for the transport protocol, we currently use iSCSI but are working on supporting NVMe-oF; both will use the existing Cinder components that provide support for the relevant protocol.
2. Why would we want to set a particular volume in maintenance state and what would be the maintenance being performed on the volume/LUN? I'm asking because we have not encountered a case (at least to my knowledge) where a particular volume goes under maintenance instead of the whole backend.
Our hardware maintenance involves moving data out of the storage server before we hand it over to our datacenter operator, for historical and practical reasons (risk management, essentially). This means that for every maintenance, we might want to "migrate" a volume within the same Cinder backend (but between actual hardware hosts), thus "behind the scenes" from Cinder's point of view. As these operations may make the volume unusable for a few moments, we wanted to check whether there was a way to properly bubble this information up to Cinder. We hoped there was, since many vendor drivers exist that can probably do exactly that.
We provide APIs to disable/freeze[1] a backend and to enable/thaw[2] a backend, which prevent resources from being created on it, but not on a per-volume basis.
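For reference, here is a minimal sketch of how the freeze and thaw calls from [1] and [2] could be built in Python. It assumes the API reference form `PUT /v3/{project_id}/os-services/freeze` (or `/thaw`) with a JSON body containing `host`; the helper name `build_backend_request`, the endpoint, the project ID and the token below are all placeholders, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_backend_request(endpoint, project_id, token, host, action):
    """Build (but do not send) a freeze or thaw request for a backend host.

    Hypothetical helper: the URL layout follows the Block Storage v3 API
    reference as I understand it; endpoint, project_id and token are
    deployment-specific values supplied by the caller.
    """
    if action not in ("freeze", "thaw"):
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{endpoint.rstrip('/')}/v3/{project_id}/os-services/{action}"
    body = json.dumps({"host": host}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
    )


# Example with placeholder values; pass the result to
# urllib.request.urlopen() to actually issue the call.
request = build_backend_request(
    "http://cinder.example:8776", "my-project", "my-token", "node1@lvm", "freeze"
)
```

Note that the granularity is the backend host: the body carries no volume identifier.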
I indeed saw this approach on various threads/issues, but then we'd make the whole backend unusable, while only one or two HW servers behind it are undergoing maintenance. This is not ideal in our book, as we strive to provide the best user experience we can, with the least interruptions possible.
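To make the per-volume behaviour we have in mind more concrete, here is a rough, self-contained sketch. Nothing in it is an existing Cinder API: `VolumeUnderMaintenance`, `MaintenanceRegistry` and `attach_volume` are hypothetical names, used only to illustrate a driver-side check that fails a single volume with an actionable error while the rest of the backend keeps working.

```python
import time


class VolumeUnderMaintenance(Exception):
    """Hypothetical error carrying information a caller could act on."""

    def __init__(self, volume_id, retry_after_s):
        super().__init__(
            f"Volume {volume_id} is under backend maintenance; "
            f"retry in about {retry_after_s:.0f}s"
        )
        self.volume_id = volume_id
        self.retry_after_s = retry_after_s


class MaintenanceRegistry:
    """Tracks which volumes are in maintenance, and until when."""

    def __init__(self):
        self._until = {}  # volume_id -> deadline (monotonic seconds)

    def begin(self, volume_id, duration_s, now=None):
        now = time.monotonic() if now is None else now
        self._until[volume_id] = now + duration_s

    def end(self, volume_id):
        self._until.pop(volume_id, None)

    def check(self, volume_id, now=None):
        """Raise if the volume is still inside its maintenance window."""
        now = time.monotonic() if now is None else now
        deadline = self._until.get(volume_id)
        if deadline is None:
            return
        if now >= deadline:
            # Window elapsed: forget it and let the operation proceed.
            del self._until[volume_id]
            return
        raise VolumeUnderMaintenance(volume_id, deadline - now)


def attach_volume(registry, volume_id, now=None):
    """Stand-in for a driver operation guarded by the per-volume check."""
    registry.check(volume_id, now)
    return f"attached {volume_id}"
```

With this shape, an attach on a volume being moved between hardware hosts would fail with a message and a retry hint, while any other volume on the same backend stays usable.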
It would also be good to bring this topic to the upcoming Epoxy virtual PTG where we can properly discuss this idea. You can add your topic in the Cinder PTG etherpad[3].
We'd gladly bring our use-case and needs to the discussion if need be.
[1] https://docs.openstack.org/api-ref/block-storage/v3/#freeze-a-cinder-backend...
[2] https://docs.openstack.org/api-ref/block-storage/v3/#thaw-a-cinder-backend-h...
[3] https://etherpad.opendev.org/p/epoxy-ptg-cinder

Thanks
Rajat Dhasmana
As a first step, I wanted to check if you, the community, had ever considered this issue (or at least we consider it to be one). We'd be very happy if you had recipes or pieces of advice to share with us on how to handle this.
If nothing is available or known, how could we help bring such an improvement to Cinder?
Kind regards,
--
David Pineau (joa)
Thanks