[manila] 2024.1 Caracal PTG summary
Hello! Here is a summary of all the topics we discussed during last week's PTG and the action items (AIs) that resulted from the discussions.
Thank you to all contributors who participated. If you missed one or more sessions, we've got you covered: the Manila PTG sessions and the operator hour were recorded, and the videos are available on the OpenStack Manila YouTube channel [0].
Deferred deletion
-------------------------
We discussed an existing bug and the need for a deferred deletion approach. We agreed that deferred deletion would be ideal when the driver needs to perform multiple steps to delete a resource and we don't want operators to wait for all of them to complete.
#Agreed: The service should release user quotas as soon as the driver reports that it is processing the deletion, and handle any errors internally.
AI: kpdev and NetApp/SAP will look into the deferred deletion implementation and provide a spec
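The agreed behavior can be pictured as a small state machine: accepting the deletion frees the quota right away, and backend cleanup is retried in the background. This is a minimal sketch under stated assumptions, not Manila code; the status names, `QuotaTracker`, `request_delete` and `process_pending` are all hypothetical, and the real design will come from the spec.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status names, for illustration only.
    AVAILABLE = "available"
    DEFERRED_DELETING = "deferred_deleting"
    ERROR_DEFERRED_DELETING = "error_deferred_deleting"
    DELETED = "deleted"


@dataclass
class QuotaTracker:
    used_gb: int = 0

    def release(self, size_gb: int) -> None:
        self.used_gb -= size_gb


@dataclass
class Share:
    share_id: str
    size_gb: int
    status: Status = Status.AVAILABLE


def request_delete(share: Share, quotas: QuotaTracker, pending: list) -> None:
    """Accept the deletion and give the user their quota back immediately."""
    share.status = Status.DEFERRED_DELETING
    quotas.release(share.size_gb)
    pending.append(share)  # backend cleanup happens later


def process_pending(pending: list, driver_delete) -> None:
    """Periodic task: retry backend cleanup; failures stay internal."""
    for share in list(pending):
        try:
            driver_delete(share)
        except Exception:
            # The error is tracked by the service; the quota stays released.
            share.status = Status.ERROR_DEFERRED_DELETING
            continue
        share.status = Status.DELETED
        pending.remove(share)
```

The design point is that quota accounting is decoupled from backend cleanup: a slow or failing deletion no longer blocks the user, while the service keeps retrying on its own.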
All things CephFS
--------------------------
We discussed deprecating support for "standalone" NFS-Ganesha with the CephFS/NFS driver. We've supported using a clustered Ceph NFS service since the Zed release.
We've also added upgrade helpers to migrate from a standalone NFS-Ganesha service to the Ceph NFS service; this code was backported to stable/2023.1 (Antelope)
Testing and stabilization: We'll continue testing CephFS/NFS with standalone NFS-Ganesha until we drop support for Ganesha's DBUS mechanism from the CephFS/NFS driver. IBM is currently testing the scale, performance and stability of clustered NFS-Ganesha, and both nfs-ganesha and cephadm are being updated accordingly.
Multinode devstack Ceph in CI:
- devstack testing of CephFS/NFS has been stable; we're trying to run this job on stable/2023.1
- AI: ashrodri will enable the ingress service with cephadm
Ceph NFS upgrades
The code patches for upgrading to a cephadm deployed NFS cluster were merged and backported to stable/2023.1 (Antelope).
When upgrading, access rules are reapplied and export locations will indicate the preferred export path for mounting shares.
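Clients can pick the mount path by looking at the `preferred` flag on each export location. A minimal sketch, assuming the export locations have already been fetched as dicts shaped like the Manila API response (`path`, `preferred`, `is_admin_only`); the helper name and sample paths are made up for illustration.

```python
def pick_mount_path(export_locations):
    """Return the path of the preferred, user-facing export location."""
    # Admin-only export locations are not meant for end users.
    user_facing = [el for el in export_locations if not el.get("is_admin_only")]
    for el in user_facing:
        if el.get("preferred"):
            return el["path"]
    # Fall back to the first user-facing path if none is flagged preferred.
    return user_facing[0]["path"] if user_facing else None


locations = [
    {"path": "192.0.2.10:/volumes/_nogroup/old", "preferred": False, "is_admin_only": False},
    {"path": "192.0.2.20:/volumes/_nogroup/new", "preferred": True, "is_admin_only": False},
]
print(pick_mount_path(locations))
```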
Deprecating use of DBUS API with NFS-Ganesha
As the clustered NFS solution matures, and now that an upgrade mechanism is available, we want to stop supporting the use of a "standalone" NFS-Ganesha server with CephFS.
AI: add a deprecation warning in the Caracal cycle and communicate it on the mailing list
Multiple CephFS filesystems
We discussed the pros and cons of a single driver representing multiple file systems as Manila storage pools. There are not many benefits to this, especially considering how easy it currently is to deploy and configure manila share services.
In multi-filesystem environments, the user experience of mounting shares with native CephFS is poor, since the export paths no longer expose the file system name.
Operators suggested that we encode the file system name in the export location metadata so that automation tools (e.g., the manila CSI plugin, nova/virtiofs) can consume it programmatically.
AI: Elias Wimmer will report a bug against the CephFS driver
AI: The CephFS driver documentation will be updated to cover multi-filesystem deployments and the behavior of the driver
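To show why the metadata helps, here is how an automation tool might turn it into mount options. A minimal sketch: the metadata key `cephfs_filesystem` and the function name are hypothetical (the real key will be settled in the bug/fix), and it assumes a recent CephFS kernel client that selects a non-default file system with the `fs=` mount option (older clients use `mds_namespace=`).

```python
def cephfs_mount_options(export_location: dict, cephx_user: str) -> str:
    """Build kernel-client mount options from export location metadata."""
    metadata = export_location.get("metadata", {})
    fs_name = metadata.get("cephfs_filesystem")  # hypothetical metadata key
    options = [f"name={cephx_user}"]
    if fs_name:
        # Select the file system explicitly instead of relying on the default.
        options.append(f"fs={fs_name}")
    return ",".join(options)
```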
Ceph Mon address is never re-validated [3]
Today, the only way to re-export a share is to restart the manila-share service. We discussed an API-based trigger for the "ensure shares" logic that would recalculate the export locations of the shares.
We agreed that this implementation should add a micro-state to the share, and that the API should deny other management operations on a share while it is in the "ensuring" state.
AI: carloss will propose a specification to implement this change
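The agreed guard can be sketched as a status check in the API layer. This is a minimal sketch, not the upcoming spec: the `ensuring` value, the allowed-operation names and the helper functions are hypothetical, and shares are plain dicts for brevity.

```python
class InvalidShareState(Exception):
    """Raised when an operation is not allowed in the share's current state."""


ENSURING = "ensuring"  # hypothetical micro-state name
# Hypothetical read-only operations that stay allowed while ensuring.
ALLOWED_WHILE_ENSURING = {"show", "list_export_locations"}


def check_operation_allowed(share: dict, operation: str) -> None:
    """Deny management operations while the share is being ensured."""
    if share["status"] == ENSURING and operation not in ALLOWED_WHILE_ENSURING:
        raise InvalidShareState(
            f"Share {share['id']} is being ensured; '{operation}' is denied.")


def start_ensure(share: dict) -> None:
    """API trigger received: enter the micro-state (no nested ensure runs)."""
    check_operation_allowed(share, "ensure")
    share["previous_status"] = share["status"]
    share["status"] = ENSURING


def finish_ensure(share: dict, export_locations: list) -> None:
    """Driver returned recalculated export locations: leave the micro-state."""
    share["export_locations"] = export_locations
    share["status"] = share.pop("previous_status")
```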
Human readable export locations
-----------------------------------------------
Export locations are constructed for machines and do not use human-friendly names. NetApp proposed user-customizable export locations a few releases ago [4].
We discussed the proposal and found that it is still relevant. We brainstormed using a share type extra specification to customize a prefix for export location names.
We agreed that updating the export locations of existing shares is bound to be disruptive and will not be part of the implementation. We may consider this enhancement if there is a very strong use case.
AI: NetApp folks will propose updates to the lite-spec and work on the implementation
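A share type extra spec could feed a prefix into the name a driver builds for a new export. A minimal sketch: the extra spec key `export_location_prefix`, the `share_<id>` naming scheme and the character sanitization are illustrative assumptions, not the lite-spec's design.

```python
import re

# Hypothetical extra spec key; the lite-spec will define the real one.
PREFIX_SPEC = "export_location_prefix"


def export_name(share_id: str, share_type_extra_specs: dict) -> str:
    """Derive a backend export name, prepending an optional readable prefix."""
    base = "share_" + share_id.replace("-", "_")  # illustrative naming scheme
    prefix = share_type_extra_specs.get(PREFIX_SPEC, "")
    # Replace characters that may be unsafe in backend export names.
    prefix = re.sub(r"[^A-Za-z0-9_]", "_", prefix)
    return f"{prefix}_{base}" if prefix else base
```

Because the prefix is only read when the export is created, changing the extra spec later would not rename existing exports, which fits the decision not to update the export locations of existing shares.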
Share backups
---------------------
A generic NFS implementation of share backups was introduced in the previous (Bobcat) cycle
We discussed share-manager-driven "advantaged" share backup driver implementations, as well as the data service's backup driver interface.
CERN found that the backup driver interface calls back into the data service, so the driver interface needs changes.
AI: Luis from CERN will update the backup driver interface to decouple it from the data manager as needed, in preparation for their implementation of a restic backup driver
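One way to picture the decoupling: the data service hands the driver everything it needs up front, so the driver never calls back into the data manager. A minimal sketch under that assumption; `BackupRequest`, `BackupDriver`, `FakeResticDriver` and `run_backup` are hypothetical names, and the fake driver records calls instead of running restic.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupRequest:
    """Everything a backup driver needs, passed in by the data service."""
    backup_id: str
    share_id: str
    source_path: str  # share already mounted by the data service


class BackupDriver(ABC):
    """Hypothetical decoupled interface: drivers never call the data manager."""

    @abstractmethod
    def backup(self, request: BackupRequest) -> dict:
        """Back up request.source_path and return driver-specific metadata."""


class FakeResticDriver(BackupDriver):
    """Stand-in that records calls instead of invoking restic."""

    def __init__(self) -> None:
        self.calls = []

    def backup(self, request: BackupRequest) -> dict:
        self.calls.append(request.source_path)
        return {"snapshot": f"restic-{request.backup_id}"}


def run_backup(driver: BackupDriver, request: BackupRequest) -> dict:
    """Data-service side; mounting and unmounting would wrap this call."""
    return driver.backup(request)
```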
Tech debt
--------------
During the tech debt session, gouthamr walked us through some multi-release tech debt items, such as the progress on scenario tests and the implementation of mechanisms to prevent race conditions.
These items are a good opportunity for someone in the community to step up and take on some of the work, and they can be an interesting technical challenge.
AI: Discuss the specifications in the weekly IRC meetings and find new owner(s)
Barbican integration
-----------------------------
Backend data encryption has been supported in Manila for several releases, but users cannot configure their own encryption keys.
SAP is pursuing an integration with Barbican for Manila and has worked on a specification.
#Agreed: Manila should have a Barbican/Castellan API layer to interact with the keystore, and share backends shouldn't have code to interact directly with Barbican.
AI: kpdev will update the proposed spec [5], also documenting the interactions with Castellan
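The agreed layering keeps all key store access in one service-side API, so backend drivers only ever see what that layer gives them. A minimal sketch, assuming the service hands key material to the driver; `KeyStore`, `ShareKeyAPI`, `InMemoryKeyStore`, the `encryption_key_ref` field and the driver function are hypothetical, and the in-memory store stands in for Castellan/Barbican rather than calling either.

```python
from typing import Protocol


class KeyStore(Protocol):
    """Minimal interface the key manager backend must satisfy (hypothetical)."""

    def get_key(self, key_ref: str) -> bytes: ...


class ShareKeyAPI:
    """Single service-side layer that talks to the key store.

    In the agreed design this layer would wrap Castellan; drivers never
    talk to Barbican themselves.
    """

    def __init__(self, keystore: KeyStore) -> None:
        self._keystore = keystore

    def key_for_share(self, share: dict) -> bytes:
        return self._keystore.get_key(share["encryption_key_ref"])


class InMemoryKeyStore:
    """Test double standing in for Castellan/Barbican."""

    def __init__(self, keys: dict) -> None:
        self._keys = keys

    def get_key(self, key_ref: str) -> bytes:
        return self._keys[key_ref]


def driver_create_encrypted_share(share: dict, key: bytes) -> str:
    """A backend driver receives key bytes only; it has no Barbican client."""
    return f"created {share['id']} with {len(key)}-byte key"
```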
Manila/Cinder cross-project session
---------------------------------------------------
We organized a cross-project session with the Cinder team to discuss a couple of things:
- An enhancement to the volume extend API calls in manila's generic driver, since the Cinder APIs have evolved and Manila would benefit from the changes. Rajat can work on the enhancement if the Manila contributors don't have the bandwidth.
- The way DHSS=True works in Manila: we brainstormed hypothetical support in Cinder for service instances/storage virtual machines and tenant-driven volume networking.
- The progress both projects have made with Secure RBAC, OpenStackClient and OpenStackSDK.
Expectations while submitting features
--------------------------------------------------------
We are working on documenting the process to follow when submitting features [6]. This also covers the deadlines and what reviewers expect from a feature submission.
AI: carloss will update the new documentation with a checklist that will illustrate the definition of done for a feature
Howcasts
--------------
We have been thinking about this for a while. The idea is to produce short videos that help newcomers to the project. We have a list of ideas to work on [7], and help from the community would be appreciated.
AI: we'll start working on the howcasts and try to split the load among community members
SQLAlchemy 2.0
-----------------------
stephenfin has been working on the SQLAlchemy 2.0 changes for Manila, and we have made good progress so far. One of the problems with testing the changes is that we need to wait for an entire CI run, which can take up to 3 hours.
Agreed:
- We'll work with stephenfin to reduce the wait time for CI results by reducing the number of tests run with the dummy drivers
- We'll review the changes and help figure out the issues with the migration of the share replica and share instance queries
Tech Debt Roundup 2: multi-release efforts that have received attention
---------------------------------------------------------------------------
Metadata spec
We currently have metadata implemented for three resources (shares, share snapshots and share network subnets). We have open patches for access rules and export locations.
AI: Update the export location and access rule metadata changes and work on them during the Caracal cycle.
OSC Support:
OSC now has parity with the manila shell client. We added a deprecation warning to the shell client in the previous cycle, and its removal will happen before the next SLURP release (E). This means that Manila's shell client will not be shipped as part of the D release.
- More negative functional tests would be welcome, as we only implemented the happy paths during a hackathon.
- AI: consider an OSC test hackathon during the Caracal release
- AI: all documentation will be updated to use OSC in examples
SDK
We have completed a lot of this with the help of university students and Outreachy interns.
Most of the pressing work, including the features needed to integrate with other services, has been covered.
AI: Continue working towards full parity between openstacksdk and manilaclient SDK
Agreed: There are no plans to deprecate manilaclient SDK
Share Transfers:
We have share transfers between projects for DHSS=False, but we still lack support for DHSS=True. This is an interesting technical challenge to work on in Manila, and it would be great to have someone willing to continue this work.
Follow up on Resource locks
------------------------------------------
The two main feature changes for resource locks (share resource locks, and locks for access rule deletion and visibility restrictions) were merged in the Bobcat cycle.
Two changes are yet to be merged: the OpenStack SDK patch and the manila-tempest-plugin tests for the access rule locks.
We brainstormed implementations of other resource locks, such as a visibility lock on sensitive fields of a security service, or deletion locks for snapshots, backups and replicas
AI: We'll implement new resource locks opportunistically during the Caracal release
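Extending deletion locks to snapshots, backups and replicas mostly means checking the same kind of lock record for more resource types before an action runs. A minimal sketch under that assumption; `ResourceLock`, `ResourceLockedError`, `ensure_not_locked` and the resource type strings are hypothetical and do not mirror Manila's resource lock schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLock:
    resource_id: str
    resource_type: str  # e.g. "share", "snapshot", "backup" (hypothetical values)
    action: str         # e.g. "delete"
    user_id: str


class ResourceLockedError(Exception):
    pass


def ensure_not_locked(locks, resource_type, resource_id, action):
    """Refuse an action if any lock covers this resource and action."""
    for lock in locks:
        if (lock.resource_type, lock.resource_id, lock.action) == (
                resource_type, resource_id, action):
            raise ResourceLockedError(
                f"{resource_type} {resource_id} has a '{action}' lock "
                f"held by {lock.user_id}")
```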
Operator Hour
---------------------
We reflected on the results of the latest OpenStack User Survey [9], talked about the progress on the most requested features, and asked operators for feedback.
We discussed the use cases for the feature to disable manila services, and chatted about a possible enhancement to it: adding backends in a disabled state and enabling them later.
AI: we should enhance our documentation to cover starting and stopping services, as well as the differences between stopping and disabling them.
[0] https://www.youtube.com/@openstackmanila
[1] https://github.com/openstack-k8s-operators/manila-operator/ (a replacement for TripleO)
[2] https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/director... (Native CephFS)
[3] https://bugs.launchpad.net/manila/+bug/1996793
[4] https://opendev.org/openstack/manila-specs/src/commit/baf6209da8e5b472280241...
[5] https://review.opendev.org/c/openstack/manila-specs/+/898999
[6] https://review.opendev.org/c/openstack/manila/+/898855
[7] https://etherpad.opendev.org/p/manila-howcasts
[8] https://etherpad.opendev.org/p/caracal-ptg-manila-operator-hour
[9] https://drive.google.com/file/d/1ZM722TOX3ouP_7Cet2JdURFttTuI5ZLe/view?usp=d...
[10] https://etherpad.opendev.org/p/caracal-ptg-manila
If you have questions or would like to follow-up on any of the topics, don't hesitate to reach out.
Thank you!
carloss
Carlos Silva