On Wed, Jan 22, 2025 at 7:54 AM James Page <james.page@canonical.com> wrote:
Hi Ashley
Firstly, apologies for coming to this topic so late - I evidently missed the original deprecation in Caracal and only came across the removal of the protocol helper while debugging an issue in a Caracal deployment today.
On Sun, Dec 22, 2024 at 7:03 PM Ashley Rodriguez <ashrod98@gmail.com> wrote:
Hi zorillas!
As per the subject line, we will be deprecating the standalone CephFS NFS protocol helper in this cycle. This follows the deprecation warning we put out in 2024.1 / Caracal [1] [2]. We've supported using a clustered Ceph NFS service since the Zed release. We've also added upgrade helpers to migrate from a standalone NFS-Ganesha service to the ceph-orch-deployed clustered NFS service; this code was backported to stable/2023.1 (Antelope). Although the upgrade path is disruptive, there is a UX improvement in Manila that eases this migration by surfacing “preferred” export paths.
Moving forward, deployments must have “cephfs_nfs_cluster_id” in their configuration; otherwise the driver will fail to start. Manila will not make any export changes to NFS-Ganesha, so upgrading Manila without an NFS cluster means that the control plane will stop working, but the data plane will remain unaffected.
To assist migration, admins can also configure "cephfs_ganesha_export_ips" (or alternatively, "cephfs_ganesha_server_ip") alongside "cephfs_nfs_cluster_id". Setting these options will allow the CephFS driver to report additional export paths. These additional export paths will have the "preferred" metadata key set to False.
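As a rough illustration of the configuration requirement described above, here is a minimal Python sketch of a pre-upgrade check. The three "cephfs_nfs_cluster_id" / "cephfs_ganesha_*" option names come from this thread. Everything else is an assumption made for the example: the backend section name, the sample values, the helper name check_cephfs_nfs_backends, the use of "cephfs_protocol_helper_type" to spot NFS backends, and the idea that manila.conf can be read with Python's configparser (Manila itself uses oslo.config).

```python
import configparser

# Hypothetical manila.conf fragment; the section name and values are made up.
SAMPLE_CONF = """
[DEFAULT]
enabled_share_backends = cephfsnfs

[cephfsnfs]
cephfs_protocol_helper_type = NFS
cephfs_nfs_cluster_id = mycluster
cephfs_ganesha_export_ips = 192.0.2.10,192.0.2.11
"""


def check_cephfs_nfs_backends(conf_text):
    """Return {backend_section: [problems]} for CephFS NFS backends.

    A backend is treated as "CephFS NFS" when its section sets
    cephfs_protocol_helper_type to NFS (an assumption of this sketch).
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)

    problems = {}
    for section in parser.sections():
        opts = parser[section]
        if opts.get("cephfs_protocol_helper_type", "").strip().upper() != "NFS":
            continue  # not a CephFS NFS backend, nothing to check
        issues = []
        if not opts.get("cephfs_nfs_cluster_id", "").strip():
            issues.append("cephfs_nfs_cluster_id is not set")
        problems[section] = issues
    return problems


if __name__ == "__main__":
    for backend, issues in check_cephfs_nfs_backends(SAMPLE_CONF).items():
        status = "ok" if not issues else "; ".join(issues)
        print(f"{backend}: {status}")
```

A backend that only sets "cephfs_ganesha_server_ip" or "cephfs_ganesha_export_ips" would be reported as missing the cluster ID; per the note above, those options only add extra, non-preferred export paths alongside the clustered service.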
I appreciate that the cephadm/ceph orchestrator has become a popular way to deploy and manage Ceph.
As a core developer in the OpenStack Charms (which already uses this feature) and Sunbeam (which was planning to) projects, I'm concerned that this tightly couples the use of CephFS+Ganesha NFS and Manila to a specific choice of deployment tooling.
You're right; but unfortunately, the Manila contributor team is small and needs help maintaining support for "standalone" NFS-Ganesha. We've struggled to keep CI and testing alive for the past few releases. I understand this is a popular choice, but integrating with cephadm-deployed NFS allows a good number of improvements that users seem to desire:
1) Active/Active NFS-Ganesha clusters provide higher availability than Active/Passive deployments of "standalone" NFS-Ganesha.
2) NFS exports management is done natively through Ceph APIs, instead of using the DBUS API within Manila. Using the DBUS API restricted the architecture, and deployers seemed to choose to collocate Manila's share manager service with the NFS service to share the DBUS socket between the two processes.
3) NFS exports management with standalone NFS-Ganesha required some book-keeping within Manila, and it copes poorly with bursts of activity.
A lot of NFS-Ganesha's development has also shifted focus to this clustering approach, and there are more HA improvements planned here.
Looking at the last Ceph user survey [0] there are many choices on how to deploy and operate Ceph - the Ceph orchestrator does account for 30% but other popular tools are also well represented:
The linked survey seems to be 3 years old. Currently, Cephadm and Rook are the Ceph community's preferred installers. AFAIU, aligning on these deployment tools has been a long-term vision within that community. All other installation methods that used to be maintained by the community are being deprecated or silently going unmaintained. As you pointed out above, however, Rook-deployed NFS clusters cannot be used with Manila's CephFS driver. This is a feature gap we intend to close by working with the Ceph community.
Removing this feature would challenge our ability to deliver Ceph integrated Manila through the Sunbeam project.
Is this a done deal or can we still consider not removing this helper for the ceph drivers going forward?
We could certainly keep the helper around if there's someone who can assist with or take over its maintenance. At the least, we need to keep the current devstack-based CI job [3] running. My suspicion is that over time things could break because of changes within CephFS or NFS-Ganesha.
[3] https://review.opendev.org/c/openstack/manila-tempest-plugin/+/939908
Thanks
James
[0] https://ceph.io/en/news/blog/2022/ceph-user-survey-results-2022/images/ceph-...