[TripleO][Ceph] Zed PTG Summary
Hello everyone,

Here are a few highlights on the TripleO Ceph integration status and the plans for the next cycle.

*1. Deployed Ceph as a separate step*

TripleO now provides a separate stage (with a new set of CLI commands) to bootstrap the Ceph cluster before reaching the overcloud deployment phase. This has been the default approach since Wallaby, and the short-term plan is to work on upstream CI consolidation to make sure this stage runs in the existing TripleO standalone scenarios, extending the coverage to both phases: before the overcloud deployment, when the Ceph cluster is created, and during the overcloud deployment, when the cluster is finalized according to the enabled services. It is worth mentioning that great progress has been made in this direction, and the collaboration with the tripleo-ci team is one of the key points here, as they are helping on the automation side to test pending upstream bits with daily jobs. The next step will be working together on automating the promotion mechanism, which should make this process less error-prone. A rough sketch of the command flow is included at the end of this summary.

*2. Decouple Ceph upgrades*

The Nautilus to Pacific upgrade is still managed by ceph-ansible, but the stage that upgrades the cluster has been moved before the overcloud upgrade, resulting in a separate maintenance window. Once the cluster is on Pacific, cephadm is enabled, and from that moment onwards upgrades, as well as minor updates, are managed by cephadm and can be seen as day-2 operations (see the upgrade sketch below). The operator can now perform these kinds of tasks without any interaction with TripleO, which is still used to pull the new containers (unless another registry reachable from the overcloud is used), but its scope has been limited.

*3. Ganesha transitioning to the Ceph orchestrator and Ingress migration*

This was the main topic of the first PTG session: the feature is tracked by two already approved upstream specs, and the goal is to support a Ganesha service managed by cephadm instead of a TripleO-managed one. The TripleO conversation touched many areas:

*a.* the network v2 flow has been improved and it is now possible to reserve more than one VIP per network, although this currently applies only to the Ceph services;

*b.* a new TripleO resource, the CephIngress daemon, has been added; it is a key component (provided by Ceph) that is supposed to provide HA for the cephadm-managed ceph-nfs daemon;

*c.* the TripleO CLI has been extended so that the ceph-nfs daemon can be deployed during the bootstrap of the Ceph cluster (an example of the kind of spec cephadm consumes for this is included below);

*d.* this feature depends on the Manila driver development [1], an effort to implement a driver that interacts with the Ceph orchestrator CLI (and the layer it provides for NFS) instead of using dbus.

Further information about this conversation can be found here [1].
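To make item 1 more concrete, here is a minimal sketch of the split flow; the file names are placeholders and the exact flags may vary slightly between releases, so treat this as an illustration rather than a reference:

    # Bootstrap the Ceph cluster right after hardware and network
    # provisioning, before the overcloud deployment:
    openstack overcloud ceph deploy deployed-metal-overcloud.yaml \
        --stack overcloud \
        --output deployed-ceph-overcloud.yaml

    # The overcloud deployment then consumes the generated environment
    # file and finalizes the cluster according to the enabled services:
    openstack overcloud deploy --templates \
        -e environments/cephadm/cephadm.yaml \
        -e deployed-ceph-overcloud.yaml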
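Once cephadm is in charge (item 2), a minor update or a major upgrade becomes a day-2 operation driven by the Ceph orchestrator. A rough example, with a purely illustrative container image reference:

    # Point the orchestrator at the new container image and let it
    # upgrade the daemons in the proper order:
    ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10

    # Watch the progress:
    ceph orch upgrade status
    ceph -W cephadm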
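Finally, for item 3, this is roughly the kind of service spec cephadm consumes to deploy the ceph-nfs backend and the ingress daemon providing HA in front of it; the extended TripleO CLI is expected to produce an equivalent definition under the hood, and the service ids, ports and virtual IP below are made-up values:

    # Sketch of an nfs + ingress spec applied directly via the orchestrator:
    cat > ceph-nfs-ingress.yaml <<'EOF'
    service_type: nfs
    service_id: cephnfs
    placement:
      count: 1
    spec:
      port: 12049
    ---
    service_type: ingress
    service_id: nfs.cephnfs
    placement:
      count: 2
    spec:
      backend_service: nfs.cephnfs
      frontend_port: 2049
      monitor_port: 9049
      virtual_ip: 192.168.122.100/24
    EOF
    ceph orch apply -i ceph-nfs-ingress.yaml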
Part of this conversation (and really good input here, actually) was about the migration plan for already existing environments where operators would like to move from a TripleO-managed Ganesha to a highly available ceph-nfs managed by cephadm. The outcome here is:

*1.* it is possible to move to the cephadm-managed ingress daemon during the upgrade under certain constraints, but we should provide tools to finalize the migration, because there is an impact not only on the server side (and the Manila service itself) but also on the clients where the shares are mounted;

*2.* we might want to provide options to keep the PCS-managed VIP for Ganesha and avoid forcing people to migrate, and this flow should be consistent at the tripleo-heat-templates level.

For those who are interested, here is the etherpad [2] and the recording of the session [3].

Thanks,
Francesco

[1] https://etherpad.opendev.org/p/zorilla-ptg-manila-planning
[2] https://etherpad.opendev.org/p/tripleo-zed-ceph
[3] https://slagle.fedorapeople.org/tripleo-zed-ptg/tripleo-zed-ptg-ceph.mp4

--
Francesco Pantano
GPG KEY: F41BD75C