[TripleO][Ceph] Zed PTG Summary
Hello everyone,

Here are a few highlights on the TripleO Ceph integration status and the plans for the next cycle.

*1. Deployed Ceph as a separate step*

TripleO now provides a separate stage (with a new set of CLI commands) to bootstrap the Ceph cluster before reaching the overcloud deployment phase. This has been the default approach since Wallaby, and the short-term plan is to work on upstream CI consolidation to make sure this stage runs in the existing TripleO standalone scenarios, extending the coverage to both phases: before the overcloud deployment, when the Ceph cluster is created, and during the overcloud deployment, when the cluster is finalized according to the enabled services. It is worth mentioning that great progress has been made in this direction, and the collaboration with the tripleo-ci team is one of the key points here, as they are helping on the automation side to test pending upstream bits with daily jobs. The next step will be working together on automating the promotion mechanism, which should make this process less error-prone. A rough sketch of the command flow is included at the end of this summary.

*2. Decouple Ceph upgrades*

The Nautilus to Pacific upgrade is still managed by ceph-ansible, but the stage that upgrades the cluster has been moved before the overcloud upgrade, resulting in a separate maintenance window. Once the cluster is on Pacific, cephadm is enabled, and from that moment onwards upgrades, as well as minor updates, are managed by cephadm and can be seen as day-2 operations (see the upgrade sketch below). The operator can now perform these kinds of tasks without any interaction with TripleO, which is still used to pull the new containers (unless another registry reachable from the overcloud is used), but its scope has been limited.

*3. Ganesha transitioning to the Ceph orchestrator and Ingress migration*

This was the main topic of the first PTG session: the feature is tracked by two already approved upstream specs, and the goal is to support a Ganesha service managed by cephadm instead of a TripleO-managed one. The TripleO conversation touched many areas:

*a.* the network v2 flow has been improved and it is now possible to reserve more than one VIP per network, although this currently applies only to the Ceph services;

*b.* a new TripleO resource, the CephIngress daemon, has been added; it is a key component (provided by Ceph) that is supposed to provide HA for the cephadm-managed ceph-nfs daemon;

*c.* the TripleO CLI has been extended so that the ceph-nfs daemon can be deployed during the bootstrap of the Ceph cluster (an example of the kind of spec cephadm consumes for this is included below);

*d.* this feature depends on the Manila driver development [1], an effort to implement a driver that interacts with the Ceph orchestrator CLI (and the layer it provides for NFS) instead of using dbus.

Further information about this conversation can be found here [1].
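To make item 1 more concrete, here is a minimal sketch of the split flow; the file names are placeholders and the exact flags may vary slightly between releases, so treat this as an illustration rather than a reference:

    # Bootstrap the Ceph cluster right after hardware and network
    # provisioning, before the overcloud deployment:
    openstack overcloud ceph deploy deployed-metal-overcloud.yaml \
        --stack overcloud \
        --output deployed-ceph-overcloud.yaml

    # The overcloud deployment then consumes the generated environment
    # file and finalizes the cluster according to the enabled services:
    openstack overcloud deploy --templates \
        -e environments/cephadm/cephadm.yaml \
        -e deployed-ceph-overcloud.yaml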
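Once cephadm is in charge (item 2), a minor update or a major upgrade becomes a day-2 operation driven by the Ceph orchestrator. A rough example, with a purely illustrative container image reference:

    # Point the orchestrator at the new container image and let it
    # upgrade the daemons in the proper order:
    ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10

    # Watch the progress:
    ceph orch upgrade status
    ceph -W cephadm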
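Finally, for item 3, this is roughly the kind of service spec cephadm consumes to deploy the ceph-nfs backend and the ingress daemon providing HA in front of it; the extended TripleO CLI is expected to produce an equivalent definition under the hood, and the service ids, ports and virtual IP below are made-up values:

    # Sketch of an nfs + ingress spec applied directly via the orchestrator:
    cat > ceph-nfs-ingress.yaml <<'EOF'
    service_type: nfs
    service_id: cephnfs
    placement:
      count: 1
    spec:
      port: 12049
    ---
    service_type: ingress
    service_id: nfs.cephnfs
    placement:
      count: 2
    spec:
      backend_service: nfs.cephnfs
      frontend_port: 2049
      monitor_port: 9049
      virtual_ip: 192.168.122.100/24
    EOF
    ceph orch apply -i ceph-nfs-ingress.yaml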
Part of this conversation (and really good input here, actually) was about the migration plan for already existing environments where operators would like to move from a TripleO-managed Ganesha to a highly available ceph-nfs managed by cephadm. The outcome here is:

*1.* it is possible to move to the cephadm-managed ingress daemon during the upgrade under certain constraints, but we should provide tools to finalize the migration, because there is an impact not only on the server side (and the Manila service itself) but also on the clients where the shares are mounted;

*2.* we might want to provide options to keep the PCS-managed VIP for Ganesha and avoid forcing people to migrate, and this flow should be consistent at the tripleo-heat-templates level.

For those who are interested, here is the etherpad [2] and the recording of the session [3].

Thanks,
Francesco

[1] https://etherpad.opendev.org/p/zorilla-ptg-manila-planning
[2] https://etherpad.opendev.org/p/tripleo-zed-ceph
[3] https://slagle.fedorapeople.org/tripleo-zed-ptg/tripleo-zed-ptg-ceph.mp4

--
Francesco Pantano
GPG KEY: F41BD75C