Hello everyone,
Here are a few highlights on the TripleO Ceph integration status and the plans for the next cycle.


1. Ceph deployed as a separate step

TripleO now provides a separate stage (with a new set of CLI commands) to
bootstrap the Ceph cluster before reaching the overcloud deployment phase.
This is the default approach since Wallaby, and the short-term plan is to
consolidate the upstream CI so that this stage runs on the existing TripleO
standalone scenarios, extending the coverage to both phases: before the
overcloud deployment, when the Ceph cluster is created, and during the
overcloud deployment, when the cluster is finalized according to the
enabled services.
It's worth mentioning that great progress has been made in this direction,
and the collaboration with the tripleo-ci team is one of the key points
here, as they're helping to automate the testing of pending upstream bits
with daily jobs.
The next step will be working together on the automation of the promotion
mechanism, which is supposed to make this process less error-prone.


2. Decouple Ceph Upgrades

The Nautilus to Pacific upgrade is still managed by ceph-ansible, but the
cluster upgrade stage has been moved before the overcloud upgrade,
resulting in a different maintenance window.
Once the cluster is moved to Pacific, cephadm is enabled, and from this
moment onwards the upgrade process, as well as minor updates, will be
managed by cephadm and can be seen as a day2 operation.
The operator can now perform these kinds of tasks without any interaction
with TripleO, which is still used to pull the new containers (unless
another registry reachable from the overcloud is used), but its scope has
been limited.
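
As a rough illustration of what such a day2 operation could look like, the
sketch below builds the cephadm orchestrator commands an operator might run
once the cluster is on Pacific. It is only a sketch under assumptions: the
helper name build_upgrade_commands and the image reference are hypothetical,
and the commands are printed rather than executed.

```python
import shlex


def build_upgrade_commands(image):
    """Return the orchestrator commands for a cephadm-driven upgrade.

    `image` is the target Ceph container image; the value used in the
    example below is a hypothetical placeholder, not a real registry path.
    """
    if not image:
        raise ValueError("a target container image is required")
    return [
        # Ask the orchestrator to start upgrading the cluster to `image`.
        ["ceph", "orch", "upgrade", "start", "--image", image],
        # Query the progress of the upgrade.
        ["ceph", "orch", "upgrade", "status"],
    ]


if __name__ == "__main__":
    # Assumption: the image is reachable from the overcloud nodes, either
    # because TripleO pulled it or because another registry is used.
    for cmd in build_upgrade_commands("registry.example.com/ceph/ceph:v16"):
        print(shlex.join(cmd))
```

The commands are returned as argument lists so they could be handed to
subprocess.run on a node with admin access to the cluster, without going
through TripleO.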


3. Ganesha transitioning to Ceph orchestrator and Ingress migration

This has been the main topic for this first PTG session: the feature is
tracked by two already approved upstream specs, and the goal is to support
a Ganesha service managed by cephadm instead of a TripleO-managed one.

The TripleO conversation impacted many areas:

a. the networkv2 flow has been improved and it's now possible to reserve
   more than one VIP per network, but this applies only to the Ceph
   services;

b. a new TripleO resource, the CephIngress daemon, has been added: it's a
   key component (provided by Ceph) that is supposed to provide HA for the
   ceph-nfs managed daemon;

c. the tripleo CLI has been extended and the ceph-nfs daemon can be
   deployed during the bootstrap of the Ceph cluster;

d. this feature depends on the manila driver development [1], which
   represents an effort to implement a driver that can interact with the
   Ceph orch CLI (and the layer it provides for NFS) instead of using dbus.
   Further information about this conversation can be found here [1].

Part of this conversation (with really good input) was about the migration
plan for already existing environments where operators would like to move
from a TripleO-managed Ganesha to a highly available ceph-nfs managed by
cephadm.

The outcome here is:

1. It's possible to move to the cephadm-managed ingress daemon during the
   upgrade under certain constraints, but we should provide tools to
   finalize the migration, because there's an impact not only on the
   server side (and the manila service itself) but also on the clients
   where the shares are mounted;

2. We might want to have options to keep the PCS managed VIP for Ganesha
   and avoid forcing people to migrate, and this flow should be consistent
   at the tripleo-heat-templates level.


For those who are interested, here's the etherpad [2] and the recording of the session [3].


Thanks,
Francesco

[1] https://etherpad.opendev.org/p/zorilla-ptg-manila-planning
[2] https://etherpad.opendev.org/p/tripleo-zed-ceph
[3] https://slagle.fedorapeople.org/tripleo-zed-ptg/tripleo-zed-ptg-ceph.mp4
Francesco Pantano
GPG KEY: F41BD75C