Hello everyone,
Here are a few highlights on the TripleO Ceph integration status and the plans for the next cycle.
1. Deployed Ceph as a separate step
TripleO now provides a separate stage (with a new set of CLI commands) to bootstrap the Ceph cluster before reaching the overcloud deployment phase.
This is the new default approach since Wallaby, and the short-term plan is to work on the upstream CI consolidation to make sure this stage runs on the existing TripleO standalone scenarios, extending the coverage to both phases: before the overcloud deployment, when the Ceph cluster is created, and during the overcloud deployment, when the cluster is finalized according to the enabled services.
It's worth mentioning that great progress has been made in this direction, and the collaboration with the tripleo-ci team is one of the key points here, as they're helping with the automation aspect by testing pending upstream changes with daily jobs.
The next step is to work together on automating the promotion mechanism, which should make this process less error-prone.
2. Decouple Ceph Upgrades
The Nautilus to Pacific upgrade is still managed by ceph-ansible, but the cluster upgrade stage has been moved before the overcloud upgrade, so it now happens in a different maintenance window.
Once the cluster is moved to Pacific, cephadm is enabled, and from this moment onwards the upgrade process, as well as minor updates, will be managed by cephadm and can be seen as a day-2 operation.
The operator can now perform these kinds of tasks without any interaction with TripleO, which is still used to pull the new containers (unless another registry reachable from the overcloud is used), but its scope has been limited.
3. Ganesha transitioning to Ceph orchestrator and Ingress migration
This has been the main topic of this first PTG session: the feature is tracked by two already approved upstream specs, and the goal is to support a Ganesha service managed by cephadm instead of a TripleO-managed one.
The TripleO conversation touched many areas:
b. A new TripleO resource, the CephIngress daemon, has been added; it's a key component (provided by Ceph) that is supposed to provide HA for the ceph-nfs managed daemon;
c. The TripleO CLI has been extended so that the ceph-nfs daemon can be deployed during the bootstrap of the Ceph cluster;
d. This feature depends on the manila driver development [1], an effort to implement a driver that can interact with the Ceph orch CLI (and the layer it provides for NFS) instead of using D-Bus.
Further information about this conversation can be found here [1].
Part of this conversation (with really good input, actually) was about the migration plan for existing environments where operators would like to move from a TripleO-managed Ganesha to a highly available ceph-nfs managed by cephadm.
The outcome here is:
1. It's possible to move to the cephadm-managed ingress daemon during the upgrade under certain constraints, but we should provide tools to finalize the migration, because there's an impact not only on the server side (and the manila service itself) but also on the clients where the shares are mounted;
2. We might want to have options to keep the PCS-managed VIP for Ganesha and avoid forcing people to migrate, and this flow should be consistent at the TripleO Heat templates level.
For those who are interested, here's the etherpad [2] and the recording of the session [3].
Thanks,
Francesco
[1] https://etherpad.opendev.org/p/zorilla-ptg-manila-planning
[2] https://etherpad.opendev.org/p/tripleo-zed-ceph
[3] https://slagle.fedorapeople.org/tripleo-zed-ptg/tripleo-zed-ptg-ceph.mp4