[cinder] Ceph, active-active and no coordination
mark at stackhpc.com
Wed Nov 18 09:30:31 UTC 2020
On Tue, 17 Nov 2020 at 20:27, Radosław Piliszek
<radoslaw.piliszek at gmail.com> wrote:
> Dear Cinder Masters,
> I have a question for you. (or two, or several; well, actually the
> whole Kolla team has :-) )
Thanks for kicking off this thread, Radek.
> The background is that Kolla has been happily deploying multinode
> cinder-volume with Ceph RBD backend, with no coordination configured,
> cluster parameter unset, host properly set per host and backend_host
> normalised (as well as any other relevant config) between the
> cinder-volume hosts.
> The first question is: do we correctly understand that this was an
> active-active deployment? Or really something else?
> Now, there have been no reports that it misbehaved for anyone. It
> certainly has not for any Kolla core. The fact is it was brought to
> our attention because due to the drop of Kolla-deployed Ceph, the
> recommendation to set backend_host was not present and users tripped
> over non-uniform backend_host. And this is expected, of course.
The bug report relates to using an externally deployed Ceph cluster,
rather than one deployed via Kolla Ansible.
To provide a little more background, in Train and earlier releases we
documented setting backend_host. From Ussuri, we automated more of the
Ceph configuration, and in the process dropped backend_host. It's not
Users upgrading to Ussuri from Train and dropping their custom Cinder
config in favour of the Kolla automation would lose backend_host, and
their volumes would therefore become unmanageable. A manual step is
required to move them to one of the cinder-volume hosts.
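
As a rough illustration of the failure mode above, here is a minimal
Python sketch that checks whether backend_host is set to the same value
on every cinder-volume host. It is not something Kolla ships. The
function names, the node names, the "rbd-1" section name and the
"rbd:volumes" value are all assumptions made for the example, and it
reads config text from a dict rather than from the real hosts.

```python
import configparser


def backend_host_by_node(conf_texts, backend_section):
    """Return {node: backend_host or None} for one backend section.

    conf_texts maps a node name to the text of that node's cinder.conf.
    Both helpers here are hypothetical, for illustration only.
    """
    result = {}
    for node, text in conf_texts.items():
        # strict=False tolerates duplicate keys; only "=" separates
        # keys from values, so values such as "rbd:volumes" stay whole.
        parser = configparser.ConfigParser(
            interpolation=None, strict=False, delimiters=("=",)
        )
        parser.read_string(text)
        result[node] = parser.get(
            backend_section, "backend_host", fallback=None
        )
    return result


def is_uniform(values_by_node):
    """True only if every node sets backend_host to the same value."""
    values = set(values_by_node.values())
    return len(values) == 1 and None not in values


if __name__ == "__main__":
    # Example data (assumed): controller2 has lost backend_host,
    # as could happen after dropping a custom config on upgrade.
    confs = {
        "controller1": "[DEFAULT]\nhost = controller1\n\n"
                       "[rbd-1]\nbackend_host = rbd:volumes\n",
        "controller2": "[DEFAULT]\nhost = controller2\n\n"
                       "[rbd-1]\nrbd_pool = volumes\n",
    }
    found = backend_host_by_node(confs, "rbd-1")
    print(found)
    print("uniform:", is_uniform(found))
```

A check like this only spots non-uniform config. It does not move
existing volumes, so the manual step is still needed.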
That bug caused us to question the active/active setup, especially
after finding a related OSA bug.
I can't find any Cinder admin guide for active/active configuration,
although there is a high-level spec (with linked sub-specs) and
some contributor docs that outline the various problems.
> The second and final question is, building up on the first one, were
> we doing it wrong all the time?
> (plus extras: Why did it work? Were there any quirks? What should we do?)
> PS: Please let me know if this thought process is actually
> Ceph-independent as well.