[cinder] Ceph, active-active and no coordination

Jeffrey Zhang zhang.lei.fly+os-discuss at gmail.com
Wed Nov 18 06:38:08 UTC 2020

IMHO, the same host (backend_host) and coordination resolve different issues.

The same host (backend_host) makes multiple cinder-volume services work in
active/active mode by leveraging a RabbitMQ queue with multiple consumers.
One cinder-volume dying doesn't affect anything.
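The "one queue, many consumers" behaviour described above can be sketched with Python's standard `queue` and `threading` modules standing in for RabbitMQ. This is a minimal illustration, not oslo.messaging or Cinder code. The function `run_consumers` and the `volume-N` service names are hypothetical. Each task goes to exactly one consumer, and a consumer that never starts (a "dead" service) does not stop the others from draining the queue.

```python
import queue
import threading


def run_consumers(task_ids, n_consumers=3, dead=frozenset()):
    """Simulate competing consumers on one shared queue (hypothetical helper).

    Consumers named in `dead` never start, mimicking a crashed
    cinder-volume service. Returns {task_id: consumer_name}.
    """
    shared = queue.Queue()  # stand-in for the shared RabbitMQ queue
    for t in task_ids:
        shared.put(t)

    handled = {}
    handled_lock = threading.Lock()

    def consume(name):
        # Keep pulling until the queue is empty; each get() removes the
        # task, so no two consumers ever process the same message.
        while True:
            try:
                task = shared.get_nowait()
            except queue.Empty:
                return
            with handled_lock:
                handled[task] = name
            shared.task_done()

    names = [f"volume-{i}" for i in range(n_consumers)]
    threads = [threading.Thread(target=consume, args=(n,))
               for n in names if n not in dead]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return handled


result = run_consumers([f"task-{i}" for i in range(10)], dead={"volume-0"})
print(len(result))                    # every task was handled
print("volume-0" in result.values())  # the dead consumer handled nothing
```

Note what this sketch does *not* do: nothing stops two consumers from picking up two different tasks that touch the same volume at the same time. That is the gap coordination is meant to close.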

Coordination just prevents the same resource (image) from being accessed
by multiple cinder-volume services, or by multiple eventlet greenthreads
within one cinder-volume. For example, an image cannot be snapshotted while
it is being deleted. This is beyond active/active.
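The per-resource locking idea can be sketched as "one lock per image id". The class `ResourceLocks` and the function `try_snapshot` are hypothetical. Plain `threading.Lock` objects only work inside a single process. Cinder's coordination relies on the tooz library so that the lock is shared across cinder-volume services. This in-process version only shows the semantics.

```python
import threading
from collections import defaultdict


class ResourceLocks:
    """Hypothetical in-process stand-in for a distributed lock manager.

    Hands out one lock per resource id, so operations on the same image
    serialize while operations on different images proceed independently.
    """

    def __init__(self):
        self._guard = threading.Lock()  # protects the dict itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, resource_id):
        with self._guard:
            return self._locks[resource_id]


locks = ResourceLocks()


def try_snapshot(image_id):
    """Snapshot only if no other operation currently holds the image's lock."""
    lock = locks.lock_for(image_id)
    if not lock.acquire(blocking=False):
        return f"busy: {image_id} is locked by another operation"
    try:
        return f"snapshot of {image_id} taken"
    finally:
        lock.release()


results = []
with locks.lock_for("image-1"):              # a delete of image-1 is in progress
    results.append(try_snapshot("image-1"))  # refused while the delete runs
    results.append(try_snapshot("image-2"))  # different image, allowed
results.append(try_snapshot("image-1"))      # delete finished, allowed
print(results)
```

The snapshot here fails fast rather than waiting. A real service might block or retry instead. Either way, the key property is that the delete and the snapshot of the same image never overlap.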

So I think coordination is required for all drivers, including Ceph. Kolla
should enable it by default.
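The configuration suggested above keeps a shared backend_host and adds a coordination backend. The sketch below renders a per-node cinder.conf fragment for that setup with Python's `configparser`. Several pieces are placeholders, not Kolla's actual templates:

- the backend section name `rbd-1`
- the backend_host value `rbd:volumes`
- the etcd URL (its scheme depends on which tooz driver is deployed)
- the helper `render_cinder_conf`

The option names `host`, `enabled_backends`, `backend_host` and `[coordination] backend_url` are assumed from Cinder's configuration and should be checked against the release in use.

```python
import configparser
import io


def render_cinder_conf(host, backend_host, coordination_url):
    """Render a minimal, hypothetical cinder.conf for one cinder-volume node."""
    conf = configparser.ConfigParser()
    # Per-node identity: differs on every node.
    conf["DEFAULT"] = {"host": host, "enabled_backends": "rbd-1"}
    conf["rbd-1"] = {
        "volume_driver": "cinder.volume.drivers.rbd.RBDDriver",
        # Identical on every node, so all nodes consume the same queue.
        "backend_host": backend_host,
    }
    # Distributed lock manager used to serialize work on the same resource.
    conf["coordination"] = {"backend_url": coordination_url}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


text = render_cinder_conf("ctl1", "rbd:volumes", "etcd3+http://192.0.2.10:2379")
print(text)
```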

On Wed, Nov 18, 2020 at 5:06 AM Giulio Fidente <gfidente at redhat.com> wrote:

> I am leaving some comments inline and adding some Cinder folks who
> know better on CC.
> On 11/17/20 9:27 PM, Radosław Piliszek wrote:
> > Dear Cinder Masters,
> >
> > I have a question for you. (or two, or several; well, actually the
> > whole Kolla team has :-) )
> >
> > The background is that Kolla has been happily deploying multinode
> > cinder-volume with Ceph RBD backend, with no coordination configured,
> > cluster parameter unset, host properly set per host and backend_host
> > normalised (as well as any other relevant config) between the
> > cinder-volume hosts.
> >
> > The first question is: do we correctly understand that this was an
> > active-active deployment? Or really something else?
> This configuration is similar to the one deployed by TripleO, except that
> TripleO uses Pacemaker to always have a single cinder-volume running.
> The reason is that, as far as I understand, without a coordinator the
> first cinder-volume within a given 'backend_host' group to consume the
> message from the AMQP queue will start executing the task, so if
> another task for the same volume is queued (or in progress), there
> is a risk of data corruption.
> > Now, there have been no reports that it misbehaved for anyone. It
> > certainly has not for any Kolla core. It was brought to our
> > attention because, with the drop of Kolla-deployed Ceph, the
> > recommendation to set backend_host was no longer present and users
> > tripped over non-uniform backend_host values. And this is expected,
> > of course.
> >
> > The second and final question is, building up on the first one, were
> > we doing it wrong all the time?
> > (plus extras: Why did it work? Were there any quirks? What should we do?)
> I think the correct setup for active/active should be:
> - do not use the same host or backend_host
> - do set cluster to the same value across cluster members
> - use a coordinator
> > PS: Please let me know if this thought process is actually
> > Ceph-independent as well.
> I don't think it's Ceph-dependent. My understanding is that
> active/active is only possible with some drivers, because not every
> driver is safe to use in an active/active configuration; some can, for
> example, have issues handling the database.
> Ceph is just one of those drivers that behave correctly in an
> active/active configuration.
> --
> Giulio Fidente
> GPG KEY: 08D733BA
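The three-point active/active recipe in the quoted reply can be expressed as a small configuration check. It also covers the caveat that the driver itself must support active/active. In the sketch below, `NodeConf` and `check_active_active` are hypothetical helpers, and "effective host" means backend_host when set, otherwise host. Driver support is modelled as a plain boolean. Cinder's base volume driver is understood to expose a class attribute for this (`SUPPORTS_ACTIVE_ACTIVE`, set by the RBD driver), but that should be verified against the release in use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeConf:
    """Hypothetical summary of one cinder-volume node's relevant options."""
    host: str
    backend_host: Optional[str]
    cluster: Optional[str]
    coordination_url: Optional[str]


def check_active_active(nodes: List[NodeConf], driver_supports_aa: bool) -> List[str]:
    """Return a list of problems; an empty list means the recipe is followed."""
    problems = []
    # backend_host overrides host, so the effective value must be unique.
    effective = [n.backend_host or n.host for n in nodes]
    if len(set(effective)) != len(effective):
        problems.append("host/backend_host must not be shared between nodes")
    clusters = {n.cluster for n in nodes}
    if None in clusters or len(clusters) != 1:
        problems.append("cluster must be set to the same value on every node")
    if any(not n.coordination_url for n in nodes):
        problems.append("a coordination backend must be configured on every node")
    if not driver_supports_aa:
        problems.append("the volume driver does not support active/active")
    return problems


URL = "etcd3+http://192.0.2.10:2379"  # placeholder coordination URL

# The layout described in the original question: shared backend_host,
# no cluster, no coordination.
kolla_style = [NodeConf(f"ctl{i}", "rbd:volumes", None, None) for i in (1, 2, 3)]
# The layout recommended in the reply.
recommended = [NodeConf(f"ctl{i}", None, "ceph", URL) for i in (1, 2, 3)]

print(check_active_active(kolla_style, driver_supports_aa=True))
print(check_active_active(recommended, driver_supports_aa=True))  # []
```

The check encodes the reply's recommendation rather than the top-posted suggestion (shared backend_host plus coordination). Under that alternative, the first rule would be relaxed.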