[cinder] Ceph, active-active and no coordination

Giulio Fidente gfidente at redhat.com
Tue Nov 17 21:03:10 UTC 2020


I am leaving some comments inline and adding some cinder folks on CC who
know this better

On 11/17/20 9:27 PM, Radosław Piliszek wrote:
> Dear Cinder Masters,
> 
> I have a question for you. (or two, or several; well, actually the
> whole Kolla team has :-) )
> 
> The background is that Kolla has been happily deploying multinode
> cinder-volume with Ceph RBD backend, with no coordination configured,
> cluster parameter unset, host properly set per host and backend_host
> normalised (as well as any other relevant config) between the
> cinder-volume hosts.
> 
> The first question is: do we correctly understand that this was an
> active-active deployment? Or really something else?

this configuration is similar to the one deployed by tripleo, except that
tripleo uses pacemaker to ensure only a single cinder-volume is running
at any time

the reason being that, as far as I understand, without a coordinator the
first cinder-volume within a given 'backend_host' group to consume the
message from the AMQP queue will start executing the task ... so if
another task is queued (or already in progress) for the same volume,
there is a risk of data corruption
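
for illustration, the kind of setup being discussed would look roughly
like the sketch below in cinder.conf -- this is only an example, the
hostnames, backend name and Ceph values are made up:

    # cinder.conf on every cinder-volume node (sketch, illustrative values)
    [DEFAULT]
    host = controller1                 # differs per node
    enabled_backends = rbd-1
    # note: no 'cluster' option set and no [coordination] section

    [rbd-1]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    backend_host = rbd:volumes         # normalised to the same value on all nodes
    rbd_pool = volumes
    rbd_user = cinder
    rbd_ceph_conf = /etc/ceph/ceph.conf

with backend_host shared like this, all nodes listen on the same queue,
which is what makes the race described above possible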

> Now, there have been no reports that it misbehaved for anyone. It
> certainly has not for any Kolla core. It was brought to our attention
> because, due to the drop of Kolla-deployed Ceph, the recommendation to
> set backend_host was no longer present and users tripped over
> non-uniform backend_host. And this is expected, of course.
> 
> The second and final question is, building up on the first one, were
> we doing it wrong all the time?
> (plus extras: Why did it work? Were there any quirks? What should we do?)
I think the correct setup for active/active should be:

- do not use the same host or backend_host across members
- do set cluster to the same value across cluster members
- use a coordinator
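
in cinder.conf terms that would look something like the sketch below;
again only an example, the cluster name, hostnames and the coordination
URL (here etcd via tooz) are made-up values:

    # cinder.conf, active/active sketch (illustrative values)
    [DEFAULT]
    host = controller1                 # unique per node, backend_host not set
    cluster = rbd-cluster              # same value on every member
    enabled_backends = rbd-1

    [coordination]
    backend_url = etcd3+http://192.0.2.10:2379   # any tooz-supported DLM

    [rbd-1]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_user = cinder
    rbd_ceph_conf = /etc/ceph/ceph.conf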

> PS: Please let me know if this thought process is actually
> Ceph-independent as well.
I don't think it's Ceph-dependent; my understanding is that
active/active is only possible with some drivers, because not every
driver is safe to use in an active/active configuration; some can, for
example, have issues handling the database

Ceph is just one of the drivers that behaves correctly in an
active/active configuration
-- 
Giulio Fidente
GPG KEY: 08D733BA
