<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">imho, the same host(b<span style="font-family:Arial,Helvetica,sans-serif">ackend_host) are resolving the different issues with coordination.</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif"><br></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">The same host(backend_host) makes multiple cinder-volume work in active/active mode by leveraging rabbitmq</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">queue with multi consumers. one cinder-volume death doesn't affect anything.</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif"><br></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">The coordination just prevents the same resource(image) from being accessed by multi cinder-volume or eventlet in</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">one cinder-volume. I.e one image can not be snapshot during deleting. This is beyond active-active. </span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif"><br></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">So I think coordination is required for all drivers, including ceph. kolla should add this in default.</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif"><br></span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 18, 2020 at 5:06 AM Giulio Fidente <<a href="mailto:gfidente@redhat.com">gfidente@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I am leaving some comments inline and adding on CC some cinder folks who<br>
know better<br>
<br>
On 11/17/20 9:27 PM, Radosław Piliszek wrote:<br>
> Dear Cinder Masters,
> 
> I have a question for you. (or two, or several; well, actually the
> whole Kolla team has :-) )
> 
> The background is that Kolla has been happily deploying multinode
> cinder-volume with Ceph RBD backend, with no coordination configured,
> cluster parameter unset, host properly set per host and backend_host
> normalised (as well as any other relevant config) between the
> cinder-volume hosts.
> 
> The first question is: do we correctly understand that this was an
> active-active deployment? Or really something else?

this configuration is similar to that deployed by TripleO, except
TripleO would use Pacemaker to ensure there is always a single
cinder-volume running

the reason being that, as far as I understand, without a coordinator the
first cinder-volume within a given 'backend_host' group to consume the
message from the AMQP queue will start executing the task ... so if
another task is queued (or is in progress) for the same volume, there
is a risk of data corruption
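
to make that concrete, this is roughly the kind of configuration being
discussed (a sketch: the backend name and backend_host value are
invented for the example, not taken from an actual Kolla template):

    # /etc/cinder/cinder.conf, identical on every cinder-volume node
    [DEFAULT]
    enabled_backends = rbd-1

    [rbd-1]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    # same value on every node, so all services consume from the same
    # message queue and any of them may pick up a task for any volume
    backend_host = rbd:volumes
    # (Ceph connection options omitted)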

> Now, there have been no reports that it misbehaved for anyone. It
> certainly has not for any Kolla core. It was brought to our attention
> because, due to the drop of Kolla-deployed Ceph, the recommendation to
> set backend_host was no longer present and users tripped over
> non-uniform backend_host values. And this is expected, of course.
> 
> The second and final question is, building up on the first one, were
> we doing it wrong all the time?
> (plus extras: Why did it work? Were there any quirks? What should we do?)

I think the correct setup for active/active should be

- do not use the same host or backend_host across members
- do set cluster to the same value across cluster members
- use a coordinator
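
spelled out as a cinder.conf sketch (hedged: the cluster name and etcd
URL are invented for the example, and the tooz backend could just as
well be Redis or Zookeeper):

    # /etc/cinder/cinder.conf on each member of the active/active cluster
    [DEFAULT]
    # leave 'host' at its default (the node's hostname) so it stays
    # unique per node; what the members share is the cluster name:
    cluster = rbd-cluster-1
    enabled_backends = rbd-1

    [coordination]
    # shared tooz backend providing the distributed locks
    backend_url = etcd3+http://192.0.2.10:2379

    [rbd-1]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver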

> PS: Please let me know if this thought process is actually
> Ceph-independent as well.

I don't think it's Ceph-dependent; my understanding is that
active/active is only possible with some drivers, because not every
driver is safe to use in an active/active configuration; some can, for
example, have issues handling the database

Ceph is just one of those drivers which behaves correctly in an
active/active configuration
-- 
Giulio Fidente
GPG KEY: 08D733BA
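
PS: regarding "Kolla should add it by default", what I have in mind is
roughly the following (a minimal sketch, assuming the etcd3 tooz driver;
the address is invented for the example). Note that cinder's default
coordination backend is a file lock under $state_path, which only
serializes operations within a single node; a shared backend is what
actually protects the multi-node case:

    [coordination]
    backend_url = etcd3+http://192.0.2.10:2379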