<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">imho, the same host(b<span style="font-family:Arial,Helvetica,sans-serif">ackend_host) are resolving the different issues with coordination.</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif"><br></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">The same host(backend_host) makes multiple cinder-volume work in active/active mode by leveraging rabbitmq</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">queue with multi consumers. one cinder-volume death doesn't affect anything.</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif"><br></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">The coordination just prevents the same resource(image) from being accessed by multi cinder-volume or eventlet in</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">one cinder-volume. I.e one image can not be snapshot during deleting. This is beyond active-active. </span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif"><br></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif">So I think coordination is required for all drivers, including ceph. kolla should add this in default.</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:Arial,Helvetica,sans-serif"><br></span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 18, 2020 at 5:06 AM Giulio Fidente <<a href="mailto:gfidente@redhat.com">gfidente@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I am leaving some comments inline and adding on CC some cinder folks who<br>
know better<br>
<br>
On 11/17/20 9:27 PM, Radosław Piliszek wrote:<br>
> Dear Cinder Masters,
> 
> I have a question for you. (or two, or several; well, actually the
> whole Kolla team has :-) )
> 
> The background is that Kolla has been happily deploying multinode
> cinder-volume with Ceph RBD backend, with no coordination configured,
> cluster parameter unset, host properly set per host and backend_host
> normalised (as well as any other relevant config) between the
> cinder-volume hosts.
> 
> The first question is: do we correctly understand that this was an
> active-active deployment? Or really something else?

this configuration is similar to that deployed by TripleO, except
TripleO would use Pacemaker to ensure there is always a single
cinder-volume running

the reason being that, as far as I understand, without a coordinator the
first cinder-volume within a given 'backend_host' group to consume the
message from the AMQP queue will start executing the task ... so if
another task is queued (or is in progress) for the same volume, there
is a risk of data corruption
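
to make that concrete, this is roughly the kind of configuration being
discussed (a sketch: the backend name and backend_host value are
invented for the example, not taken from an actual Kolla template):

    # /etc/cinder/cinder.conf, identical on every cinder-volume node
    [DEFAULT]
    enabled_backends = rbd-1

    [rbd-1]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    # same value on every node, so all services consume from the same
    # message queue and any of them may pick up a task for any volume
    backend_host = rbd:volumes
    # (Ceph connection options omitted)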

> Now, there have been no reports that it misbehaved for anyone. It
> certainly has not for any Kolla core. It was brought to our attention
> because, due to the drop of Kolla-deployed Ceph, the recommendation to
> set backend_host was no longer present and users tripped over
> non-uniform backend_host values. And this is expected, of course.
> 
> The second and final question is, building up on the first one, were
> we doing it wrong all the time?
> (plus extras: Why did it work? Were there any quirks? What should we do?)

I think the correct setup for active/active should be

- do not use the same host or backend_host across members
- do set cluster to the same value across cluster members
- use a coordinator
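
spelled out as a cinder.conf sketch (hedged: the cluster name and etcd
URL are invented for the example, and the tooz backend could just as
well be Redis or Zookeeper):

    # /etc/cinder/cinder.conf on each member of the active/active cluster
    [DEFAULT]
    # leave 'host' at its default (the node's hostname) so it stays
    # unique per node; what the members share is the cluster name:
    cluster = rbd-cluster-1
    enabled_backends = rbd-1

    [coordination]
    # shared tooz backend providing the distributed locks
    backend_url = etcd3+http://192.0.2.10:2379

    [rbd-1]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver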

> PS: Please let me know if this thought process is actually
> Ceph-independent as well.

I don't think it's Ceph-dependent; my understanding is that
active/active is only possible with some drivers, because not every
driver is safe to use in an active/active configuration; some can, for
example, have issues handling the database

Ceph is just one of those drivers which behaves correctly in an
active/active configuration
-- 
Giulio Fidente
GPG KEY: 08D733BA
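
PS: regarding "Kolla should add it by default", what I have in mind is
roughly the following (a minimal sketch, assuming the etcd3 tooz driver;
the address is invented for the example). Note that cinder's default
coordination backend is a file lock under $state_path, which only
serializes operations within a single node; a shared backend is what
actually protects the multi-node case:

    [coordination]
    backend_url = etcd3+http://192.0.2.10:2379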