<div dir="ltr"><div dir="ltr"><div dir="ltr">Thanks a lot for your replies.<div><br><div><div>I indeed forgot to say that I am using durable queues (i.e., I set amqp_durable_queues = true in the conf files of the OpenStack services).<br></div><div><br></div><div>I'll investigate the root cause of these network partitions further, but I implemented this rabbit cluster exactly to be able to manage such scenarios ...</div><div>It looks like I could have a much more reliable system with a single rabbit instance ...</div><div><br></div><div>Moreover: is it normal/expected that it doesn't recover by itself?</div></div></div><div><br></div><div>Thanks, Massimo</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jul 9, 2021 at 4:21 PM Fabian Zimmermann &lt;<a href="mailto:dev.faz@gmail.com">dev.faz@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

<br>

On Fri, Jul 9, 2021 at 4:04 PM Sean Mooney &lt;<a href="mailto:smooney@redhat.com" target="_blank">smooney@redhat.com</a>&gt; wrote:<br>

<br>

> at least from a nova perspective, if we send a cast, for example from the api,<br>

> and it's lost, then we won't try to recover.<br>

><br>

> in the case of an rpc call, the timeout will fire and we will fail whatever operation we were doing<br>

<br>

Well, it's a lot better to have consistent state with a limited number<br>

of failed requests than to have a whole cluster stuck, and it normally<br>

affects only a limited number of requests (if any!).<br>

So I personally prefer - fail fast and restore :)<br>

<br>

 Fabian<br>

</blockquote></div></div>