Re: [ops] Something wrong with rabbit settings

12 Jul 2021

Thanks a lot for your replies.

I indeed forgot to say that I am using durable queues (i.e. I set
amqp_durable_queues = true in the conf files of the OpenStack services).

I'll investigate the root cause of these network partitions further,
but I set up this rabbit cluster precisely to be able to handle such
scenarios ...
It looks like I would have a much more reliable system with a single rabbit
instance ...

Moreover: is it normal/expected that it doesn't recover by itself?

Thanks, Massimo

On Fri, Jul 9, 2021 at 4:21 PM Fabian Zimmermann <dev.faz@gmail.com> wrote:
Hi,
On Fri, 9 July 2021 at 16:04, Sean Mooney <smooney@redhat.com> wrote:
At least from a nova perspective, if we send a cast, for example from the
API, and it's lost, then we won't try to recover.
In the case of an RPC call, the timeout will fire and we will fail
whatever operation we were doing.
Well, it's a lot better to have consistent state with a limited number
of failed requests than to have a whole cluster stuck, and it normally
affects only a limited number of requests (if any!).
So I personally prefer: fail fast and restore :)
Fabian
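
The cast versus call behaviour described in the quoted text above can be illustrated with a small sketch. This is a toy model using only the Python standard library, not nova or oslo.messaging code: `LossyTransport`, `cast` and `call` are hypothetical names made up for illustration, and the real RPC layer is considerably more involved. It only shows why a lost cast goes unnoticed while a lost call surfaces as a timeout.

```python
import queue


class LossyTransport:
    """Toy stand-in for a message bus that may silently drop messages."""

    def __init__(self, drop=False):
        self.drop = drop
        self.delivered = []

    def send(self, message, reply_queue=None):
        if self.drop:
            return  # message is lost; the sender is not told
        self.delivered.append(message)
        if reply_queue is not None:
            reply_queue.put("ok: " + message)


def cast(transport, message):
    """Fire-and-forget: no reply is expected, so a loss goes unnoticed."""
    transport.send(message)


def call(transport, message, timeout=0.1):
    """Request/response: wait for a reply and fail once the timeout fires."""
    replies = queue.Queue()
    transport.send(message, reply_queue=replies)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError("no reply to %r within %ss" % (message, timeout))
```

With a transport created as `LossyTransport(drop=True)`, `cast(...)` returns normally even though nothing was delivered, whereas `call(...)` raises `TimeoutError`, which is the "fail fast" behaviour the reply argues for.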

Re: [ops] Something wrong with rabbit settings

Massimo Sgaravatto