[ops] Something wrong with rabbit settings

Bogdan Dobrelya bdobreli at redhat.com
Thu Jul 22 09:12:07 UTC 2021


On 7/12/21 9:20 AM, Massimo Sgaravatto wrote:
> Thanks a lot for your replies
> 
> I indeed forgot to say that I am using durable queues (i.e. I set
> amqp_durable_queues = true in the conf files of the OpenStack services).
> 
> I'll investigate further about the root cause of these network partitions, but I 
> implemented this rabbit cluster exactly to be able to manage such scenarios ...
> Looks like I can have a much more reliable system with a single rabbit instance ...
> 
> Moreover: is it normal/expected that it doesn't recover by itself?

There is a Pacemaker OCF resource agent (RA) [0] that automatically recovers
from network partitions, mostly by resetting the Mnesia DB of failed nodes
that cannot rejoin the cluster.

[0] https://www.rabbitmq.com/pacemaker.html#auto-pacemaker
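
For illustration only, here is a minimal Python sketch of what a manual
"reset and rejoin" of one failed node could look like. It is not the code of
the RA, just an approximation of the idea. The node names are hypothetical
placeholders, and the sketch only builds the rabbitmqctl command lines
without running them. Note that "reset" wipes the local state of that node.

```python
def reset_and_rejoin_commands(failed_node, healthy_node):
    """Return the rabbitmqctl invocations for a manual reset-and-rejoin.

    Both node names are hypothetical placeholders such as "rabbit@node2".
    Nothing is executed here; the caller decides whether to run the commands.
    """
    ctl = ["rabbitmqctl", "-n", failed_node]
    return [
        ctl + ["stop_app"],                    # stop RabbitMQ, keep the Erlang VM up
        ctl + ["reset"],                       # wipe the node's local Mnesia state
        ctl + ["join_cluster", healthy_node],  # rejoin via a node still in the cluster
        ctl + ["start_app"],                   # start RabbitMQ again
    ]


# Print the sequence for review before running anything by hand.
for argv in reset_and_rejoin_commands("rabbit@node2", "rabbit@node1"):
    print(" ".join(argv))
```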

> 
> Thanks, Massimo
> 
> On Fri, Jul 9, 2021 at 4:21 PM Fabian Zimmermann <dev.faz at gmail.com 
> <mailto:dev.faz at gmail.com>> wrote:
> 
>     Hi,
> 
>     On Fri, Jul 9, 2021 at 4:04 PM Sean Mooney <smooney at redhat.com
>     <mailto:smooney at redhat.com>> wrote:
> 
>      > at least from a nova perspective, if we send a cast, for example from the api,
>      > and it's lost, then we won't try to recover.
>      >
>      > in the case of an rpc call, the timeout will fire and we will fail
>      > whatever operation we were doing
> 
>     well, it's a lot better to have consistent state with a limited number
>     of failed requests than to have a whole cluster stuck, and it normally
>     affects only a limited number of requests (if any!).
>     So I personally prefer - fail fast and restore :)
> 
>       Fabian
> 
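
The cast versus call difference quoted above can be sketched in plain Python.
This is a toy model under stated assumptions, not oslo.messaging code:
LossyBroker, cast() and call() are hypothetical names, and the broker simply
drops every message to mimic a partition.

```python
import queue


class LossyBroker:
    """Hypothetical stand-in for a partitioned broker: every message is dropped."""

    def publish(self, message, reply_queue=None):
        return None  # the message is lost, so no consumer ever replies


def cast(broker, method):
    # Fire-and-forget: publish and return at once, no reply is expected,
    # so a lost message goes unnoticed by the sender.
    broker.publish({"method": method})


def call(broker, method, timeout=0.1):
    # Request/reply: publish, then block on the reply queue until the timeout.
    replies = queue.Queue()
    broker.publish({"method": method}, reply_queue=replies)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no reply to {method!r} within {timeout}s")
```

With this model, cast(LossyBroker(), "reboot") returns silently although the
message is gone, while call(LossyBroker(), "get_status") raises TimeoutError
once the timeout fires, which is the "fail fast" case.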


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando



