[nova][neutron][oslo][ops] rabbit bindings issue

Thierry Carrez thierry at openstack.org
Tue Aug 11 10:24:00 UTC 2020


If you can reproduce it with current versions, I would suggest filing 
an issue at https://github.com/rabbitmq/rabbitmq-server/issues/

The behavior you describe seems to match 
https://github.com/rabbitmq/rabbitmq-server/issues/1873, but the 
maintainers seem to consider it fixed by a number of somewhat-related 
changes in 3.7.13, mostly because nobody has reported the issue 
since :)

Fabian Zimmermann wrote:
> Hi,
> 
> don't know if durable queues help, but they should be enabled by a rabbitmq 
> policy, which (alone) doesn't seem to fix this (we have this active)
> 
>   Fabian
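
For reference, queue mirroring via a policy and queue durability are separate
things: mirroring comes from a policy, while durability is a property set by
the declaring client (that is what the amqp_durable_queues option mentioned
below controls). A minimal sketch of such an HA policy, with an example
policy name on the default vhost:

    rabbitmqctl set_policy -p / ha-all ".*" '{"ha-mode":"all"}' --apply-to queues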
> 
> Massimo Sgaravatto <massimo.sgaravatto at gmail.com> wrote on Sat., 8 Aug 2020, 09:36:
> 
>     We also see the issue. When it happens, stopping and restarting the
>     rabbit cluster usually helps.
> 
>     I thought the problem was caused by a wrong setting in the
>     OpenStack services' conf files: I had missed these settings (which I am
>     now going to add):
> 
>     [oslo_messaging_rabbit]
>     rabbit_ha_queues = true
>     amqp_durable_queues = true
> 
>     Cheers, Massimo
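
For context, a minimal sketch of where those options go in each service's
configuration (nova.conf, neutron.conf, ...); the transport_url credentials
and host names below are made up:

    [DEFAULT]
    transport_url = rabbit://openstack:PASS@rabbit1:5672,openstack:PASS@rabbit2:5672,openstack:PASS@rabbit3:5672/

    [oslo_messaging_rabbit]
    rabbit_ha_queues = true
    amqp_durable_queues = true

Note that if the queues already exist as non-durable, enabling
amqp_durable_queues typically requires deleting them first so the services
can redeclare them with the new property.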
> 
> 
>     On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann <dev.faz at gmail.com> wrote:
> 
>         Hi,
> 
>         we also have this issue.
> 
>         Our workaround so far has been to delete the queues with a script,
>         or even to reset the complete cluster.
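
A sketch of that kind of cleanup with rabbitmqadmin (connection options
omitted, and deleting all queues like this is obviously disruptive):

    # delete every queue on the default vhost and let the agents redeclare them
    for q in $(rabbitmqadmin -f bash list queues name); do
        rabbitmqadmin delete queue name="$q"
    done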
> 
>         We just upgraded rabbitmq to the latest version - without luck.
> 
>         Anyone else seeing this issue?
> 
>           Fabian
> 
> 
> 
>         Arnaud Morin <arnaud.morin at gmail.com> wrote on Thu., 6 Aug 2020, 16:47:
> 
>             Hey all,
> 
>             I would like to ask the community about a rabbit issue we
>             have from time
>             to time.
> 
>             In our current architecture, we have a cluster of rabbits (3
>             nodes) for
>             all our OpenStack services (mostly nova and neutron).
> 
>             When one node of this cluster is down, the cluster continues
>             working (we use the pause_minority strategy).
>             But sometimes the third server is not able to recover
>             automatically and needs a manual intervention.
>             After this intervention, we restart the rabbitmq-server
>             process, which is then able to rejoin the cluster.
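
For completeness, pause_minority is the partition handling mode set in
rabbitmq.conf (new-style config format):

    cluster_partition_handling = pause_minority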
> 
>             At this point, the cluster looks OK, everything seems fine.
>             BUT, nothing works.
>             Neutron and nova agents are not able to report back to the servers.
>             They appear dead.
>             The servers seem unable to consume messages.
>             The exchanges, queues, and bindings look good in rabbit.
> 
>             What we see is that removing the bindings (using rabbitmqadmin
>             delete binding or the web interface) and recreating them
>             (using the same routing key) brings the service back up and
>             running.
> 
>             Doing this for all queues is really painful. Our next plan is to
>             automate it, but has anyone in the community already seen this
>             kind of issue?
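
A sketch of what that automation could do for a single binding with
rabbitmqadmin; the exchange, queue, and routing-key values below are
placeholders you would read back from "rabbitmqadmin list bindings":

    EXCHANGE=neutron                                  # placeholder
    QUEUE=q-agent-notifier-port-update.example-host   # placeholder
    KEY=q-agent-notifier-port-update.example-host     # placeholder

    rabbitmqadmin delete binding source="$EXCHANGE" destination_type=queue \
        destination="$QUEUE" properties_key="$KEY"
    rabbitmqadmin declare binding source="$EXCHANGE" destination_type=queue \
        destination="$QUEUE" routing_key="$KEY"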
> 
>             Our bug looks like the one described in [1].
>             Someone recommends creating an Alternate Exchange.
>             Has anyone already tried that?
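
For what it's worth, an alternate exchange is usually attached with a policy
plus a catch-all exchange and queue, so that messages which no longer match
any binding land somewhere visible instead of being dropped. A sketch, with
example names and pattern:

    rabbitmqadmin declare exchange name=unroutable type=fanout
    rabbitmqadmin declare queue name=unroutable
    rabbitmqadmin declare binding source=unroutable destination_type=queue destination=unroutable
    rabbitmqctl set_policy AE "^(nova|neutron)$" '{"alternate-exchange":"unroutable"}' --apply-to exchanges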
> 
>             FYI, we are running rabbit 3.8.2 (with OpenStack Stein).
>             We had the same kind of issues with older versions of rabbit.
> 
>             Thanks for your help.
> 
>             [1]
>             https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk
> 
>             -- 
>             Arnaud Morin
> 
> 


-- 
Thierry Carrez (ttx)


