[nova][neutron][oslo][ops] rabbit bindings issue
thierry at openstack.org
Tue Aug 11 10:24:00 UTC 2020
If you can reproduce it with current versions, I would suggest filing
an issue at https://github.com/rabbitmq/rabbitmq-server/issues/
The behavior you describe seems to match
https://github.com/rabbitmq/rabbitmq-server/issues/1873, but the
maintainers seem to think it was fixed by a number of
somewhat-related changes in 3.7.13, since nobody has reported the issue since.
Fabian Zimmermann wrote:
> don't know if durable queues help, but they should be enabled by a rabbitmq
> policy, which (alone) doesn't seem to fix this (we have this active)
> Massimo Sgaravatto <massimo.sgaravatto at gmail.com
> <mailto:massimo.sgaravatto at gmail.com>> wrote on Sat., 8 Aug 2020, 09:36:
> We also see the issue. When it happens, stopping and restarting the
> rabbit cluster usually helps.
> I thought the problem was caused by a wrong setting in the
> OpenStack services' conf files: I had missed these settings (which I am
> now going to add):
> rabbit_ha_queues = true
> amqp_durable_queues = true
> Cheers, Massimo
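[For reference, a sketch of where the two options above would live in an OpenStack service configuration file such as nova.conf or neutron.conf; the section name is the standard oslo.messaging one, and only the values quoted in the thread are shown:]

```
[oslo_messaging_rabbit]
# Mirror queues across the RabbitMQ cluster (classic HA queues).
rabbit_ha_queues = true
# Declare queues as durable so they survive a broker restart.
amqp_durable_queues = true
```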
> On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann <dev.faz at gmail.com
> <mailto:dev.faz at gmail.com>> wrote:
> we also have this issue.
> Our solution was (up to now) to delete the queues with a script
> or even reset the complete cluster.
> We just upgraded rabbitmq to the latest version - without luck.
> Anyone else seeing this issue?
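[A queue-deletion script like the one Fabian mentions could be sketched as below. This is a hypothetical dry run that only prints the rabbitmqadmin commands it would issue; the queue names are placeholders, and actually executing the commands requires a live broker with the management plugin enabled:]

```shell
# Dry run: print, rather than execute, one delete command per queue.
# In real use the queue list would come from:
#   rabbitmqadmin -q -f tsv list queues name
queues="notifications.info
reply_abc123"

cmds=$(printf '%s\n' "$queues" | while read -r q; do
  printf 'rabbitmqadmin delete queue name=%s\n' "$q"
done)
echo "$cmds"
```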
> Arnaud Morin <arnaud.morin at gmail.com
> <mailto:arnaud.morin at gmail.com>> wrote on Thu., 6 Aug 2020,
> Hey all,
> I would like to ask the community about a rabbit issue we have
> from time to time.
> In our current architecture, we have a cluster of rabbits (3 nodes)
> for all our OpenStack services (mostly nova and neutron).
> When one node of this cluster is down, the cluster continues working
> (we use the pause_minority strategy).
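[For readers unfamiliar with it, pause_minority is selected in rabbitmq.conf; a minimal fragment in the new-style config format:]

```
# On a network partition, nodes on the minority side pause themselves
# until the partition heals, instead of continuing in split-brain.
cluster_partition_handling = pause_minority
```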
> But, sometimes, the third server is not able to recover
> and needs a manual intervention.
> After this intervention, we restart the rabbitmq-server process, which
> is then able to rejoin the cluster.
> At this time, the cluster looks ok, everything is fine.
> BUT, nothing works.
> Neutron and nova agents are not able to report back to the servers.
> They appear dead.
> The servers seem unable to consume messages.
> The exchanges, queues, and bindings look fine in rabbit.
> What we see is that removing the bindings (using rabbitmqadmin
> or the web interface) and recreating them (using the same
> routing key) brings the service back up and running.
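[The manual fix Arnaud describes can be expressed with rabbitmqadmin roughly as follows. This is a dry-run sketch that only prints the two commands; the exchange, queue, and routing-key names are placeholders, not values taken from the thread:]

```shell
EXCHANGE="nova"        # placeholder exchange name
QUEUE="compute.host1"  # placeholder queue name
KEY="compute.host1"    # placeholder routing key

# Drop the (apparently stale) binding, then declare it again with the
# same routing key. properties_key identifies the binding to delete.
del="rabbitmqadmin delete binding source=$EXCHANGE destination=$QUEUE properties_key=$KEY"
dec="rabbitmqadmin declare binding source=$EXCHANGE destination=$QUEUE routing_key=$KEY"
echo "$del"
echo "$dec"
```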
> Doing this for all queues is really painful. Our next plan is to
> automate it, but has anyone in the community already seen this kind
> of issue?
> Our bug looks like the one described in .
> Someone recommends creating an Alternate Exchange.
> Has anyone already tried that?
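[For completeness, an alternate exchange can be attached to an exchange at declaration time via the alternate-exchange argument. A dry-run sketch that only prints the commands; the exchange names here are hypothetical:]

```shell
# Declare a fanout exchange to catch otherwise-unroutable messages,
# then point the main exchange at it via the alternate-exchange argument.
declare_ae='rabbitmqadmin declare exchange name=unroutable type=fanout'
declare_main='rabbitmqadmin declare exchange name=nova type=topic arguments={"alternate-exchange":"unroutable"}'
echo "$declare_ae"
echo "$declare_main"
```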
> FYI, we are running rabbit 3.8.2 (with OpenStack Stein).
> We had the same kind of issues with older versions of rabbit.
> Thanks for your help.
> Arnaud Morin
Thierry Carrez (ttx)