Hi,
I don't know if durable queues help, but they should be enabled by a RabbitMQ policy, which alone doesn't seem to fix this (we have it active).
Fabian
Massimo Sgaravatto <massimo.sgaravatto@gmail.com> wrote on Sat, 8 Aug 2020, 09:36:
We also see this issue. When it happens, stopping and restarting the rabbit cluster usually helps.
I thought the problem was caused by a wrong setting in the OpenStack services' config files: I had missed these settings (which I am now going to add):
[oslo_messaging_rabbit]
rabbit_ha_queues = true
amqp_durable_queues = true
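A minimal sketch for checking whether a service's config file already carries the two options above, using Python's standard configparser. The function name and the sample file content are illustrative assumptions, not anyone's real nova.conf. Note that, according to the oslo.messaging option help, rabbit_ha_queues works through the x-ha-policy queue argument, which RabbitMQ 3.0 and later no longer honour; on those versions mirroring is controlled by a server-side policy instead.

```python
import configparser

# The two options from the message above; both are expected to be "true".
REQUIRED = {
    "rabbit_ha_queues": "true",
    "amqp_durable_queues": "true",
}


def missing_rabbit_options(conf_text: str) -> list:
    """Return the required option names that are absent or not set to true."""
    # interpolation=None: oslo configs often contain literal '%' (e.g. log formats).
    # strict=False: tolerate repeated sections/keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    missing = []
    for option, expected in REQUIRED.items():
        # fallback=None covers both a missing section and a missing option.
        value = parser.get("oslo_messaging_rabbit", option, fallback=None)
        if value is None or value.strip().lower() != expected:
            missing.append(option)
    return missing


# Hypothetical config file with only one of the two options set.
example = """
[DEFAULT]
debug = false

[oslo_messaging_rabbit]
rabbit_ha_queues = true
"""

print(missing_rabbit_options(example))  # → ['amqp_durable_queues']
```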
Cheers, Massimo
On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann <dev.faz@gmail.com> wrote:
Hi,
we also have this issue.
So far, our workaround has been to delete the queues with a script, or even to reset the complete cluster.
We just upgraded RabbitMQ to the latest version, with no luck.
Anyone else seeing this issue?
Fabian
Arnaud Morin <arnaud.morin@gmail.com> wrote on Thu, 6 Aug 2020, 16:47:
Hey all,
I would like to ask the community about a rabbit issue we have from time to time.
In our current architecture, we have a cluster of rabbits (3 nodes) for all our OpenStack services (mostly nova and neutron).
When one node of this cluster is down, the cluster continues working (we use the pause_minority strategy). But sometimes the third server is not able to recover automatically and needs manual intervention. After this intervention, we restart the rabbitmq-server process, which is then able to rejoin the cluster.
At this point, the cluster looks OK and everything seems fine. BUT nothing works. Neutron and nova agents are not able to report back to the servers; they appear dead. The servers seem unable to consume messages. The exchanges, queues, and bindings look fine in rabbit.
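To make the pause_minority behaviour concrete: with `cluster_partition_handling = pause_minority` in rabbitmq.conf, RabbitMQ's partition-handling documentation describes a node pausing itself when it finds it is in a minority, i.e. it can see no more than half of the cluster. The function below is a simplified model of that rule for illustration only; its name is hypothetical and it ignores timing and how partitions are actually detected.

```python
def should_pause(reachable_nodes: int, cluster_size: int) -> bool:
    """Simplified pause_minority rule: pause when the nodes this node can still
    reach (itself included) are at most half of the cluster."""
    if not 1 <= reachable_nodes <= cluster_size:
        raise ValueError("reachable_nodes must be between 1 and cluster_size")
    return 2 * reachable_nodes <= cluster_size


# 3-node cluster with one node cut off from the other two:
print(should_pause(1, 3))  # isolated node: True, it pauses
print(should_pause(2, 3))  # remaining pair: False, it keeps serving
```

This is why the two healthy nodes keep the service running while the third is down. By the same rule, both halves of a 2+2 split of a four-node cluster would pause.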
What we see is that removing the bindings (using rabbitmqadmin delete binding or the web interface) and recreating them (with the same routing key) brings the service back up and running.
Doing this for all queues is really painful. Our next plan is to automate it, but has anyone in the community already seen this kind of issue?
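A sketch of how the delete-and-recreate step might be automated through the RabbitMQ management HTTP API, assuming the management plugin is enabled and the binding listing from `GET /api/bindings/{vhost}` has already been fetched as JSON. The helper names `plan_rebind` and `run` are hypothetical, and using `properties_key` exactly as the listing returns it is an assumption worth verifying on your RabbitMQ version before running this against production.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Optional


def _q(part: str) -> str:
    # Fully percent-encode a path segment (the default vhost "/" becomes "%2F").
    return urllib.parse.quote(part, safe="")


def plan_rebind(bindings: list) -> list:
    """Turn a GET /api/bindings/{vhost} listing into (method, path, body) steps."""
    steps = []
    for b in bindings:
        # Only exchange-to-queue bindings; the default exchange ("") cannot be rebound.
        if b.get("destination_type") != "queue" or not b.get("source"):
            continue
        base = (f"/api/bindings/{_q(b['vhost'])}"
                f"/e/{_q(b['source'])}/q/{_q(b['destination'])}")
        # Assumption: properties_key is used as returned by the listing (URL-safe).
        steps.append(("DELETE", f"{base}/{b['properties_key']}", None))
        steps.append(("POST", base, {"routing_key": b["routing_key"],
                                     "arguments": b.get("arguments") or {}}))
    return steps


def run(api: str, user: str, password: str, steps: list) -> None:
    """Send planned steps to a management endpoint, e.g. http://rabbit1:15672."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    for method, path, body in steps:
        data: Optional[bytes] = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(api + path, data=data, method=method)
        req.add_header("Authorization", f"Basic {token}")
        req.add_header("Content-Type", "application/json")
        with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
            resp.read()
```

Keeping planning separate from execution lets you review the steps before calling `run`. The delete has to come before the re-create, because posting a binding identical to an existing one changes nothing; the trade-off is that messages published to a queue's exchange in the short gap between the two calls may be unroutable.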
Our bug looks like the one described in [1]. Someone recommends creating an Alternate Exchange. Has anyone already tried that?
FYI, we are running rabbit 3.8.2 (with OpenStack Stein). We had the same kind of issues with older versions of rabbit.
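For anyone who wants to try the Alternate Exchange idea, a sketch that plans the management HTTP API calls: a fanout exchange to receive unroutable messages, a queue to capture them, a binding between the two, and a policy that attaches the alternate exchange to matching exchanges. The exchange, queue, and policy names, and the `^(nova|neutron)$` pattern in the usage, are assumptions. An alternate exchange only catches messages the main exchange could not route, so at best it shows whether messages are being dropped; nothing in this thread establishes that it fixes the binding problem.

```python
import urllib.parse


def _q(part: str) -> str:
    # Fully percent-encode a path segment (the default vhost "/" becomes "%2F").
    return urllib.parse.quote(part, safe="")


def plan_alternate_exchange(vhost: str, exchange_pattern: str,
                            ae_name: str = "unroutable",
                            capture_queue: str = "unroutable.capture",
                            policy_name: str = "alternate-exchange") -> list:
    """Return (method, path, body) steps for the RabbitMQ management HTTP API.
    All default names are hypothetical placeholders."""
    v = _q(vhost)
    return [
        # 1. Fanout exchange that receives whatever the main exchanges cannot route.
        ("PUT", f"/api/exchanges/{v}/{_q(ae_name)}",
         {"type": "fanout", "durable": True}),
        # 2. Durable queue that holds those messages for inspection.
        ("PUT", f"/api/queues/{v}/{_q(capture_queue)}", {"durable": True}),
        # 3. Bind the queue to the fanout exchange (fanout ignores the routing key).
        ("POST", f"/api/bindings/{v}/e/{_q(ae_name)}/q/{_q(capture_queue)}",
         {"routing_key": "", "arguments": {}}),
        # 4. Attach the alternate exchange to matching exchanges via a policy;
        #    apply-to "exchanges" keeps it from competing with queue (HA) policies.
        ("PUT", f"/api/policies/{v}/{_q(policy_name)}",
         {"pattern": exchange_pattern,
          "definition": {"alternate-exchange": ae_name},
          "apply-to": "exchanges"}),
    ]


steps = plan_alternate_exchange("/", "^(nova|neutron)$")
for method, path, _ in steps:
    print(method, path)
```

Restricting the policy to exchanges matters because only one policy applies to a given object at a time, so a broad pattern that also matched queues could displace an existing mirroring policy.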
Thanks for your help.
[1] https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFh...
-- Arnaud Morin