[nova][neutron][oslo][ops] rabbit bindings issue
thierry at openstack.org
Tue Aug 11 10:24:00 UTC 2020
If you can reproduce it with current versions, I would suggest filing
an issue at https://github.com/rabbitmq/rabbitmq-server/issues/
The behavior you describe seems to match
https://github.com/rabbitmq/rabbitmq-server/issues/1873, but the
maintainers seem to think it was fixed by a number of
somewhat-related changes in 3.7.13, since nobody has reported the issue since.
Fabian Zimmermann wrote:
> don't know if durable queues help, but they should be enabled by a rabbitmq
> policy, which (alone) doesn't seem to fix this (we have this active)
> Massimo Sgaravatto <massimo.sgaravatto at gmail.com
> <mailto:massimo.sgaravatto at gmail.com>> wrote on Sat., 8 Aug 2020, 09:36:
> We also see the issue. When it happens, stopping and restarting the
> rabbit cluster usually helps.
> I thought the problem was caused by a wrong setting in the
> OpenStack services' conf files: I had missed these settings (which I am
> now going to add):
> rabbit_ha_queues = true
> amqp_durable_queues = true
> Cheers, Massimo
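[For reference, a sketch of where the two options above would live in an OpenStack service configuration file such as nova.conf or neutron.conf; the section name is the standard oslo.messaging one, and only the values quoted in the thread are shown:]

```
[oslo_messaging_rabbit]
# Mirror queues across the RabbitMQ cluster (classic HA queues).
rabbit_ha_queues = true
# Declare queues as durable so they survive a broker restart.
amqp_durable_queues = true
```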
> On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann <dev.faz at gmail.com
> <mailto:dev.faz at gmail.com>> wrote:
> we also have this issue.
> Our solution was (up to now) to delete the queues with a script
> or even reset the complete cluster.
> We just upgraded rabbitmq to the latest version - without luck.
> Anyone else seeing this issue?
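[A queue-deletion script like the one Fabian mentions could be sketched as below. This is a hypothetical dry run that only prints the rabbitmqadmin commands it would issue; the queue names are placeholders, and actually executing the commands requires a live broker with the management plugin enabled:]

```shell
# Dry run: print, rather than execute, one delete command per queue.
# In real use the queue list would come from:
#   rabbitmqadmin -q -f tsv list queues name
queues="notifications.info
reply_abc123"

cmds=$(printf '%s\n' "$queues" | while read -r q; do
  printf 'rabbitmqadmin delete queue name=%s\n' "$q"
done)
echo "$cmds"
```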
> Arnaud Morin <arnaud.morin at gmail.com
> <mailto:arnaud.morin at gmail.com>> wrote on Thu., 6 Aug 2020,
> Hey all,
> I would like to ask the community about a rabbit issue we have
> from time to time.
> In our current architecture, we have a cluster of rabbits (3 nodes)
> for all our OpenStack services (mostly nova and neutron).
> When one node of this cluster is down, the cluster continues working
> (we use the pause_minority strategy).
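[For readers unfamiliar with it, pause_minority is selected in rabbitmq.conf; a minimal fragment in the new-style config format:]

```
# On a network partition, nodes on the minority side pause themselves
# until the partition heals, instead of continuing in split-brain.
cluster_partition_handling = pause_minority
```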
> But, sometimes, the third server is not able to recover
> and needs a manual intervention.
> After this intervention, we restart the rabbitmq-server process, which
> is then able to rejoin the cluster.
> At this time, the cluster looks ok, everything is fine.
> BUT, nothing works.
> Neutron and nova agents are not able to report back to the servers.
> They appear dead.
> The servers seem unable to consume messages.
> The exchanges, queues, and bindings look fine in rabbit.
> What we see is that removing the bindings (using rabbitmqadmin
> or the web interface) and recreating them (using the same
> routing key) brings the service back up and running.
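[The manual fix Arnaud describes can be expressed with rabbitmqadmin roughly as follows. This is a dry-run sketch that only prints the two commands; the exchange, queue, and routing-key names are placeholders, not values taken from the thread:]

```shell
EXCHANGE="nova"        # placeholder exchange name
QUEUE="compute.host1"  # placeholder queue name
KEY="compute.host1"    # placeholder routing key

# Drop the (apparently stale) binding, then declare it again with the
# same routing key. properties_key identifies the binding to delete.
del="rabbitmqadmin delete binding source=$EXCHANGE destination=$QUEUE properties_key=$KEY"
dec="rabbitmqadmin declare binding source=$EXCHANGE destination=$QUEUE routing_key=$KEY"
echo "$del"
echo "$dec"
```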
> Doing this for all queues is really painful. Our next plan is to
> automate it, but has anyone in the community already seen this kind
> of issue?
> Our bug looks like the one described in .
> Someone recommends creating an Alternate Exchange.
> Has anyone already tried that?
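[For completeness, an alternate exchange can be attached to an exchange at declaration time via the alternate-exchange argument. A dry-run sketch that only prints the commands; the exchange names here are hypothetical:]

```shell
# Declare a fanout exchange to catch otherwise-unroutable messages,
# then point the main exchange at it via the alternate-exchange argument.
declare_ae='rabbitmqadmin declare exchange name=unroutable type=fanout'
declare_main='rabbitmqadmin declare exchange name=nova type=topic arguments={"alternate-exchange":"unroutable"}'
echo "$declare_ae"
echo "$declare_main"
```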
> FYI, we are running rabbit 3.8.2 (with OpenStack Stein).
> We had the same kind of issues with older versions of rabbit.
> Thanks for your help.
> Arnaud Morin
Thierry Carrez (ttx)