[nova][neutron][oslo][ops] rabbit bindings issue

Fabian Zimmermann dev.faz at gmail.com
Sat Aug 8 13:06:36 UTC 2020


Hi,

dont know if durable queues help, but should be enabled by rabbitmq policy
which (alone) doesnt seem to fix this (we have this active)

 Fabian

Massimo Sgaravatto <massimo.sgaravatto at gmail.com> schrieb am Sa., 8. Aug.
2020, 09:36:

> We also see the issue.  When it happens stopping and restarting the rabbit
> cluster usually helps.
>
> I thought the problem was because of a wrong setting in the openstack
> services conf files: I missed these settings (that I am now going to add):
>
> [oslo_messaging_rabbit]
> rabbit_ha_queues = true
> amqp_durable_queues = true
>
> Cheers, Massimo
>
>
> On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann <dev.faz at gmail.com>
> wrote:
>
>> Hi,
>>
>> we also have this issue.
>>
>> Our solution was (up to now) to delete the queues with a script or even
>> reset the complete cluster.
>>
>> We just upgraded rabbitmq to the latest version - without luck.
>>
>> Anyone else seeing this issue?
>>
>>  Fabian
>>
>>
>>
>> Arnaud Morin <arnaud.morin at gmail.com> schrieb am Do., 6. Aug. 2020,
>> 16:47:
>>
>>> Hey all,
>>>
>>> I would like to ask the community about a rabbit issue we have from time
>>> to time.
>>>
>>> In our current architecture, we have a cluster of rabbits (3 nodes) for
>>> all our OpenStack services (mostly nova and neutron).
>>>
>>> When one node of this cluster is down, the cluster continue working (we
>>> use pause_minority strategy).
>>> But, sometimes, the third server is not able to recover automatically
>>> and need a manual intervention.
>>> After this intervention, we restart the rabbitmq-server process, which
>>> is then able to join the cluster back.
>>>
>>> At this time, the cluster looks ok, everything is fine.
>>> BUT, nothing works.
>>> Neutron and nova agents are not able to report back to servers.
>>> They appear dead.
>>> Servers seems not being able to consume messages.
>>> The exchanges, queues, bindings seems good in rabbit.
>>>
>>> What we see is that removing bindings (using rabbitmqadmin delete
>>> binding or the web interface) and recreate them again (using the same
>>> routing key) brings the service back up and running.
>>>
>>> Doing this for all queues is really painful. Our next plan is to
>>> automate it, but is there anyone in the community already saw this kind
>>> of issues?
>>>
>>> Our bug looks like the one described in [1].
>>> Someone recommands to create an Alternate Exchange.
>>> Is there anyone already tried that?
>>>
>>> FYI, we are running rabbit 3.8.2 (with OpenStack Stein).
>>> We had the same kind of issues using older version of rabbit.
>>>
>>> Thanks for your help.
>>>
>>> [1]
>>> https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk
>>>
>>> --
>>> Arnaud Morin
>>>
>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20200808/b7f9b0ca/attachment.html>


More information about the openstack-discuss mailing list