[nova][neutron][oslo][ops] rabbit bindings issue

Arnaud Morin arnaud.morin at gmail.com
Tue Aug 11 10:28:43 UTC 2020


Thanks for those tips, I will check both values asap.

About the complete reset of the cluster, this is also what we used to do,
but it has some downsides, such as the need to restart all agents,
services, etc.

Cheers,

-- 
Arnaud Morin

On 08.08.20 - 15:06, Fabian Zimmermann wrote:
> Hi,
> 
> Don't know if durable queues help, but they should be enabled by a rabbitmq
> policy, which (alone) doesn't seem to fix this (we have this active).
> 
>  Fabian
> 
> Massimo Sgaravatto <massimo.sgaravatto at gmail.com> schrieb am Sa., 8. Aug.
> 2020, 09:36:
> 
> > We also see the issue. When it happens, stopping and restarting the rabbit
> > cluster usually helps.
> >
> > I thought the problem was because of a wrong setting in the OpenStack
> > services' conf files: I missed these settings (which I am now going to add):
> >
> > [oslo_messaging_rabbit]
> > rabbit_ha_queues = true
> > amqp_durable_queues = true
> >
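> > Note that on the RabbitMQ side queue mirroring itself is controlled by a
> > broker policy, so a matching policy would look roughly like this (policy
> > name and queue pattern below are only examples):
> >
> > rabbitmqctl set_policy ha-all '^(?!amq\.).*' \
> >   '{"ha-mode":"all","ha-sync-mode":"automatic"}' --apply-to queues
> >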
> > Cheers, Massimo
> >
> >
> > On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann <dev.faz at gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> we also have this issue.
> >>
> >> Our solution (up to now) was to delete the queues with a script or even
> >> to reset the complete cluster.
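> >>
> >> For reference, deleting the transient reply/fanout queues can be scripted
> >> with something roughly like this (the name pattern is only an example and
> >> needs to be adapted to the deployment):
> >>
> >> for q in $(rabbitmqctl list_queues -q name | grep -E 'reply_|_fanout_'); do
> >>     rabbitmqadmin delete queue name="$q"
> >> done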
> >>
> >> We just upgraded rabbitmq to the latest version - without luck.
> >>
> >> Anyone else seeing this issue?
> >>
> >>  Fabian
> >>
> >>
> >>
> >> Arnaud Morin <arnaud.morin at gmail.com> schrieb am Do., 6. Aug. 2020,
> >> 16:47:
> >>
> >>> Hey all,
> >>>
> >>> I would like to ask the community about a rabbit issue we have from time
> >>> to time.
> >>>
> >>> In our current architecture, we have a cluster of rabbits (3 nodes) for
> >>> all our OpenStack services (mostly nova and neutron).
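> >>>
> >>> That is, all services point at the three nodes through their
> >>> transport_url, roughly like this (hosts and credentials are placeholders):
> >>>
> >>> [DEFAULT]
> >>> transport_url = rabbit://user:pass@rabbit1:5672,user:pass@rabbit2:5672,user:pass@rabbit3:5672/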
> >>>
> >>> When one node of this cluster is down, the cluster continues working (we
> >>> use the pause_minority strategy).
> >>> But sometimes the third server is not able to recover automatically and
> >>> needs a manual intervention.
> >>> After this intervention, we restart the rabbitmq-server process, which is
> >>> then able to rejoin the cluster.
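> >>>
> >>> For reference, that strategy is the following setting in rabbitmq.conf:
> >>>
> >>> cluster_partition_handling = pause_minority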
> >>>
> >>> At this point, the cluster looks OK and everything seems fine.
> >>> BUT nothing works.
> >>> Neutron and nova agents are not able to report back to the servers.
> >>> They appear dead.
> >>> The servers seem unable to consume messages.
> >>> The exchanges, queues, and bindings look good in rabbit.
> >>>
> >>> What we see is that removing the bindings (using rabbitmqadmin delete
> >>> binding or the web interface) and recreating them (using the same
> >>> routing key) brings the service back up and running.
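> >>>
> >>> For a single queue that boils down to something like the following
> >>> (exchange, queue, and routing key names are only placeholders):
> >>>
> >>> rabbitmqadmin delete binding source="neutron" destination_type="queue" \
> >>>     destination="q-agent-notifier-port-update" \
> >>>     properties_key="q-agent-notifier-port-update"
> >>> rabbitmqadmin declare binding source="neutron" destination_type="queue" \
> >>>     destination="q-agent-notifier-port-update" \
> >>>     routing_key="q-agent-notifier-port-update"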
> >>>
> >>> Doing this for all queues is really painful. Our next plan is to
> >>> automate it, but has anyone in the community already seen this kind
> >>> of issue?
> >>>
> >>> Our bug looks like the one described in [1].
> >>> Someone recommends creating an Alternate Exchange.
> >>> Has anyone already tried that?
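> >>>
> >>> From what I read, setting one up would look roughly like this (exchange,
> >>> queue, and policy names are only examples):
> >>>
> >>> rabbitmqadmin declare exchange name="unroutable" type="fanout" durable=true
> >>> rabbitmqadmin declare queue name="unroutable" durable=true
> >>> rabbitmqadmin declare binding source="unroutable" destination="unroutable"
> >>> rabbitmqctl set_policy AE "^(nova|neutron)$" \
> >>>   '{"alternate-exchange":"unroutable"}' --apply-to exchanges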
> >>>
> >>> FYI, we are running rabbit 3.8.2 (with OpenStack Stein).
> >>> We had the same kind of issues with older versions of rabbit.
> >>>
> >>> Thanks for your help.
> >>>
> >>> [1]
> >>> https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk
> >>>
> >>> --
> >>> Arnaud Morin
> >>>
> >>>
> >>>


