[nova][neutron][oslo][ops][kolla] rabbit bindings issue

Arnaud MORIN arnaud.morin at gmail.com
Thu Aug 20 19:28:40 UTC 2020


Hello,
Are you doing that using an alternate exchange?
I started configuring one in our environment but have not finished yet.
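
For reference, a minimal sketch of what I mean, not a finished setup. It assumes the RabbitMQ management plugin is enabled. The exchange and queue name "unroutable", the policy pattern, and the helper names are hypothetical. The sketch only builds the policy body for PUT /api/policies/{vhost}/{name} and filters the output of GET /api/queues. Whether the broken bindings discussed in this thread actually show up as unroutable messages is untested.

```python
import json
from urllib.parse import quote


def alternate_exchange_policy(pattern, alternate_exchange, priority=0):
    """Build the JSON body for PUT /api/policies/{vhost}/{name}.

    "alternate-exchange" is the RabbitMQ policy key; the pattern and the
    exchange name passed in are assumptions chosen by the operator.
    """
    return {
        "pattern": pattern,
        "definition": {"alternate-exchange": alternate_exchange},
        "apply-to": "exchanges",
        "priority": priority,
    }


def policy_url(base_url, vhost, name):
    """Return the management API URL for a policy (vhost is URL-encoded)."""
    return "{}/api/policies/{}/{}".format(
        base_url.rstrip("/"), quote(vhost, safe=""), quote(name, safe="")
    )


def queues_with_messages(queues, watched):
    """From GET /api/queues output, return {name: depth} for watched queues
    that currently hold at least one message."""
    return {
        q["name"]: q.get("messages", 0)
        for q in queues
        if q["name"] in watched and q.get("messages", 0) > 0
    }


if __name__ == "__main__":
    # Hypothetical pattern: only apply the policy to the nova and neutron exchanges.
    body = alternate_exchange_policy("^(nova|neutron)$", "unroutable")
    print(policy_url("http://localhost:15672", "/", "catch-unroutable"))
    print(json.dumps(body, indent=2))
```

The alternate exchange itself (for example a fanout exchange) and a queue bound to it still have to be declared separately. A monitoring job could then alert whenever queues_with_messages() reports a non-zero depth for that queue.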

Cheers,

On Thu, 20 Aug 2020 at 19:16, Fabian Zimmermann <dev.faz at gmail.com>
wrote:

> Hi,
>
> just another idea:
>
> RabbitMQ is able to count undelivered messages. We could use this
> information to detect the broken bindings (which cause undeliverable messages).
>
> Anyone already doing this?
>
> I currently don't have a way to reproduce the broken bindings, so I'm
> unable to prove the idea.
>
> It seems we have to wait for the issue to happen again, which hopefully
> never happens :)
>
>  Fabian
>
> Arnaud Morin <arnaud.morin at gmail.com> wrote on Tue, 18 Aug 2020,
> 14:07:
>
>> Hey all,
>>
>> About the vexxhost strategy to use only one rabbit server and manage HA
>> through rabbit.
>> Do you plan to do the same for MariaDB/MySQL?
>>
>> --
>> Arnaud Morin
>>
>> On 14.08.20 - 18:45, Fabian Zimmermann wrote:
>> > Hi,
>> >
>> > I read somewhere that vexxhost's Kubernetes openstack-operator runs
>> > one RabbitMQ container per service. Only the Kubernetes self-healing is
>> > used as "HA" for RabbitMQ.
>> >
>> > That seems to match my finding: run RabbitMQ standalone and use an
>> > external system to restart RabbitMQ if required.
>> >
>> >  Fabian
>> >
>> > Satish Patel <satish.txt at gmail.com> wrote on Fri, 14 Aug 2020, 16:59:
>> >
>> > > Fabian,
>> > >
>> > > what do you mean?
>> > >
>> > > >> I think vexxhost is running (1) with their openstack-operator - for
>> > > >> reasons.
>> > >
>> > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann <dev.faz at gmail.com>
>> > > wrote:
>> > > >
>> > > > Hello again,
>> > > >
>> > > > just a short update about the results of my tests.
>> > > >
>> > > > I currently see 2 ways of running openstack+rabbitmq:
>> > > >
>> > > > 1. without durable queues and without replication - just one
>> > > >    rabbitmq process which gets (somehow) restarted if it fails.
>> > > > 2. durable queues and replication
>> > > >
>> > > > Any other combination of these settings leads, more or less, to
>> > > > issues with
>> > > >
>> > > > * broken / non-working bindings
>> > > > * broken queues
>> > > >
>> > > > I think vexxhost is running (1) with their openstack-operator - for
>> > > > reasons.
>> > > >
>> > > > I added [kolla], because kolla-ansible is installing rabbitmq with
>> > > > replication but without durable queues.
>> > > >
>> > > > Could someone point me to the best way to document these findings in
>> > > > some official doc?
>> > > > I think a lot of installations out there will run into issues if -
>> > > > under load - a node fails.
>> > > >
>> > > >  Fabian
>> > > >
>> > > >
>> > > > On Thu, 13 Aug 2020 at 15:13, Fabian Zimmermann <
>> > > > dev.faz at gmail.com> wrote:
>> > > >>
>> > > >> Hi,
>> > > >>
>> > > >> I just ran some short tests today in our test environment (without
>> > > >> durable queues and without replication):
>> > > >>
>> > > >> * started a rally task to generate some load
>> > > >> * kill -9'ed rabbitmq on one node
>> > > >> * the rally task immediately stopped and the cloud (mostly) stopped
>> > > >>   working
>> > > >>
>> > > >> After some debugging I found (again) exchanges which had bindings to
>> > > >> queues, but these bindings didn't forward any messages.
>> > > >> I wrote a small script to detect these broken bindings and will now
>> > > >> check whether this is "reproducible".
>> > > >>
>> > > >> Then I will try "durable queues" and "durable queues with replication"
>> > > >> to see if this helps, even though I would expect that rabbitmq should
>> > > >> be able to handle this without these "hidden broken bindings".
>> > > >> This just FYI.
>> > > >>
>> > > >>  Fabian
>> > >
>>
>