[nova][neutron][oslo][ops][kolla] rabbit bindings issue

Fabian Zimmermann dev.faz at gmail.com
Fri Aug 21 07:06:24 UTC 2020


Hi,

I don't understand what you mean by "alternate exchange". I'm doing
all my tests in my dev environment, which is a completely separate,
dedicated (virtual) cluster.

I just enabled the feature and wrote a small script to read the
metrics from the api.
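
For illustration, here is a minimal sketch of what such a script could
look like. It is not the script used here. It assumes the metrics come
from the RabbitMQ management plugin's HTTP API at /api/overview and that
the counters appear as message_stats.drop_unroutable and
message_stats.return_unroutable; the base URL and credentials are
placeholders.

```python
import base64
import json
import urllib.request


def fetch_overview(base_url, user, password):
    """GET <base_url>/api/overview from the management plugin (assumed endpoint)."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/overview")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def unroutable_counts(overview):
    """Extract the unroutable counters, defaulting to 0 when they are absent.

    The field names are assumptions about the management API payload.
    """
    stats = overview.get("message_stats") or {}
    return {
        "dropped": stats.get("drop_unroutable", 0),
        "returned": stats.get("return_unroutable", 0),
    }
```

Usage would be something like
unroutable_counts(fetch_overview("http://localhost:15672", "guest", "guest")),
with host and credentials adjusted to the cluster in question.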

I'm seeing some dropped messages in my cluster and am just trying to
figure out whether they are "normal".
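
One way to narrow this down could be to compare, per exchange, how many
messages came in against how many were routed out. The sketch below is
only a heuristic under assumptions: it expects the parsed JSON list from
the management API's /api/exchanges endpoint and the fields
message_stats.publish_in and message_stats.publish_out. An exchange that
legitimately has no bindings will also show up, so the result is a list
of suspects, not proof of a broken binding.

```python
def exchanges_losing_messages(exchanges):
    """Return (vhost, name, publish_in, publish_out) for every exchange
    that received more messages than it routed to queues."""
    suspects = []
    for exchange in exchanges:
        # Exchanges without traffic may have no message_stats at all.
        stats = exchange.get("message_stats") or {}
        received = stats.get("publish_in", 0)
        routed = stats.get("publish_out", 0)
        if received > routed:
            suspects.append(
                (exchange.get("vhost"), exchange.get("name"), received, routed)
            )
    return suspects
```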

 Fabian

On Thu, 20 Aug 2020 at 21:28, Arnaud MORIN
<arnaud.morin at gmail.com> wrote:
>
> Hello,
> Are you doing that using an alternate exchange?
> I started configuring it in our env but have not finished yet.
>
> Cheers,
>
> On Thu, 20 Aug 2020 at 19:16, Fabian Zimmermann <dev.faz at gmail.com> wrote:
>>
>> Hi,
>>
>> just another idea:
>>
>> RabbitMQ is able to count undelivered messages. We could use this information to detect the broken bindings (which cause undeliverable messages).
>>
>> Is anyone already doing this?
>>
>> I currently don't have a way to reproduce the broken bindings, so I'm unable to prove the idea.
>>
>> It seems we have to wait for the issue to happen again, which hopefully never happens :)
>>
>>  Fabian
>>
>> On Tue, 18 Aug 2020 at 14:07, Arnaud Morin <arnaud.morin at gmail.com> wrote:
>>>
>>> Hey all,
>>>
>>> Regarding the vexxhost strategy of using only one rabbit server and
>>> managing HA through rabbit: do you plan to do the same for
>>> MariaDB/MySQL?
>>>
>>> --
>>> Arnaud Morin
>>>
>>> On 14.08.20 - 18:45, Fabian Zimmermann wrote:
>>> > Hi,
>>> >
>>> > I read somewhere that vexxhost's Kubernetes openstack-operator runs
>>> > one RabbitMQ container per service. Only Kubernetes self-healing is
>>> > used as "HA" for RabbitMQ.
>>> >
>>> > That seems to match my finding: run RabbitMQ standalone and use an
>>> > external system to restart it if required.
>>> >
>>> >  Fabian
>>> >
>>> > On Fri, 14 Aug 2020 at 16:59, Satish Patel <satish.txt at gmail.com> wrote:
>>> >
>>> > > Fabian,
>>> > >
>>> > > What do you mean?
>>> > >
>>> > > >> I think vexxhost is running (1) with their openstack-operator - for reasons.
>>> > >
>>> > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann <dev.faz at gmail.com> wrote:
>>> > > >
>>> > > > Hello again,
>>> > > >
>>> > > > just a short update about the results of my tests.
>>> > > >
>>> > > > I currently see two ways of running OpenStack with RabbitMQ:
>>> > > >
>>> > > > 1. without durable queues and without replication: just one RabbitMQ process, which gets (somehow) restarted if it fails.
>>> > > > 2. with durable queues and replication
>>> > > >
>>> > > > Any other combination of these settings leads, to a greater or lesser degree, to issues with:
>>> > > >
>>> > > > * broken / non working bindings
>>> > > > * broken queues
>>> > > >
>>> > > > I think vexxhost is running (1) with their openstack-operator - for reasons.
>>> > > >
>>> > > > I added [kolla] because kolla-ansible installs RabbitMQ with replication but without durable queues.
>>> > > >
>>> > > > Could someone point me to the best way to document these findings in some official doc?
>>> > > > I think a lot of installations out there will run into issues if a node fails under load.
>>> > > >
>>> > > >  Fabian
>>> > > >
>>> > > >
>>> > > > On Thu, 13 Aug 2020 at 15:13, Fabian Zimmermann <dev.faz at gmail.com> wrote:
>>> > > >>
>>> > > >> Hi,
>>> > > >>
>>> > > >> I just did some short tests today in our test environment (without durable queues and without replication):
>>> > > >>
>>> > > >> * started a rally task to generate some load
>>> > > >> * killed RabbitMQ with kill -9 on one node
>>> > > >> * the rally task stopped immediately and the cloud (mostly) stopped working
>>> > > >>
>>> > > >> After some debugging I found (again) exchanges which had bindings to queues, but these bindings didn't forward any messages.
>>> > > >> I wrote a small script to detect these broken bindings and will now check whether this is "reproducible".
>>> > > >>
>>> > > >> Then I will try "durable queues" and "durable queues with replication" to see if this helps, even though I would expect
>>> > > >> RabbitMQ to be able to handle this without these "hidden broken bindings".
>>> > > >>
>>> > > >> This is just FYI.
>>> > > >>
>>> > > >>  Fabian
>>> > >


