Re: [nova][neutron][oslo][ops][kolla] rabbit bindings issue

15 Aug 2020

Hi,

I already looked into oslo.messaging, but RabbitMQ is the only stable driver :(

Kafka is marked as experimental and (if the docs are correct) is only
usable for notifications.

I would love to switch to an alternative.

 Fabian

Satish Patel <satish.txt@gmail.com> wrote on Sun, 16 Aug 2020, 02:13:
...
Hi Sean,
Sounds good, but running RabbitMQ for each service is going to add a
little overhead too. How do you scale the cluster? (Yes, we can use
cells v2, but it's not something everyone likes to do because of the
complexity.) If we think RabbitMQ is a growing pain, then why is the
community not looking for alternative options (Kafka etc.)?
On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney <smooney@redhat.com> wrote:
...
On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote:
...
Hi,
I read somewhere that vexxhost's Kubernetes openstack-operator is
running one RabbitMQ container per service. Only the Kubernetes
self-healing is used as "HA" for RabbitMQ.
That seems to match my finding: run RabbitMQ standalone and use an
external system to restart RabbitMQ if required.
That's the design that was originally planned for kolla-kubernetes.
Each service was to be deployed with its own RabbitMQ server if it
required one, and if it crashed it would just be recreated by k8s. It
performs better than a cluster, and if you trust k8s or the external
service enough to ensure it is recreated, it should be as effective a
solution. You don't even need k8s to do that, but it seems to be a good
fit if you're prepared to occasionally lose in-flight RPCs.
If you're not, then you can configure RabbitMQ to persist all messages
to disk and mount that on a shared file system like NFS or CephFS, so
that when the RabbitMQ instance is recreated the queue contents are
preserved. Assuming you can take the performance hit of writing all
messages to disk, that is.
...
Fabian
Satish Patel <satish.txt@gmail.com> wrote on Fri, 14 Aug 2020, 16:59:
...
...
Fabian,
What do you mean?
...
...
I think vexxhost is running (1) with their openstack-operator - for
reasons.
On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann <dev.faz@gmail.com> wrote:
...
Hello again,
Just a short update on the results of my tests.
I currently see 2 ways of running OpenStack + RabbitMQ:
1. without durable queues and without replication - just one
   RabbitMQ process which gets (somehow) restarted if it fails.
2. with durable queues and replication
Any other combination of these settings more or less leads to issues
with
* broken / non-working bindings
* broken queues
I think vexxhost is running (1) with their openstack-operator - for
reasons.
...
I added [kolla] because kolla-ansible installs RabbitMQ with
replication but without durable queues.
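
To make the two combinations above concrete, here is a minimal Python
sketch that classifies a deployment. It is only an illustration under
assumptions: it assumes durability is controlled by the oslo.messaging
option amqp_durable_queues in the [oslo_messaging_rabbit] section, and
that replication means a classic mirrored-queue policy whose definition
contains an "ha-mode" key. The function name and the returned strings
are hypothetical.

```python
import configparser


def classify_rabbit_setup(oslo_conf_text, policy_definitions):
    """Map a deployment onto the combinations described in the message.

    oslo_conf_text: contents of a service config file (e.g. nova.conf).
    policy_definitions: list of RabbitMQ policy definition dicts.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(oslo_conf_text)
    # Assumption: a missing section or option means non-durable queues.
    durable = parser.getboolean(
        "oslo_messaging_rabbit", "amqp_durable_queues", fallback=False
    )
    # Assumption: any policy with an "ha-mode" key means queues are mirrored.
    replicated = any("ha-mode" in definition for definition in policy_definitions)

    if not durable and not replicated:
        return "(1) non-durable queues, no replication"
    if durable and replicated:
        return "(2) durable queues with replication"
    return "mixed: durable=%s, replicated=%s" % (durable, replicated)
```

With no durable-queue setting and an {"ha-mode": "all"} policy, which is
the kolla-ansible combination mentioned above, the sketch reports a
mixed setup.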
...
Could someone point me to the best way to document these findings in
some official doc?
...
I think a lot of installations out there will run into issues if a
node fails under load.
...
Fabian
On Thu, 13 Aug 2020 at 15:13, Fabian Zimmermann <dev.faz@gmail.com> wrote:
...
...
Hi,
I just ran some short tests today in our test environment (without
durable queues and without replication):
...
...
* started a Rally task to generate some load
* kill -9'ed RabbitMQ on one node
* the Rally task stopped immediately and the cloud (mostly) stopped
  working
...
After some debugging I found (again) exchanges which had bindings to
queues, but these bindings didn't forward any messages.
...
...
I wrote a small script to detect these broken bindings and will now
check whether this is reproducible.
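
For readers who want a starting point, here is a minimal Python sketch
of one possible check. It is not the script mentioned above. It assumes
the RabbitMQ management plugin is enabled and works on the JSON returned
by its /api/bindings and /api/queues endpoints. It only flags bindings
whose destination queue no longer exists; bindings that point to an
existing queue but silently drop messages would need an active probe
(publish a test message and check that it arrives). The function names,
URL and credentials are illustrative assumptions.

```python
import base64
import json
import urllib.request


def fetch_json(path, base_url="http://localhost:15672",
               user="guest", password="guest"):
    """GET a management API path (e.g. "/api/bindings") and decode the JSON."""
    request = urllib.request.Request(base_url + path)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def find_dangling_bindings(bindings, queues):
    """Return bindings whose destination queue is not in the queue list."""
    existing = {(queue["vhost"], queue["name"]) for queue in queues}
    return [
        binding for binding in bindings
        if binding.get("destination_type") == "queue"
        and binding.get("source")  # skip implicit default-exchange bindings
        and (binding["vhost"], binding["destination"]) not in existing
    ]
```

A possible usage against a live broker would be
find_dangling_bindings(fetch_json("/api/bindings"), fetch_json("/api/queues")).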
...
...
Then I will try "durable queues" and "durable queues with replication"
to see if this helps, even though I would expect RabbitMQ to be able to
handle this without these "hidden broken bindings".
...
...
This is just FYI.
Fabian