[nova][neutron][oslo][ops][kolla] rabbit bindings issue

Arnaud Morin arnaud.morin at gmail.com
Mon Aug 17 14:17:37 UTC 2020


Hey Fabian,

I was thinking the same, and I found the "default" values from
openstack-ansible:
https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/fc27e735a68b64cb3c67dd8abeaf324803a9845b/defaults/main.yml#L172

pattern: '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'

This enables HA for everything except:
amq.*
*_fanout_*
reply_*

So that would make sense?
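
For what it's worth, here is a rough way to check which queue names each
pattern would select. It is only a sketch: it assumes Python's re module
treats these lookaheads the same way RabbitMQ's policy matching does, and
the queue names in SAMPLES are made-up examples, not taken from a real
deployment. FABIAN_PATTERN is the regex from Fabian's mail quoted below.

```python
import re

# Pattern from the openstack-ansible defaults linked above.
OSA_PATTERN = re.compile(r'^(?!(amq\.)|(.*_fanout_)|(reply_)).*')
# Pattern proposed by Fabian in the quoted message below.
FABIAN_PATTERN = re.compile(r'^(?!amq\.)(?!reply_)(?!.*fanout).*')

# Hypothetical queue names, for illustration only.
SAMPLES = [
    "amq.direct",
    "reply_0123abcd",
    "scheduler_fanout_0123abcd",
    "conductor",
    "notifications.info",
]


def is_mirrored(pattern, queue_name):
    """Return True if the policy pattern would apply to queue_name."""
    return pattern.search(queue_name) is not None


def classify(pattern, names):
    """Map each name to whether the pattern selects it."""
    return {name: is_mirrored(pattern, name) for name in names}


if __name__ == "__main__":
    for label, pattern in (("osa", OSA_PATTERN), ("fabian", FABIAN_PATTERN)):
        for name, mirrored in classify(pattern, SAMPLES).items():
            print(f"{label:7} {name:28} mirrored={mirrored}")
```

On these sample names both patterns select the same queues. They only
differ for names that contain "fanout" without the surrounding
underscores, which Fabian's pattern excludes and the openstack-ansible
one does not.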

-- 
Arnaud Morin

On 17.08.20 - 16:03, Fabian Zimmermann wrote:
> Just to keep the list updated.
> 
> If you run with durable_queues and replication, there is still a
> possibility that a short-lived queue will *not* yet be replicated
> when a node fails, and the failure will mark that queue as "unreachable". This
> wouldn't be a problem if OpenStack created a new queue, but I
> fear it would just try to reuse the existing one after reconnecting.
> 
> So, after all, it seems the least buggy way would be to
> 
> * use durable queues and replication for long-running queues/exchanges
> * use non-durable queues without replication for short-lived (fanout, reply_) queues
> 
> This should allow the short-lived ones to destroy themselves on node
> failure, and the long-lived ones should be as available as
> possible.
> 
> Absolutely untested - so use with caution, but here is a possible
> policy-regex: ^(?!amq\.)(?!reply_)(?!.*fanout).*
> 
>  Fabian
> 
> 
> On Sun, Aug 16, 2020 at 15:37, Sean Mooney <smooney at redhat.com> wrote:
> >
> > On Sat, 2020-08-15 at 20:13 -0400, Satish Patel wrote:
> > > Hi Sean,
> > >
> > > Sounds good, but running RabbitMQ for each service is going to add a little
> > > overhead too. How do you scale the cluster? (Yes, we can use cells v2, but it's
> > > not something everyone likes to do because of the complexity.)
> >
> > My understanding is that, with RabbitMQ, adding multiple servers to a cluster lowers
> > throughput compared with just one RabbitMQ instance for any given exchange. That is because the content of
> > the queues needs to be synchronised across the cluster. So if Cinder, Nova and Neutron share
> > a 3-node cluster, and you compare that to the same services deployed with Cinder, Nova and Neutron
> > each having their own RabbitMQ service, then the independent deployment will tend to outperform the
> > clustered solution. I'm not really sure whether that has changed. I know that how clustering is done has evolved
> > over the years, but in the past clustering was the adversary of scaling.
> >
> > >  If we think
> > > RabbitMQ is a growing pain, then why is the community not looking for
> > > an alternative option (Kafka etc.)?
> > We have looked at alternatives several times.
> > RabbitMQ works well enough and scales well enough for most deployments.
> > There are other AMQP implementations that scale better than RabbitMQ:
> > ActiveMQ and Qpid are both reported to scale better, but they perform worse
> > out of the box and need to be carefully tuned.
> >
> > In the past ZeroMQ was supported, but people did not maintain it.
> >
> > I don't think Kafka is a good alternative, but NATS (https://nats.io/) might be.
> >
> > For what it's worth, all Nova deployments have been cells v2 deployments with one cell since around Pike/Rocky,
> > and it's really not that complex. cells_v1 was much more complex, but part of the redesign
> > for cells_v2 was making sure there is only one code path. Adding a second cell just needs another
> > cell DB and conductor to be deployed, assuming you started with a super conductor in the first
> > place. The issue is that cells is only a Nova feature. No other service has cells, so it does not help
> > you with Cinder or Neutron. As such, Cinder and Neutron will likely be the services that hit scaling limits first.
> > Adopting cells in other services is not necessarily the right approach either, but when we talk about scale
> > we do need to keep in mind that cells is just for Nova today.
> >
> >
> > >
> > > On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney <smooney at redhat.com> wrote:
> > > >
> > > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote:
> > > > > Hi,
> > > > >
> > > > > > I read somewhere that VEXXHOST's Kubernetes openstack-operator is running
> > > > > > one RabbitMQ container per service. Only the Kubernetes self-healing is
> > > > > > used as "HA" for RabbitMQ.
> > > > > >
> > > > > > That seems to match my finding: run RabbitMQ standalone and use an
> > > > > > external system to restart RabbitMQ if required.
> > > >
> > > > > That's the design that was originally planned for kolla-kubernetes.
> > > > >
> > > > > Each service was to be deployed with its own RabbitMQ server if it required one,
> > > > > and if it crashed it would just be recreated by k8s. It performs better than a cluster,
> > > > > and if you trust k8s or the external service enough to ensure it is recreated, it
> > > > > should be as effective a solution. You don't even need k8s to do that, but it seems to be
> > > > > a good fit if you're prepared to occasionally lose in-flight RPCs.
> > > > > If you're not, then you can configure RabbitMQ to persist all messages to disk and mount that on a shared
> > > > > file system like NFS or CephFS, so that when the RabbitMQ instance is recreated the queue contents are
> > > > > preserved. That is, assuming you can take the performance hit of writing all messages to disk.
> > > > >
> > > > >  Fabian
> > > > >
> > > > > > On Fri, Aug 14, 2020, 16:59, Satish Patel <satish.txt at gmail.com> wrote:
> > > > >
> > > > > > Fabian,
> > > > > >
> > > > > > what do you mean?
> > > > > >
> > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for reasons.
> > > > > >
> > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann <dev.faz at gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hello again,
> > > > > > >
> > > > > > > just a short update about the results of my tests.
> > > > > > >
> > > > > > > > I currently see 2 ways of running OpenStack + RabbitMQ:
> > > > > > > >
> > > > > > > > 1. without durable queues and without replication: just one rabbitmq process which gets (somehow) restarted if it fails.
> > > > > > > > 2. durable queues and replication
> > > > > > > >
> > > > > > > > Any other combination of these settings leads, to a greater or lesser degree, to issues with
> > > > > > > >
> > > > > > > > * broken / non-working bindings
> > > > > > > > * broken queues
> > > > > > > >
> > > > > > > > I think vexxhost is running (1) with their openstack-operator - for reasons.
> > > > > > > >
> > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with replication but without durable queues.
> > > > > > > >
> > > > > > > > Could someone point me to the best way to document these findings in some official doc?
> > > > > > > > I think a lot of installations out there will run into issues if - under load - a node fails.
> > > > > > >
> > > > > > >  Fabian
> > > > > > >
> > > > > > >
> > > > > > > > On Thu, Aug 13, 2020 at 15:13, Fabian Zimmermann <dev.faz at gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > > I just did some short tests today in our test environment (without durable queues and without replication):
> > > > > > > > >
> > > > > > > > > * started a rally task to generate some load
> > > > > > > > > * kill -9'ed rabbitmq on one node
> > > > > > > > > * rally task immediately stopped and the cloud (mostly) stopped working
> > > > > > > > >
> > > > > > > > > After some debugging I found (again) exchanges which had bindings to queues, but these bindings didn't forward any messages.
> > > > > > > > > I wrote a small script to detect these broken bindings and will now check if this is "reproducible".
> > > > > > > > >
> > > > > > > > > Then I will try "durable queues" and "durable queues with replication" to see if this helps, even though I would expect
> > > > > > > > > that rabbitmq should be able to handle this without these "hidden broken bindings".
> > > > > > > >
> > > > > > > > This just FYI.
> > > > > > > >
> > > > > > > >  Fabian
> > >
> > >
> >
> 


