[nova][neutron][oslo][ops][kolla] rabbit bindings issue

Mohammed Naser mnaser at vexxhost.com
Mon Aug 24 18:54:40 UTC 2020


On Tue, Aug 18, 2020 at 8:11 AM Arnaud Morin <arnaud.morin at gmail.com> wrote:
>
> Hey all,
>
> About the vexxhost strategy to use only one rabbit server and manage HA through
> rabbit.
> Do you plan to do the same for MariaDB/MySQL?

We use a MySQL operator to deploy a good ol' master/slave replication
cluster and point every service at the master, for a few
reasons:

1) We always pointed to a master Galera system anyway; multi-master
was overcomplicated for no real advantage
2) The failover time vs the complexity of Galera (and how often we
failover) favours #1
3) We use "orchestrator" by GitHub which manages all the promotions/etc for us

> --
> Arnaud Morin
>
> On 14.08.20 - 18:45, Fabian Zimmermann wrote:
> > Hi,
> >
> > I read somewhere that vexxhost's Kubernetes openstack-operator is running
> > one rabbitmq container per service. Only the Kubernetes self-healing is
> > used as "HA" for rabbitmq.
> >
> > That seems to match with my finding: run rabbitmq standalone and use an
> > external system to restart rabbitmq if required.
> >
> >  Fabian
> >
> > Satish Patel <satish.txt at gmail.com> wrote on Fri, Aug 14, 2020, 16:59:
> >
> > > Fabian,
> > >
> > > what do you mean?
> > >
> > > >> I think vexxhost is running (1) with their openstack-operator - for
> > > >> reasons.
> > >
> > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann <dev.faz at gmail.com>
> > > wrote:
> > > >
> > > > Hello again,
> > > >
> > > > just a short update about the results of my tests.
> > > >
> > > > I currently see 2 ways of running openstack+rabbitmq
> > > >
> > > > 1. without durable-queues and without replication - just one
> > > >    rabbitmq-process which gets (somehow) restarted if it fails.
> > > > 2. durable-queues and replication
> > > >
> > > > Any other combination of these settings leads to more or less issues with
> > > >
> > > > * broken / non working bindings
> > > > * broken queues
> > > >
> > > > I think vexxhost is running (1) with their openstack-operator - for
> > > > reasons.
> > > >
> > > > I added [kolla], because kolla-ansible is installing rabbitmq with
> > > > replication but without durable-queues.
> > > >
> > > > Could someone point me to the best way to document these findings in
> > > > some official doc?
> > > > I think a lot of installations out there will run into issues if - under
> > > > load - a node fails.
> > > >
> > > >  Fabian
> > > >
> > > >
> > > > On Thu, Aug 13, 2020 at 15:13, Fabian Zimmermann <dev.faz at gmail.com> wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> just did some short tests today in our test-environment (without
> > > >> durable queues and without replication):
> > > >>
> > > >> * started a rally task to generate some load
> > > >> * killed rabbitmq on one node with kill -9
> > > >> * rally task immediately stopped and the cloud (mostly) stopped working
> > > >>
> > > >> after some debugging I found (again) exchanges which had bindings to
> > > >> queues, but these bindings didn't forward any messages.
> > > >> I wrote a small script to detect these broken bindings and will now check
> > > >> if this is "reproducible"
> > > >>
> > > >> then I will try "durable queues" and "durable queues with replication"
> > > >> to see if this helps, even though I would expect that
> > > >> rabbitmq should be able to handle this without these "hidden broken
> > > >> bindings"
> > > >>
> > > >> This just FYI.
> > > >>
> > > >>  Fabian
> > >
>
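
For reference, the two combinations Fabian reports as working in the
quoted message can be written down as a small check. This is only a
sketch under assumptions: the function name rabbitmq_setup_risk and its
two parameters are hypothetical and are not part of oslo.messaging,
kolla-ansible, or RabbitMQ. The classification simply restates the
findings quoted above and has not been verified independently.

```python
def rabbitmq_setup_risk(durable_queues: bool, replicated: bool) -> str:
    """Classify a RabbitMQ setup using the findings quoted in this thread.

    Hypothetical helper: both parameters are illustrative flags, not real
    configuration option names.
    """
    if not durable_queues and not replicated:
        # Option 1 in the thread: one broker, restarted externally on failure.
        return "ok: standalone broker with external restart"
    if durable_queues and replicated:
        # Option 2 in the thread: durable queues together with replication.
        return "ok: durable queues with replication"
    # Any other combination was reported to lead to broken bindings or queues.
    return "risky: mixed settings, broken bindings or queues reported"


if __name__ == "__main__":
    # Print the classification for all four combinations.
    for durable in (False, True):
        for replicated in (False, True):
            print(durable, replicated, rabbitmq_setup_risk(durable, replicated))
```

Under this reading, the kolla-ansible default described above
(replication without durable queues) falls into the "risky" case.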


-- 
Mohammed Naser
VEXXHOST, Inc.


