[Openstack-operators] Openstack HA active/passive vs. active/active

John Dewey john at dewey.ws
Wed Nov 27 18:31:46 UTC 2013



On Wednesday, November 27, 2013 at 10:17 AM, Jay Pipes wrote:

> On 11/27/2013 01:02 PM, Alvise Dorigo wrote:
> > Hi Jay, thanks a lot for your rich answer. More comments and questions inline...
> >  
> > On 26 Nov 2013, at 16:51, Jay Pipes <jaypipes at gmail.com (mailto:jaypipes at gmail.com)> wrote:
> >  
> > > On 11/26/2013 07:26 AM, Alvise Dorigo wrote:
> > > > Hello,
> > > > I've read the documentation about Openstack HA
> > > > (http://docs.openstack.org/high-availability-guide/content/index.html)
> > > > and I successfully implemented the active/passive model (with
> > > > corosync/pacemaker) for the two services Keystone and Glance (MySQL HA
> > > > is based on Percona-XtraDB multi-master).
> > > >  
> > > > I'd like to know from the experts, which one is the best (and possibly
> > > > why) model for HA, between active/passive and active/active, basing on
> > > > their usage experience (that is for sure longer than mine).
> > > >  
> > >  
> > >  
> > > There is no reason to run any OpenStack endpoint -- other than the Neutron L3 agent -- in an active/passive way. The reason is that none of the OpenStack endpoints maintains any state. The backend storage systems used by those endpoints *do* contain state -- but the endpoint services themselves do not.
> >  
> > So, in principle I could simply install a cloud controller (with Keystone, Glance, Nova API, Cinder) and just clone it on another machine. Then I could put an HAProxy (made redundant with Keepalived) on top of them. (A different story would be for Neutron L3 agent for which an active/passive mode is preferable, as you pointed out).
> > Does this make sense?
> >  
>  
>  
> Precisely correct.
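
A minimal sketch of that layout, with placeholder addresses (the VIP 192.168.0.10, the controller IPs .11/.12, and interface eth0 are all assumptions): HAProxy balances the cloned stateless endpoints, and Keepalived floats the VIP between the two load-balancer nodes.

    # /etc/haproxy/haproxy.cfg (fragment) -- Keystone shown; repeat per endpoint
    listen keystone_api
        bind 192.168.0.10:5000
        balance roundrobin
        option httpchk GET /
        server controller1 192.168.0.11:5000 check
        server controller2 192.168.0.12:5000 check

    # /etc/keepalived/keepalived.conf (fragment) -- floats the VIP on failure
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 100
        virtual_ipaddress {
            192.168.0.10
        }
    }

The passive load balancer runs the same config with `state BACKUP` and a lower priority.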

Also, the Neutron L3 agent doesn’t necessarily have to run active/passive.  You can run multiple L3 agents, and let neutron place the routers where it pleases.  As Jay stated earlier, you can then run a cron job or service which migrates routers off a failed agent onto the remaining active hosts.

https://github.com/stackforge/cookbook-openstack-network/blob/master/files/default/quantum-ha-tool.py
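
For reference, wiring that tool into cron looks roughly like this (a sketch: the install path and the --l3-agent-migrate flag are assumptions based on the tool's option names at the time, so check its --help first):

    # /etc/cron.d/neutron-l3-ha (fragment)
    # Every 5 minutes, move routers off any L3 agents that have died.
    */5 * * * * root /usr/local/bin/quantum-ha-tool.py --l3-agent-migrate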
>  
> >  
> > > Simply front each OpenStack endpoint with a DNS name that resolves to a virtual IP managed by a load balancer, ensure that sessions are managed by the load balancer, and you're good.
> > >  
> > > For the Neutron L3 agent, you will need a separate strategy, because unfortunately, the L3 agent is stateful. We use a number of Python scripts to handle failover of routes when an agent fails. You can see these tools here, which we simply add as a cron job:
> > >  
> > > https://github.com/stackforge/cookbook-openstack-network/blob/master/files/default/quantum-ha-tool.py
> > >  
> > > My advice would be to continue using Percona XtraDB for your database backend (we use the same in a variety of ways, from intra-deployment-zone clusters to WAN-replicated clusters). That solves your database availability issues, and nicely, we've found PXC to be as easy to administer and keep in sync as normal MySQL replication, if not easier.
> >  
> > Definitely. It has proven as robust as we expected. In addition, the combination of Percona+HAProxy makes it possible to expand (or substitute) nodes without any outage period; for example, if we need to increase cluster performance (more CPU, more RAM, more disk)… not to mention the round-robin balancing, which comes for free.
>  
> Yep :)
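
As an illustration of that combination, the HAProxy stanza for a three-node Percona cluster is along these lines (IPs are placeholders, and the haproxy_check MySQL user is an assumption -- it must exist on the cluster for the health check to work):

    # /etc/haproxy/haproxy.cfg (fragment) -- round-robin across PXC nodes
    listen galera
        bind 192.168.0.10:3306
        mode tcp
        balance roundrobin
        option mysql-check user haproxy_check
        server db1 192.168.0.21:3306 check
        server db2 192.168.0.22:3306 check
        server db3 192.168.0.23:3306 check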
>  
> > > For your message queue, you need to determine a) what level of data loss you are comfortable with, and b) whether to use certain OpenStack projects' ability to retry multiple MQ hosts in the event of a failure (currently Nova, Neutron and Cinder support this but Glance does not, IIRC).
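
For the projects that do support retrying multiple MQ hosts, the Havana-era RabbitMQ options look roughly like this (a sketch; option availability varies by project and release, and the hostnames are placeholders):

    # nova.conf / neutron.conf / cinder.conf fragment
    rabbit_hosts = mq1.example.com:5672,mq2.example.com:5672
    rabbit_retry_interval = 1
    rabbit_retry_backoff = 2
    rabbit_max_retries = 0     # 0 means retry forever
    rabbit_ha_queues = True    # mirror queues across the RabbitMQ cluster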
> >  
> > What about having an instance of QPid per node? As far as I know, qpid is also stateless, isn’t it? In my current active/passive cluster I have qpid running on both nodes, and when I migrate keystone/glance from one node to the other I do not notice anything strange. Do you see any drawback with this?
>  
> Unfortunately, I have no experience (yet!) with qpid. :(
>  
> Best,
> -jay
>  
> > Thanks,
> >  
> > Alvise
> >  
> > > We use RabbitMQ clustering and have had numerous problems with it, frankly. It's been our pain point from an HA perspective. There are other clustering MQ technologies out there, of course. Frankly, one could write a whole book just about how crappy the MQ clustering "story" is...
> > >  
> > > All the best,
> > > -jay
> > >  
> > >  
> > > _______________________________________________
> > > OpenStack-operators mailing list
> > > OpenStack-operators at lists.openstack.org (mailto:OpenStack-operators at lists.openstack.org)
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> > >  
> >  
> >  
>  
>  
>  
>  
>  

