[Openstack] [Openstack-operators] Nova Controller HA issues

Igor Laskovy igor.laskovy at gmail.com
Fri Jun 15 13:39:19 UTC 2012


So, making the cloud controller node highly available using Corosync/Pacemaker
does not really avoid the potential issue you mentioned below
(recovering from half-completed tasks), correct?
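
To make the failure mode concrete, here is a minimal sketch of "consuming the
message from Rabbit but not completing the task". It is a toy in-memory
simulation, not nova code and not a real RabbitMQ client: ToyQueue, consume,
run_task and the "spawn-vm-1" message are all hypothetical names made up for
illustration. The only broker behaviour it assumes is that a message which was
delivered but never acknowledged goes back on the queue when the consumer's
connection dies.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a broker queue with explicit acknowledgements."""

    def __init__(self, messages):
        self.ready = deque(messages)  # messages waiting for delivery
        self.unacked = {}             # delivery tag -> message held by a consumer
        self._next_tag = 0

    def get(self):
        """Deliver one message; the queue keeps tracking it until it is acked."""
        message = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """The consumer confirms the message, so the queue forgets it."""
        del self.unacked[tag]

    def consumer_died(self):
        """Connection lost: requeue every message that was never acked."""
        for tag in sorted(self.unacked):
            self.ready.append(self.unacked[tag])
        self.unacked.clear()


class WorkerCrashed(Exception):
    """Simulates the service dying in the middle of a task."""


def run_task(message, done, crash):
    if crash:
        raise WorkerCrashed(message)
    done.add(message)  # using a set makes re-running a finished task harmless


def consume(queue, done, ack_early, crash=False):
    """Handle one message, acking either before or after the work is done."""
    tag, message = queue.get()
    if ack_early:
        queue.ack(tag)
    try:
        run_task(message, done, crash)
    except WorkerCrashed:
        queue.consumer_died()
        return
    if not ack_early:
        queue.ack(tag)


done_early, done_late = set(), set()

# Ack on receipt: the crash loses the task, nothing is left to retry.
early = ToyQueue(["spawn-vm-1"])
consume(early, done_early, ack_early=True, crash=True)

# Ack after completion: the crash requeues the task and a retry finishes it.
late = ToyQueue(["spawn-vm-1"])
consume(late, done_late, ack_early=False, crash=True)
consume(late, done_late, ack_early=False)
```

Acking late only gives "at least once" delivery, so the task itself still has
to be safe to run again. Whatever the first attempt changed before the crash is
still there when the retry starts, which I guess is exactly the "recover from
half completed tasks" part, and no failover of the controller node fixes that
by itself.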

Igor Laskovy
facebook.com/igor.laskovy
Kiev, Ukraine
On Jun 15, 2012 1:36 PM, "John Garbutt" <John.Garbutt at citrix.com> wrote:

> I know there is some work in the XenAPI driver to make it resilient to
> these kinds of failures (to allow frequent updates of the nova code), and I
> think there were plans for the work to be reused in the Libvirt driver.
>
> AFAIK, in Essex and lower, bad things can happen if you don’t wait for all
> the tasks to finish. You may well be OK some of the time.
>
> It boils down to an issue of consuming the message from Rabbit but not
> completing the task, and not being able to recover from half completed
> tasks.
>
> Hope that helps,
>
> John
>
> From: Igor Laskovy [mailto:igor.laskovy at gmail.com]
> Sent: 15 June 2012 11:31
> To: Christian Parpart
> Cc: John Garbutt; openstack-operators at lists.openstack.org;
> <openstack at lists.launchpad.net>
> Subject: Re: [Openstack-operators] Nova Controller HA issues
>
> I have only been using OpenStack in my little lab for a short time too :)
>
> OK, you are right of course, but I meant a different design when I talked
> about virtualizing the controller nodes.
>
> It can be just two dedicated hypervisors with dedicated shared storage/DRBD
> between them. These hypervisors will be standalone and not part of nova.
> Then maybe Pacemaker or another tool can provide availability by restarting
> the VM on the surviving node when the active one dies.
>
> The main question here is how bad it can be if the controller nodes power
> off unexpectedly. In other words, when the VM restarts it will be in a
> crash-consistent state.
> Will some nova services lose anything here?
> Will RabbitMQ lose some data here? (I am new to RabbitMQ too)
>
> Igor Laskovy
> facebook.com/igor.laskovy
> Kiev, Ukraine
>
> On Jun 15, 2012 10:54 AM, "Christian Parpart" <trapni at gmail.com> wrote:
>
> Hey,
>
> well, I said "I might be wrong" because I have no clear picture of how
> OpenStack works in its deepest detail. However, I would not like to depend
> on a controller node that is inside a virtual machine, controlled by
> compute nodes, that are controlled by the controller node. This sounds
> quite like a chicken-and-egg problem.
>
> However, at the time of this writing, I think you'll have to have a
> working nova-scheduler process, which is responsible for deciding on which
> compute node to spawn your VM (what else?). Think about what you do when
> this VM (or all your controller VMs) dies terribly and you want to rebuild
> it: how do you plan to do this when your controller node is out of service?
>
> In my case, I have put the controller services onto two compute nodes and
> use Pacemaker to switch between them; in case one node goes down, the
> other can take over (via a shared service IP).
>
> Again, these are my thoughts, and I have been using OpenStack for just
> about a month now :-)
>
> But I hope this helps a bit...
>
> Best regards,
>
> Christian Parpart.
>
> On Fri, Jun 15, 2012 at 8:16 AM, Igor Laskovy <igor.laskovy at gmail.com>
> wrote:
>
> Why? Can you please clarify?
>
> Igor Laskovy
> facebook.com/igor.laskovy
> Kiev, Ukraine
>
> On Jun 15, 2012 1:55 AM, "Christian Parpart" <trapni at gmail.com> wrote:
>
> I don't think putting the controller node completely into a VM is good
> advice, at least when speaking of nova-scheduler and nova-api (if central).
>
> I may be wrong, and if so, please correct me.
>
> Christian.
>
> On Thu, Jun 14, 2012 at 7:20 PM, Igor Laskovy <igor.laskovy at gmail.com>
> wrote:
>
> Hi, are there any updates on this?
> Can anybody clarify what happens if the controller nodes just go through a
> hard shutdown?
>
> I am thinking about a solution with two hypervisors, putting the controller
> node in a VM on shared storage, which can be relaunched when the active
> hypervisor dies.
> Any ideas or advice?
>
>
>
> On Tue, Jun 12, 2012 at 3:52 PM, John Garbutt <John.Garbutt at citrix.com>
> wrote:
> > Sure, I get your point.
> >
> > I think Florian is working on some docs to help on that.
> >
> > Not sure how much has been done already.
> >
> >
> >
> > Cheers,
> >
> > John
> >
> >
> >
> > From: Christian Parpart [mailto:trapni at gmail.com]
> > Sent: 12 June 2012 13:47
> > To: John Garbutt
> > Cc: openstack-operators at lists.openstack.org
> > Subject: Re: [Openstack-operators] Nova Controller HA issues
> >
> >
> >
> > Hey, ya I also found this page, but didn't find it that helpful yet. It
> > sounds more like a theoretical paper on how they implemented it rather
> > than telling me how to actually make it happen (from the sysop point of
> > view :-)
> >
> > I hoped that someone had faced this already, since I really find it very
> > unintuitive to set up; otherwise I need to wait until I get more time to
> > investigate it properly. :-)
> >
> >
> >
> > Regards,
> >
> > Christian.
> >
> > On Tue, Jun 12, 2012 at 12:52 PM, John Garbutt <John.Garbutt at citrix.com>
> > wrote:
> >
> > I thought Rabbit had a built in HA solution these days:
> >
> > http://www.rabbitmq.com/ha.html
> >
> >
> >
> > From: openstack-operators-bounces at lists.openstack.org
> > [mailto:openstack-operators-bounces at lists.openstack.org] On Behalf Of
> > Christian Parpart
> > Sent: 12 June 2012 09:59
> > To: openstack-operators at lists.openstack.org
> > Subject: [Openstack-operators] Nova Controller HA issues
> >
> >
> >
> > Hi all,
> >
> >
> >
> > after spending the whole evening making our cloud controller node highly
> > available using Corosync/Pacemaker, which I am really proud of, I have
> > just a few problems left, and the one that freaks me out the most is
> > rabbitmq-server.
> >
> > For that beast I just cannot seem to find any good documentation on how
> > to set rabbitmq-server up properly for HA.
> >
> > Has anyone ever tried to set a nova controller (including the rabbitmq
> > dependency) up for HA?
> >
> > If so, I'd be pleased to share experiences, especially on the latter
> > part. :-)
> >
> >
> >
> > Best regards,
> >
> > Christian Parpart
> >
> >
> >
> >
>
> > _______________________________________________
> > Openstack-operators mailing list
> > Openstack-operators at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >
>
>
>
> --
> Igor Laskovy
> Kiev, Ukraine