[Openstack] [Openstack-operators] Nova Controller HA issues

Igor Laskovy igor.laskovy at gmail.com
Sun Jun 17 09:59:01 UTC 2012


*John, Jason*, can you please clarify concretely what these bad things are?
For example, what is the worst case?

*Yoshi, Kei*, can you please clarify the current status of Kemari? How far
is it from production use?

On Fri, Jun 15, 2012 at 5:48 PM, Jason Hedden <jhedden at mcs.anl.gov> wrote:
> I'm running 2 full nova controllers behind an NGINX load balancer. While
> there is still that chance of half-completed tasks, it's been working very
> well.
>
> Each nova controller is running (full time) nova-scheduler, nova-cert,
> keystone, and 6 nova-api processes. All API requests go through NGINX, which
> reverse-proxies the traffic to these 2 systems.
>
> Example NGINX nova-api config:
> upstream nova-api  {
>  server hostA:8774 fail_timeout=30s;
>  server hostB:8774 fail_timeout=30s;
>  server hostA:18774 fail_timeout=30s;
>  server hostB:18774 fail_timeout=30s;
>  server hostA:28774 fail_timeout=30s;
>  server hostB:28774 fail_timeout=30s;
>  server hostA:38774 fail_timeout=30s;
>  server hostB:38774 fail_timeout=30s;
>  server hostA:48774 fail_timeout=30s;
>  server hostB:48774 fail_timeout=30s;
>  server hostA:58774 fail_timeout=30s;
>  server hostB:58774 fail_timeout=30s;
> }
>
> server {
>  listen x.x.x.x:8774;
>  server_name public.name;
>
>  location / {
>    proxy_pass  http://nova-api;
>    proxy_set_header        Host            "public.address:8774";
>    proxy_set_header        X-Real-IP       $remote_addr;
>    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
>  }
> }
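A note on what the upstream block above does: NGINX spreads requests round-robin across the twelve nova-api backends, and `fail_timeout=30s` (with the default `max_fails=1`) takes a backend out of rotation for 30 seconds after a failed attempt. The Python below is a minimal sketch of that selection logic under a simplified model. The `Server` and `Upstream` classes and their methods are hypothetical illustrations, not NGINX internals, and the sketch ignores weights, the retry of a failed request on the next backend, and what NGINX does when every backend is marked down.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    address: str
    down_until: float = 0.0  # the balancer skips this server until this time


class Upstream:
    """Hypothetical round-robin balancer with passive failure marking."""

    def __init__(self, addresses: List[str], fail_timeout: float = 30.0):
        self.servers = [Server(a) for a in addresses]
        self.fail_timeout = fail_timeout
        self._next = 0  # index of the next server to try

    def pick(self, now: float) -> Optional[Server]:
        # Try each server at most once, starting where the last pick stopped.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if now >= server.down_until:
                return server
        return None  # simplification: every backend is currently marked down

    def report_failure(self, server: Server, now: float) -> None:
        # max_fails=1 (the default): one failure marks the server down for fail_timeout.
        server.down_until = now + self.fail_timeout


# Same backend order as the upstream block: hostA/hostB on each of six ports.
ports = [8774, 18774, 28774, 38774, 48774, 58774]
lb = Upstream([f"{host}:{port}" for port in ports for host in ("hostA", "hostB")])
first = lb.pick(now=0.0)           # hostA:8774
lb.report_failure(first, now=0.0)  # hostA:8774 is skipped until t=30
```

Because each controller runs six nova-api processes, losing one host takes away six of the twelve backends; as each of them fails a request, it drops out of the rotation for 30 seconds and traffic continues over the remaining six.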
>
>
> Attached is a diagram that gives a brief overview of the HA environment
> I've set up.
>
> --Jason Hedden
>
>
> On Jun 15, 2012, at 5:36 AM, John Garbutt wrote:
>
>> I know there is some work in the XenAPI driver to make it resilient to
>> these kinds of failures (to allow frequent updates of the nova code), and I
>> think there were plans for the work to be reused in the Libvirt driver.
>>
>> AFAIK, in Essex and earlier, bad things can happen if you don't wait for
>> all the tasks to finish. You may well be OK some of the time.
>>
>> It boils down to consuming the message from Rabbit without completing
>> the task, and then not being able to recover from half-completed tasks.
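John's point can be illustrated with the acknowledgement trade-off common to AMQP consumers. The Python below is a toy model, assuming a hypothetical `SimulatedBroker` that, like RabbitMQ, puts delivered-but-unacknowledged messages back on the queue when their consumer goes away. It is not nova's RPC code, and `run_task` is likewise a made-up helper. Acknowledging before doing the work loses the task if the controller dies midway; acknowledging afterwards gets the task redelivered.

```python
from collections import deque


class SimulatedBroker:
    """Toy queue with manual acknowledgements (hypothetical, not a real client)."""

    def __init__(self, messages):
        self.ready = deque(messages)  # waiting to be delivered
        self.unacked = {}             # delivery tag -> message handed to a consumer
        self._tag = 0

    def deliver(self):
        self._tag += 1
        self.unacked[self._tag] = self.ready.popleft()
        return self._tag, self.unacked[self._tag]

    def ack(self, tag):
        # Once acknowledged, the broker forgets the message for good.
        del self.unacked[tag]

    def consumer_died(self):
        # Unacknowledged deliveries go back to the front of the queue, in order.
        pending = [self.unacked[t] for t in sorted(self.unacked)]
        self.unacked.clear()
        self.ready.extendleft(reversed(pending))


def run_task(broker, ack_early, crash_midway, done):
    """Consume one task; `crash_midway` simulates the controller losing power."""
    tag, task = broker.deliver()
    if ack_early:
        broker.ack(tag)
    if crash_midway:
        broker.consumer_died()
        return
    done.append(task)  # stands in for the real work, e.g. booting a VM
    if not ack_early:
        broker.ack(tag)


done = []
early = SimulatedBroker(["boot vm-1"])
run_task(early, ack_early=True, crash_midway=True, done=done)
# Nothing is left to redeliver: the task is simply lost.

late = SimulatedBroker(["boot vm-1"])
run_task(late, ack_early=False, crash_midway=True, done=done)
# "boot vm-1" is back in late.ready and will be delivered again.
run_task(late, ack_early=False, crash_midway=False, done=done)
```

Acknowledging late only turns the problem into at-least-once delivery: the redelivered task may find a half-completed earlier attempt, so the handler still has to detect that state and finish or roll it back, which is the recovery described above as missing.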
>>
>> Hope that helps,
>> John
>>
>> From: Igor Laskovy [mailto:igor.laskovy at gmail.com]
>> Sent: 15 June 2012 11:31
>> To: Christian Parpart
>> Cc: John Garbutt; openstack-operators at lists.openstack.org;
>> openstack at lists.launchpad.net
>> Subject: Re: [Openstack-operators] Nova Controller HA issues
>>
>> I have also only been using OpenStack in my little lab for a short time :)
>>
>> OK, you are right of course, but I had a different design in mind when I
>> talked about virtualizing the controller nodes.
>>
>> It could be just two dedicated hypervisors with dedicated shared storage/DRBD
>> between them. These hypervisors would be standalone and not part of nova.
>> Then Pacemaker or another tool could take over the availability function and
>> restart the VM on the surviving node when the active one dies.
>>
>> The main question here is how bad it can be if the controller nodes
>> unexpectedly power off. In other words, when the VM restarts it will be in a
>> crash-consistent state.
>> Will some nova services lose anything here?
>> Will RabbitMQ lose some data here? (I am new to RabbitMQ too.)
>>
>> Igor Laskovy
>> facebook.com/igor.laskovy
>> Kiev, Ukraine
>>
>> On Jun 15, 2012 10:54 AM, "Christian Parpart" <trapni at gmail.com> wrote:
>> Hey,
>>
>> Well, I said "I might be wrong" because I don't have a clear picture of how
>> OpenStack works in its deepest detail. However, I would not like to depend on
>> a controller node that is inside a virtual machine, controlled by compute
>> nodes that are in turn controlled by the controller node. This sounds quite
>> like a chicken-and-egg problem.
>>
>> Also, as things stand, I think you'll need a working nova-scheduler
>> process, which is responsible for deciding on which compute node to spawn
>> your VM (what else?). Think about what you would do when this VM (or all of
>> your controller VMs) dies badly and you want to rebuild it: how do you plan
>> to do that when your controller node is out of service?
>>
>> In my case, I have put the controller services onto two compute nodes and
>> use Pacemaker to switch between them: if one node goes down, the other can
>> take over (via a shared service IP).
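The Pacemaker setup described above is an active/passive pair sharing one service IP. The Python below is a minimal sketch of that failover decision only, assuming a hypothetical `FailoverPair` that tracks heartbeats with a fixed timeout. Real Pacemaker relies on Corosync membership, resource agents, and fencing rather than this loop, and the node names `compute1`/`compute2` are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FailoverPair:
    """Hypothetical active/passive model of a floating service IP."""

    nodes: Tuple[str, ...]
    timeout: float = 10.0  # seconds of silence before a node is presumed dead
    last_seen: Dict[str, float] = field(default_factory=dict)
    holder: Optional[str] = None  # node currently owning the service IP

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now
        if self.holder is None:
            self.holder = node  # the first node to report in takes the IP

    def alive(self, node: str, now: float) -> bool:
        seen = self.last_seen.get(node)
        return seen is not None and now - seen <= self.timeout

    def check(self, now: float) -> Optional[str]:
        # Move the IP only when the holder looks dead and a peer looks alive;
        # a healthy holder keeps the IP, so there is no automatic fail-back.
        if self.holder is not None and not self.alive(self.holder, now):
            for node in self.nodes:
                if node != self.holder and self.alive(node, now):
                    self.holder = node
                    break
        return self.holder


pair = FailoverPair(nodes=("compute1", "compute2"))
pair.heartbeat("compute1", now=0.0)
pair.heartbeat("compute2", now=0.0)
pair.check(now=5.0)                   # "compute1" still holds the IP
pair.heartbeat("compute2", now=12.0)  # compute1 has gone silent
pair.check(now=12.0)                  # "compute2" takes over
```

In a two-node cluster, a node that merely loses network contact with its peer would also conclude the peer is dead, so both could claim the IP; that split-brain risk is why Pacemaker deployments pair failover with fencing (STONITH).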
>>
>> Again, these are just my thoughts, and I have only been using OpenStack for
>> about a month now :-)
>> But I hope this helps a bit...
>>
>> Best regards,
>> Christian Parpart.
>>
>> On Fri, Jun 15, 2012 at 8:16 AM, Igor Laskovy <igor.laskovy at gmail.com> wrote:
>> Why? Can you please clarify?
>>
>> Igor Laskovy
>> facebook.com/igor.laskovy
>> Kiev, Ukraine
>>
>> On Jun 15, 2012 1:55 AM, "Christian Parpart" <trapni at gmail.com> wrote:
>> I don't think putting the controller node entirely into a VM is good advice,
>> at least as far as nova-scheduler and nova-api (if central) are concerned.
>>
>> I may be wrong, and if so, please correct me.
>>
>> Christian.
>>
>> On Thu, Jun 14, 2012 at 7:20 PM, Igor Laskovy <igor.laskovy at gmail.com> wrote:
>> Hi, are there any updates on this?
>> Can anybody clarify what happens if the controller nodes just go through a
>> hard shutdown?
>>
>> I am thinking about a solution with two hypervisors, putting the controller
>> node in a VM on shared storage, which can be relaunched when the active
>> hypervisor dies.
>> Any ideas or advice?
>>
>>
>> On Tue, Jun 12, 2012 at 3:52 PM, John Garbutt <John.Garbutt at citrix.com> wrote:
>> > Sure, I get your point.
>> >
>> > I think Florian is working on some docs to help with that.
>> >
>> > Not sure how much has been done already.
>> >
>> > Cheers,
>> >
>> > John
>> >
>> >
>> >
>> > From: Christian Parpart [mailto:trapni at gmail.com]
>> > Sent: 12 June 2012 13:47
>> > To: John Garbutt
>> > Cc: openstack-operators at lists.openstack.org
>> > Subject: Re: [Openstack-operators] Nova Controller HA issues
>> >
>> > Hey, yes, I also found this page, but I haven't found it that helpful yet;
>> > it reads rather like a theoretical paper on how they implemented it, rather
>> > than telling me how to actually make it happen (from the sysop point of
>> > view :-)
>> >
>> > I hoped that someone had already faced this, since I find it very
>> > unintuitive to realize; otherwise I need to wait until I get more time to
>> > investigate it properly. :-)
>> >
>> > Regards,
>> >
>> > Christian.
>> >
>> > On Tue, Jun 12, 2012 at 12:52 PM, John Garbutt <John.Garbutt at citrix.com> wrote:
>> >
>> > I thought Rabbit had a built-in HA solution these days:
>> >
>> > http://www.rabbitmq.com/ha.html
>> >
>> >
>> >
>> > From: openstack-operators-bounces at lists.openstack.org
>> > [mailto:openstack-operators-bounces at lists.openstack.org] On Behalf Of
>> > Christian Parpart
>> > Sent: 12 June 2012 09:59
>> > To: openstack-operators at lists.openstack.org
>> > Subject: [Openstack-operators] Nova Controller HA issues
>> >
>> >
>> > Hi all,
>> >
>> > After spending the whole evening making our cloud controller node highly
>> > available using Corosync/Pacemaker, which I am really proud of, I have just
>> > a few problems left, and the one that freaks me out the most is
>> > rabbitmq-server.
>> >
>> > For that beast, I just can't seem to find any good documentation on how to
>> > set up rabbitmq-server properly for HA.
>> >
>> > Has anyone ever tried to set up a nova controller (including the rabbitmq
>> > dependency) for HA?
>> > If so, I'd be pleased to share experiences, especially regarding the latter
>> > part. :-)
>> >
>> > Best regards,
>> >
>> > Christian Parpart
>> >
>> > _______________________________________________
>> > Openstack-operators mailing list
>> > Openstack-operators at lists.openstack.org
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>> >
>>
>>
>>
>> --
>> Igor Laskovy
>> Kiev, Ukraine
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack at lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
>



-- 
Igor Laskovy
facebook.com/igor.laskovy
Kiev, Ukraine