[openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

Vladimir Kuklin vkuklin at mirantis.com
Wed May 20 19:27:56 UTC 2015


Actually, we are not skipping the 'Started' state. We just consider the
resource started when beam is up and the rabbitmq start_app/stop_app action
succeeds. Such a node is considered a good one that can be marked as
'Master', to which the other nodes should connect. All the cluster
join/leave actions are then handled using the multi-state notification
mechanism.
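
To make the above more concrete, here is a minimal toy model of that
lifecycle. It is only a sketch and not the Fuel OCF script: the names
(Role, Node, promote_and_notify, on_post_promote) are hypothetical, and it
assumes that the other nodes join when they receive a notification about
the promotion.

```python
from enum import Enum


class Role(Enum):
    STOPPED = "Stopped"
    STARTED = "Started"  # a started, not yet promoted instance
    MASTER = "Master"


# Hypothetical transition table: there is no direct Stopped -> Master edge.
ALLOWED = {
    (Role.STOPPED, Role.STARTED),  # start
    (Role.STARTED, Role.MASTER),   # promote
    (Role.MASTER, Role.STARTED),   # demote
    (Role.STARTED, Role.STOPPED),  # stop
}


class Node:
    def __init__(self, name):
        self.name = name
        self.role = Role.STOPPED
        self.joined_to = None  # name of the master this node joined, if any

    def _move(self, new_role):
        if (self.role, new_role) not in ALLOWED:
            raise ValueError(
                f"{self.name}: {self.role.value} -> {new_role.value} not allowed"
            )
        self.role = new_role

    def start(self, beam_up, app_cycle_ok):
        # Considered started only when beam is up and the
        # start_app/stop_app action has succeeded.
        if not (beam_up and app_cycle_ok):
            return False
        self._move(Role.STARTED)
        return True

    def promote(self):
        self._move(Role.MASTER)

    def on_post_promote(self, master):
        # Assumption: started non-master nodes join the master when notified.
        if self is not master and self.role is Role.STARTED:
            self.joined_to = master.name


def promote_and_notify(nodes, master):
    """Promote one started node, then notify every node about it."""
    master.promote()
    for node in nodes:
        node.on_post_promote(master)
```

In this model a node that never reached 'Started' cannot be promoted, and
only started nodes react to the notification by joining the master.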

On Wed, May 20, 2015 at 5:05 AM, Andrew Beekhof <abeekhof at redhat.com> wrote:

>
> > On 20 May 2015, at 6:05 am, Andrew Woodward <xarses at gmail.com> wrote:
> >
> >
> >
> > On Thu, May 7, 2015 at 5:01 PM Andrew Beekhof <abeekhof at redhat.com> wrote:
> >
> > > On 5 May 2015, at 1:19 pm, Zhou Zheng Sheng / 周征晟 <zhengsheng at awcloud.com> wrote:
> > >
> > > Thank you Andrew.
> > >
> > > on 2015/05/05 08:03, Andrew Beekhof wrote:
> > >>> On 28 Apr 2015, at 11:15 pm, Bogdan Dobrelya <bdobrelia at mirantis.com> wrote:
> > >>>
> > >>>> Hello,
> > >>> Hello, Zhou
> > >>>
> > >>>> I am using Fuel 6.0.1 and find that RabbitMQ recovery time is long
> > >>>> after a power failure. I have a running HA environment, then I reset
> > >>>> power of all the machines at the same time. I observe that after
> > >>>> reboot it usually takes 10 minutes for the RabbitMQ cluster to appear
> > >>>> running in master-slave mode in pacemaker. If I power off all 3
> > >>>> controllers and only start 2 of them, the downtime can sometimes be
> > >>>> as long as 20 minutes.
> > >>> Yes, this is a known issue [0]. Note that many bugfixes, like
> > >>> [1], [2], [3], were merged for the MQ OCF script, so you may want to
> > >>> try to backport them as well by following the guide [4].
> > >>>
> > >>> [0] https://bugs.launchpad.net/fuel/+bug/1432603
> > >>> [1] https://review.openstack.org/#/c/175460/
> > >>> [2] https://review.openstack.org/#/c/175457/
> > >>> [3] https://review.openstack.org/#/c/175371/
> > >>> [4] https://review.openstack.org/#/c/170476/
> > >> Is there a reason you’re using a custom OCF script instead of the upstream [a] one?
> > >> Please have a chat with David (the maintainer, in CC) if there is something you believe is wrong with it.
> > >>
> > >> [a] https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> > >
> > > I'm using the OCF script from the Fuel project, specifically from the
> > > "6.0" stable branch [alpha].
> >
> > Ah, I’m still learning who is who... I thought you were part of that project :-)
> >
> > >
> > > Compared with the upstream OCF code, the main difference is that the
> > > Fuel RabbitMQ OCF is a master-slave resource. The Fuel RabbitMQ OCF
> > > does more bookkeeping, for example, blocking client access when the
> > > RabbitMQ cluster is not ready. After reading the code, I believe the
> > > upstream OCF should be OK to use as well, but it might not fit into
> > > the Fuel project. As far as I have tested, the Fuel OCF script is good
> > > except that the full reassembly time is sometimes long, and as I found
> > > out, that is mostly because the Fuel MySQL Galera OCF script keeps
> > > pacemaker from promoting the RabbitMQ resource, as I mentioned in the
> > > previous emails.
> > >
> > > Maybe Vladimir and Sergey can give us more insight on why Fuel needs a
> > > master-slave RabbitMQ.
> >
> > That would be good to know.
> > Browsing the agent, promote seems to be a no-op if rabbit is already running.
> >
> >
> > The reason for master/slave is how the OCF script is structured to deal
> > with rabbit's poor ability to handle itself in some scenarios. Hopefully
> > the state transition diagram [5] is enough to clarify what's going on.
> >
> > [5] http://goo.gl/PPNrw7
>
> Not really.
> It seems to be under the impression you can skip started and go directly from stopped to master.



-- 
Yours Faithfully,
Vladimir Kuklin,
Fuel Library Tech Lead,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
35bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com <http://www.mirantis.ru/>
www.mirantis.ru
vkuklin at mirantis.com