[openstack-dev] [Fuel] Speed Up RabbitMQ Recovering
Andrew Beekhof
abeekhof at redhat.com
Wed May 20 02:05:01 UTC 2015
> On 20 May 2015, at 6:05 am, Andrew Woodward <xarses at gmail.com> wrote:
>
>
>
> On Thu, May 7, 2015 at 5:01 PM Andrew Beekhof <abeekhof at redhat.com> wrote:
>
> > On 5 May 2015, at 1:19 pm, Zhou Zheng Sheng / 周征晟 <zhengsheng at awcloud.com> wrote:
> >
> > Thank you Andrew.
> >
> > on 2015/05/05 08:03, Andrew Beekhof wrote:
> >>> On 28 Apr 2015, at 11:15 pm, Bogdan Dobrelya <bdobrelia at mirantis.com> wrote:
> >>>
> >>>> Hello,
> >>> Hello, Zhou
> >>>
> >>>> I am using Fuel 6.0.1 and find that RabbitMQ recovery time is long after
> >>>> a power failure. I have a running HA environment, and I reset the power of
> >>>> all the machines at the same time. I observe that after the reboot it
> >>>> usually takes 10 minutes for the RabbitMQ cluster to appear running in
> >>>> master-slave mode in Pacemaker. If I power off all 3 controllers and
> >>>> start only 2 of them, the downtime can sometimes be as long as 20 minutes.
> >>> Yes, this is a known issue [0]. Note that many bugfixes, such as
> >>> [1], [2], [3], have been merged for the MQ OCF script, so you may want to
> >>> backport them as well by following the guide [4].
> >>>
> >>> [0] https://bugs.launchpad.net/fuel/+bug/1432603
> >>> [1] https://review.openstack.org/#/c/175460/
> >>> [2] https://review.openstack.org/#/c/175457/
> >>> [3] https://review.openstack.org/#/c/175371/
> >>> [4] https://review.openstack.org/#/c/170476/
> >> Is there a reason you’re using a custom OCF script instead of the upstream[a] one?
> >> Please have a chat with David (the maintainer, in CC) if there is something you believe is wrong with it.
> >>
> >> [a] https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> >
> > I'm using the OCF script from the Fuel project, specifically from the
> > "6.0" stable branch [alpha].
>
> Ah, I’m still learning who is who... I thought you were part of that project :-)
>
> >
> > Compared with the upstream OCF code, the main difference is that the Fuel
> > RabbitMQ OCF is a master-slave resource. The Fuel RabbitMQ OCF does more
> > bookkeeping, for example, blocking client access when the RabbitMQ cluster
> > is not ready. After reading the code, I believe the upstream OCF should be
> > OK to use as well, but it might not fit into the Fuel project. As far as
> > I have tested, the Fuel OCF script works well, except that the full
> > reassembly time is sometimes long; as I found out, this is mostly because
> > the Fuel MySQL Galera OCF script keeps Pacemaker from promoting the
> > RabbitMQ resource, as I mentioned in my previous emails.
> >
> > Maybe Vladimir and Sergey can give us more insight on why Fuel needs a
> > master-slave RabbitMQ.
>
> That would be good to know.
> Browsing the agent, promote seems to be a no-op if rabbit is already running.
>
>
> As for the master/slave reason: it is due to how the OCF script is structured to deal with RabbitMQ's poor ability to handle itself in some scenarios. Hopefully the state transition diagram [5] is enough to clarify what's going on.
>
> [5] http://goo.gl/PPNrw7
Not really.
It seems to be under the impression you can skip started and go directly from stopped to master.
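To illustrate the point, here is a minimal Python sketch of the role lifecycle Pacemaker enforces for a master/slave (promotable) resource: an instance must be started (Slave) before it can be promoted, and demoted before it is stopped. The names `Role`, `TRANSITIONS`, `apply` and `path_to_master` are hypothetical, invented for this sketch; they are not part of Pacemaker or of the Fuel OCF script.

```python
from enum import Enum


class Role(Enum):
    STOPPED = "Stopped"
    SLAVE = "Slave"    # "Started" in non-promotable terminology
    MASTER = "Master"


# Legal single-step actions for a master/slave resource (hypothetical model).
# There is deliberately no Stopped -> Master edge.
TRANSITIONS = {
    (Role.STOPPED, "start"): Role.SLAVE,
    (Role.SLAVE, "promote"): Role.MASTER,
    (Role.MASTER, "demote"): Role.SLAVE,
    (Role.SLAVE, "stop"): Role.STOPPED,
}


def apply(role, action):
    """Return the new role, or raise if the action is illegal from `role`."""
    try:
        return TRANSITIONS[(role, action)]
    except KeyError:
        raise ValueError(f"cannot {action} from {role.value}") from None


def path_to_master(role):
    """Return the ordered actions needed to reach Master from `role`."""
    actions = []
    while role is not Role.MASTER:
        # From Stopped the only way up is start; from Slave it is promote.
        action = "start" if role is Role.STOPPED else "promote"
        role = apply(role, action)
        actions.append(action)
    return actions
```

Under this model, `path_to_master(Role.STOPPED)` yields `["start", "promote"]`, while `apply(Role.STOPPED, "promote")` raises `ValueError`: the agent always passes through the started state on its way to master.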