Open Stack

Tue May 5 02:32:58 UTC 2015

> On 29 Apr 2015, at 5:38 pm, Zhou Zheng Sheng / 周征晟 <zhengsheng at awcloud.com> wrote:

[snip]

> Batch is a pacemaker concept I found when I was reading its
> documentation and code. There is a "batch-limit: 30" in the output of
> "pcs property list --all". The pacemaker official documentation
> explanation is that it's "The number of jobs that the TE is allowed to
> execute in parallel." From my understanding, pacemaker maintains cluster
> states, and when we start/stop/promote/demote a resource, it triggers a
> state transition. Pacemaker puts as many transition jobs as possible
> into a batch, and processes them in parallel.

Technically it calculates an ordered graph of actions that need to be performed for a set of related resources.
You can see an example of the kinds of graphs it produces at:

   http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-config-testing-changes.html

There is a more complex one which includes promotion and demotion on the next page.

The number of actions that can run at any one time is therefore limited by
- the value of batch-limit (the total number of in-flight actions)
- the number of resources that do not have ordering constraints between them (e.g. rsc{1,2,3} in the above example)

So in the above example, if batch-limit >= 3, the monitor_0 actions will still all execute in parallel.
If batch-limit == 2, one of them will be deferred until the others complete.
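
A minimal sketch of that interaction, in Python. It is not Pacemaker code: the function name schedule_rounds and the round-by-round model are assumptions made for illustration (the real transition engine can start a deferred action as soon as any in-flight action completes, rather than waiting for a whole round).

```python
def schedule_rounds(actions, deps, batch_limit):
    """Group actions into rounds of at most batch_limit runnable actions.

    deps maps an action name to the set of actions that must finish first.
    Hypothetical helper for illustration only, not a Pacemaker API.
    """
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    done, rounds = set(), []
    pending = list(actions)
    while pending:
        # An action is runnable once all of its prerequisites have completed.
        ready = [a for a in pending if deps.get(a, set()) <= done]
        if not ready:
            raise ValueError("ordering constraints contain a cycle")
        current = ready[:batch_limit]  # anything beyond the limit is deferred
        rounds.append(current)
        done.update(current)
        pending = [a for a in pending if a not in done]
    return rounds


probes = ["rsc1_monitor_0", "rsc2_monitor_0", "rsc3_monitor_0"]
print(schedule_rounds(probes, {}, batch_limit=3))  # one round of three
print(schedule_rounds(probes, {}, batch_limit=2))  # two rounds: two, then one
```

With no ordering constraints between the three probes, only batch-limit decides how many run together; adding entries to deps would serialize them regardless of the limit.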

Processing of the graph stops the moment any action returns a value that was not expected.
If that happens, we wait for currently in-flight actions to complete, re-calculate a new graph based on the new information and start again.
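
The stop-and-recalculate behaviour can be sketched roughly as follows. This is a simplified model, not Pacemaker's implementation: run_transition, fake_execute and the table of expected return codes are hypothetical names, and the only outside facts relied on are the standard OCF return codes 0 (success) and 7 (not running).

```python
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def run_transition(rounds, execute, expected):
    """Run rounds of actions; abort after the first unexpected result.

    Hypothetical helper: execute(action) returns a return code and
    expected maps each action to the return code the graph assumed.
    """
    results = {}
    for batch in rounds:
        # Everything in this batch is already in flight, so let it all finish.
        for action in batch:
            results[action] = execute(action)
        if any(results[a] != expected[a] for a in batch):
            # Later rounds are never launched; a new graph must be calculated.
            return "aborted", results
    return "complete", results


rounds = [["rsc1_monitor_0", "rsc2_monitor_0"], ["rsc1_start_0"]]
expected = {
    "rsc1_monitor_0": OCF_NOT_RUNNING,
    "rsc2_monitor_0": OCF_NOT_RUNNING,
    "rsc1_start_0": OCF_SUCCESS,
}


def fake_execute(action):
    # Assumption for the example: rsc2 turns out to be running already.
    return OCF_SUCCESS if action == "rsc2_monitor_0" else expected[action]


status, results = run_transition(rounds, fake_execute, expected)
print(status, sorted(results))
```

In this example the probe for rsc2 reports a running resource where a stopped one was expected, so both probes finish but rsc1_start_0 is never issued under the old graph.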

> 
> The problem is that pacemaker can only promote a resource after it
> detects the resource is started.

First we do a non-recurring monitor (*_monitor_0) to check what state the resource is in.
We can't assume it is off because a) we might have crashed, b) the admin might have accidentally configured it to start at boot, or c) the admin may have asked us to re-check everything.

> During a full reassemble, in the first
> transition batch, pacemaker starts all the resources including MySQL and
> RabbitMQ. Pacemaker issues resource agent "start" invocation in parallel
> and reaps the results.
> 
> For a multi-state resource agent like RabbitMQ, pacemaker needs the
> start result reported in the first batch, then transition engine and
> policy engine decide if it has to retry starting or promote, and put
> this new transition job into a new batch.

Also important to know, the order of actions is:

1. any necessary demotions
2. any necessary stops
3. any necessary starts
4. any necessary promotions
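
As a rough illustration of that ordering (a sketch only; Pacemaker expresses it as edges in the transition graph rather than a global sort, and the names PHASE_ORDER and order_actions are made up for this example):

```python
# Phase priority taken from the list above: demote, stop, start, promote.
PHASE_ORDER = {"demote": 0, "stop": 1, "start": 2, "promote": 3}


def order_actions(actions):
    """Sort (resource, operation) pairs by phase.

    sorted() is stable, so actions within the same phase keep their
    original relative order.
    """
    return sorted(actions, key=lambda action: PHASE_ORDER[action[1]])


requested = [
    ("rabbitmq", "promote"),
    ("mysql", "start"),
    ("rabbitmq", "stop"),
    ("rabbitmq", "demote"),
]
for resource, operation in order_actions(requested):
    print(resource, operation)
```

The resource names here are just the ones mentioned in this thread; any set of (resource, operation) pairs would be ordered the same way.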

[openstack-dev] [Fuel] Speed Up RabbitMQ Recovering
