[openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

Andrew Beekhof abeekhof at redhat.com
Fri May 8 03:13:18 UTC 2015


> On 5 May 2015, at 7:52 pm, Bogdan Dobrelya <bdobrelia at mirantis.com> wrote:
> 
> On 05.05.2015 04:32, Andrew Beekhof wrote:
>> 
>> 
>> [snip]
>> 
>> 
>> Technically it calculates an ordered graph of actions that need to be performed for a set of related resources.
>> You can see an example of the kinds of graphs it produces at:
>> 
>>   http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-config-testing-changes.html
>> 
>> There is a more complex one which includes promotion and demotion on the next page.
>> 
>> The number of actions that can run at any one time is therefore limited by
>> - the value of batch-limit (the total number of in-flight actions)
>> - the number of resources that do not have ordering constraints between them (e.g. rsc{1,2,3} in the above example)
>> 
>> So in the above example, if batch-limit >= 3, the monitor_0 actions will still all execute in parallel.
>> If batch-limit == 2, one of them will be deferred until the others complete.
>> 
>> Processing of the graph stops the moment any action returns a value that was not expected.
>> If that happens, we wait for currently in-flight actions to complete, re-calculate a new graph based on the new information and start again.
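
As an aside: if you want to see the graph pacemaker computes for your own
cluster, crm_simulate can dump it from the live CIB.  Something like the
following should work (the file names are only placeholders, and graphviz is
assumed to be installed):

    # evaluate the live cluster state and save the resulting transition
    # graph in graphviz "dot" format
    crm_simulate --live-check --save-dotfile /tmp/transition.dot

    # render the graph for viewing
    dot -Tsvg /tmp/transition.dot -o /tmp/transition.svg
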
>> 
>> 
>> First we do a non-recurring monitor (*_monitor_0) to check what state the resource is in.
>> We can’t assume it’s off because a) we might have crashed, b) the admin might have accidentally configured it to start at boot, or c) the admin may have asked us to re-check everything.
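
For the record, case c) above is typically the result of something like:

    # tell pacemaker to forget its cached resource state and
    # re-probe everything
    crm_resource --reprobe
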
>> 
>> 
>> Also important to know, the order of actions is:

I should clarify something here:

   s/actions is/actions for each resource is/

>> 
>> 1. any necessary demotions
>> 2. any necessary stops
>> 3. any necessary starts
>> 4. any necessary promotions
>> 
>> 
> 
> Thank you for explaining this, Andrew!
> 
> So, in the context of the given two example DB (MySQL) and
> messaging (RabbitMQ) resources:
> 
> "The problem is that pacemaker can only promote a resource after it
> detects the resource is started. During a full reassemble, in the first
> transition batch, pacemaker starts all the resources including MySQL and
> RabbitMQ. Pacemaker issues resource agent "start" invocation in parallel
> and reaps the results.
> For a multi-state resource agent like RabbitMQ, pacemaker needs the
> start result reported in the first batch, then transition engine and
> policy engine decide if it has to retry starting or promote, and put
> this new transition job into a new batch."
> 
> So, for the given example, it looks like we currently have:
> _batch start_
> ...
> 3. DB, messaging resources start in a one batch

Since there is no dependency between them, yes.

> 4. messaging resource promote blocked by the step 3 completion
> _batch end_

Not quite, I wasn’t as clear as I could have been in my previous email.

We won’t promote Rabbit instances until they have all been started.
However we don’t need to wait for all the DBs to finish starting (again, because there is no dependency between them) before we begin promoting Rabbit.

So a single transition that did this is totally possible:

t0.  Begin transition
t1.  Rabbit start node1    (begin)
t2.  DB start node 3       (begin)
t3.  DB start node 2       (begin)
t4.  Rabbit start node2    (begin)
t5.  Rabbit start node3    (begin)
t6.  DB start node 1       (begin)
t7.  Rabbit start node2    (complete)
t8.  Rabbit start node1    (complete)
t9.  DB start node 3       (complete)
t10. Rabbit start node3    (complete)
t11. Rabbit promote node 1 (begin)
t12. Rabbit promote node 3 (begin)
t13. Rabbit promote node 2 (begin)
... etc etc ...

For something like cinder however, these are some of the dependencies we define:

    pcs constraint order start keystone-clone then cinder-api-clone
    pcs constraint order start cinder-api-clone then cinder-scheduler-clone
    pcs constraint order start galera-master then keystone-clone

So first all the galera instances must be started. Then we can begin to promote some.
Once all the promotions complete, then we can start the keystone instances.
Once all the keystone instances are up, then we can bring up the cinder API instances, which allows us to start the scheduler, etc etc.
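
You can double-check the resulting chain afterwards; depending on your pcs
version, something like:

    pcs constraint order show

lists the ordering constraints that are configured.
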

And assuming nothing fails, this can all happen in one transition.

Bottom line: Pacemaker will do as much as it can as soon as it can.  
The only restrictions are ordering constraints you specify, the batch-limit, and each master/slave (or clone) resource’s _internal_ demote->stop->start->promote ordering.
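
To put those knobs in pcs terms (the resource names below are only
illustrative placeholders, not the ones any particular deployment defines):

    # an explicit ordering constraint: promote the rabbit master/slave set
    # before starting the DB clone
    pcs constraint order promote p_rabbitmq-server-master then start p_mysql-clone

    # throttle the number of actions pacemaker runs in parallel per transition
    pcs property set batch-limit=2
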

Am I making it better or worse?

> 
> Does this mean that an artificial ordering constraint between DB and
> messaging could help them get into separate transition batches, like:
> 
> ...
> 3. messaging multistate clone resource start
> 4. messaging multistate clone resource promote
> _batch end_
> 
> _next batch start_
> ...
> 3. DB simple clone resource start
> 
> ?
> 
> -- 
> Best regards,
> Bogdan Dobrelya,
> Skype #bogdando_at_yahoo.com
> Irc #bogdando
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



