[openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

Andrew Beekhof abeekhof at redhat.com
Wed Nov 11 11:12:38 UTC 2015


> On 11 Nov 2015, at 6:26 PM, bdobrelia at mirantis.com wrote:
> 
> Thank you Andrew.
> Answers below.
> >>>
> Sounds interesting, can you give any comment about how it differs from the other[i] upstream agent?
> Am I right that this one is effectively A/P and won't function without some kind of shared storage?
> Any particular reason you went down this path instead of full A/A?
> 
> [i] 
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> <<<
> It is based on multistate clone notifications. It requires nothing shared except the Corosync info base (CIB), where all Pacemaker resources are stored anyway.
> And it is fully A/A.

Oh!  So I should skip the A/P parts before “Auto-configuration of a cluster with a Pacemaker”?
Is the idea that the master mode is for picking a node to bootstrap the cluster?

If so I don’t believe that should be necessary provided you specify ordered=true for the clone.
This allows you to assume in the agent that your instance is the only one currently changing state (by starting or stopping).
I notice that rabbitmq.com explicitly sets this to false… any particular reason?
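For illustration, a minimal sketch of flipping that option on an existing resource. It assumes the master resource got pcs's default "<id>-master" name (p_rabbitmq-server-master here), and the set_clone_ordered helper and PCS variable are hypothetical names of mine. By default it only prints the command it would run.

```shell
#!/bin/sh
# Dry run by default: prints the pcs command instead of executing it.
# Set PCS=pcs on a real cluster node to actually apply the change.
PCS="${PCS:-echo pcs}"

# Hypothetical helper: set the "ordered" meta attribute on a clone/master.
#   $1 = clone or master resource id (assumed name, check "pcs status")
#   $2 = true | false
set_clone_ordered() {
    $PCS resource meta "$1" ordered="$2"
}

# With ordered=true, Pacemaker starts and stops the instances in series,
# so the agent can assume only one instance changes state at a time.
set_clone_ordered p_rabbitmq-server-master true
```

Whether serialised start/stop is acceptable for a large cluster is a separate question, since it makes full-cluster start-up slower.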


Regarding the pcs command to create the resource, you can simplify it to:

pcs resource create --force --master p_rabbitmq-server ocf:rabbitmq:rabbitmq-server-ha \
  erlang_cookie=DPMDALGUKEOMPTHWPYKC node_port=5672 \
  op monitor interval=30 timeout=60 \
  op monitor interval=27 role=Master timeout=60 \
  op monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 \
  meta notify=true ordered=false interleave=true master-max=1 master-node-max=1

This works provided you update the stop/start/notify/promote/demote timeouts in the agent’s metadata.


Lines 1602, 1565, 1621, 1632, 1657, and 1678 have the notify command returning an error.
Was this logic tested? Pacemaker does not currently support/allow notify actions to fail.
IIRC it simply ignores them.
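To make the point concrete, here is a rough sketch of a notify handler that logs a problem but always returns success. It is not taken from the agent: handle_notification is a hypothetical stand-in for the real logic, and the simulated failure on post-stop is purely for demonstration. The two OCF_RESKEY_CRM_meta_notify_* variables are the ones Pacemaker sets for notify actions.

```shell
#!/bin/sh
OCF_SUCCESS=0   # standard OCF return code for success

# Hypothetical stand-in for the agent's real notification logic.
#   $1 = notify type (pre | post), $2 = operation (start | stop | ...)
handle_notification() {
    case "$1-$2" in
        post-stop) return 1 ;;   # simulate a failure for the demo
        *)         return 0 ;;
    esac
}

action_notify() {
    type="${OCF_RESKEY_CRM_meta_notify_type:-}"
    op="${OCF_RESKEY_CRM_meta_notify_operation:-}"

    if ! handle_notification "$type" "$op"; then
        # A non-zero exit from notify would be ignored anyway, so log the
        # problem and leave detection to a later monitor operation.
        echo "WARNING: ${type}-${op} notification handling failed" >&2
    fi
    return $OCF_SUCCESS
}

# Example invocation, mimicking what Pacemaker would export.
OCF_RESKEY_CRM_meta_notify_type=post
OCF_RESKEY_CRM_meta_notify_operation=stop
action_notify
echo "notify rc=$?"
```

The design choice is that anything which really must fail the resource has to surface through monitor (or start/stop/promote/demote), not through notify.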

Modifying the resource state in notifications is also highly unusual.
What was the reason for that?

I notice that on node down, this agent makes disconnect_node and forget_cluster_node calls.
The other upstream agent does not. Do you have any information about the bad things that might happen as a result?
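For readers who have not looked at the agent, the two calls amount to roughly the following. This is a sketch under assumptions, not a copy of the agent's code: the forget_dead_node helper, the RABBITMQCTL variable and the node name rabbit@node-2 are all hypothetical, and the exact invocations in the agent may differ. By default it only prints the commands.

```shell
#!/bin/sh
# Dry run by default: prints the rabbitmqctl commands instead of running
# them. Set RABBITMQCTL=rabbitmqctl on a real node to execute them.
RABBITMQCTL="${RABBITMQCTL:-echo rabbitmqctl}"

# Hypothetical helper: clean up after a cluster member that went down.
#   $1 = Erlang node name of the dead member, e.g. rabbit@node-2
forget_dead_node() {
    node="$1"
    # Drop the Erlang distribution connection to the dead node.
    $RABBITMQCTL eval "disconnect_node('${node}')."
    # Remove the node from the RabbitMQ cluster membership.
    $RABBITMQCTL forget_cluster_node "$node"
}

forget_dead_node rabbit@node-2
```

A forgotten node has to rejoin the cluster when it comes back, which is the kind of side effect the question above is about.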

Basically I’m looking for what each option does differently/better with a view to converging on a single implementation. 
I don’t much care in which location it lives.

I’m CC’ing the other upstream maintainer, it would be good if you guys could have a chat :-)

> All running rabbit nodes may process AMQP connections. The master state only marks the initial point of the cluster, which the other slaves may then join.
> Note, here you can find events flow charts as well [0]
> [0] https://www.rabbitmq.com/pacemaker.html
> Regards,
> Bogdan