[openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

Vladimir Kuklin vkuklin at mirantis.com
Thu Nov 12 11:44:07 UTC 2015


Hi, Andrew

> Ah good, I understood it correctly then :)
> I would be interested in your opinion of how the other agent does the bootstrapping (ie. without notifications or master/slave).
> That makes sense, the part I’m struggling with is that it sounds like the other agent shouldn’t work at all.
> Yet we’ve used it extensively and not experienced these kinds of hangs.

Regarding other scripts - I am not aware of any other scripts that actually
handle a cloned rabbitmq server. I may be mistaken, of course. So if you are
aware of scripts that succeed in creating a RabbitMQ cluster which actually
survives 1-node or all-node failure scenarios and reassembles the cluster
automatically, please let us know.
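
For reference, the join step our agent drives once the 'pseudo'-master has
been picked (described further down in the quoted part) boils down to roughly
the following on each joining node. This is a simplified sketch with a
placeholder node name, not the literal code from the agent, which wraps these
calls in retries and timeouts:

    # Simplified sketch of (re)joining the cluster around the chosen master.
    # rabbit@node-1 is a placeholder for the master name the agent picks up
    # from the Pacemaker notification variables.
    rabbitmqctl stop_app
    rabbitmqctl join_cluster rabbit@node-1
    rabbitmqctl start_app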

> Changing the state isn’t ideal but there is precedent, the part that has me concerned is the error codes coming out of notify.
> Apart from producing some log messages, I can’t think how it would produce any recovery.

> Unless you’re relying on the subsequent monitor operation to notice the error state.
> I guess that would work but you might be waiting a while for it to notice.

Yes, we are relying on subsequent monitor operations. We also have several
OCF check levels to catch the case where a node does not have the rabbitmq
application started properly (by the way, there was a strange bug where we had
to wait for several non-zero check failures before the resource was restarted:
http://bugs.clusterlabs.org/show_bug.cgi?id=5243). I now remember why we
returned errors from notify - for error logging, I guess.
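
To illustrate what I mean by check levels, here is a rough sketch of a monitor
branching on OCF_CHECK_LEVEL. This is not the literal code from the agent,
just the shape of it: level 0 only checks that rabbitmqctl can talk to the
node, while level 30 also verifies that the rabbit application is running and
that queries come back without hanging.

    # Rough sketch only, not the agent's actual monitor implementation.
    . ${OCF_FUNCTIONS_DIR:-/usr/lib/ocf/lib/heartbeat}/ocf-shellfuncs

    monitor() {
        # Level 0/10: is the node responding at all?
        rabbitmqctl status >/dev/null 2>&1 || return $OCF_NOT_RUNNING
        if [ "${OCF_CHECK_LEVEL:-0}" -ge 30 ]; then
            # Level 30: is the rabbit application actually started?
            rabbitmqctl eval 'rabbit:is_running().' | grep -q true \
                || return $OCF_ERR_GENERIC
            # ... and do cluster queries return instead of hanging?
            timeout 30 rabbitmqctl list_queues name >/dev/null 2>&1 \
                || return $OCF_ERR_GENERIC
        fi
        return $OCF_SUCCESS
    }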


On Thu, Nov 12, 2015 at 1:30 AM, Andrew Beekhof <abeekhof at redhat.com> wrote:

>
> > On 11 Nov 2015, at 11:35 PM, Vladimir Kuklin <vkuklin at mirantis.com>
> wrote:
> >
> > Hi, Andrew
> >
> > Let me answer your questions.
> >
> > This agent is active/active, but it actually marks one of the nodes as a
> > 'pseudo'-master which is used as a target for the other nodes to join. We
> > also check which node is the master and use it in the monitor action to check
> > whether this node is clustered with that 'master' node. When we do cluster
> > bootstrap, we need to decide which node to mark as the master node. Then,
> > when it starts (actually, promotes), we can finally pick up its name through
> > the notification mechanism and ask the other nodes to join this cluster.
>
> Ah good, I understood it correctly then :)
> I would be interested in your opinion of how the other agent does the
> bootstrapping (ie. without notifications or master/slave).
>
> >
> > Regarding disconnect_node+forget_cluster_node, this is quite simple - we
> > need to eject the node from the cluster. Otherwise it stays in the list of
> > cluster nodes, and a lot of cluster actions, e.g. list_queues, will hang
> > forever, as will a later forget_cluster_node action.
>
> That makes sense, the part I’m struggling with is that it sounds like the
> other agent shouldn’t work at all.
> Yet we’ve used it extensively and not experienced these kinds of hangs.
>
> >
> > We also handle this case whenever a node leaves the cluster. If you
> > remember, I wrote an email to the Pacemaker ML regarding getting notifications
> > on the node unjoin event: '[openstack-dev] [Fuel][Pacemaker][HA] Notifying
> > clones of offline nodes'.
>
> Oh, I recall that now.
>
> > So we went another way and added a dbus daemon listener that does the
> > same when a node leaves the corosync cluster (we know that this is a little
> > bit racy, but the disconnect+forget pair of actions is idempotent).
> >
> > Regarding notification commands - we changed the behaviour to one that
> > fitted our use cases better and passed our destructive tests. It could be
> > Pacemaker-version dependent, so I agree we should consider changing this
> > behaviour. But so far it has worked for us.
>
> Changing the state isn’t ideal but there is precedent, the part that has
> me concerned is the error codes coming out of notify.
> Apart from producing some log messages, I can’t think how it would produce
> any recovery.
>
> Unless you’re relying on the subsequent monitor operation to notice the
> error state.
> I guess that would work but you might be waiting a while for it to notice.
>
> >
> > On Wed, Nov 11, 2015 at 2:12 PM, Andrew Beekhof <abeekhof at redhat.com>
> wrote:
> >
> > > On 11 Nov 2015, at 6:26 PM, bdobrelia at mirantis.com wrote:
> > >
> > > Thank you Andrew.
> > > Answers below.
> > > >>>
> > > Sounds interesting, can you give any comment about how it differs to
> the other[i] upstream agent?
> > > Am I right that this one is effectively A/P and won't function without
> > > some kind of shared storage?
> > > Any particular reason you went down this path instead of full A/A?
> > >
> > > [i]
> > >
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> > > <<<
> > > It is based on multistate clone notifications. It requires nothing
> > > shared but the Corosync/Pacemaker CIB (cluster information base), where all
> > > Pacemaker resources are stored anyway.
> > > And it is fully A/A.
> >
> > Oh!  So I should skip the A/P parts before "Auto-configuration of a
> cluster with a Pacemaker”?
> > Is the idea that the master mode is for picking a node to bootstrap the
> cluster?
> >
> > If so I don’t believe that should be necessary provided you specify
> ordered=true for the clone.
> > This allows you to assume in the agent that your instance is the only
> one currently changing state (by starting or stopping).
> > I notice that rabbitmq.com explicitly sets this to false… any
> particular reason?
> >
> >
> > Regarding the pcs command to create the resource, you can simplify it to:
> >
> > pcs resource create --force --master p_rabbitmq-server \
> >   ocf:rabbitmq:rabbitmq-server-ha \
> >   erlang_cookie=DPMDALGUKEOMPTHWPYKC node_port=5672 \
> >   op monitor interval=30 timeout=60 \
> >   op monitor interval=27 role=Master timeout=60 \
> >   op monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 \
> >   meta notify=true ordered=false interleave=true master-max=1 master-node-max=1
> >
> > That is, provided you update the stop/start/notify/promote/demote timeouts
> > in the agent’s metadata.
> >
> >
> > Lines 1602, 1565, 1621, 1632, 1657, and 1678 have the notify command
> > returning an error.
> > Was this logic tested? Because pacemaker does not currently
> support/allow notify actions to fail.
> > IIRC pacemaker simply ignores them.
> >
> > Modifying the resource state in notifications is also highly unusual.
> > What was the reason for that?
> >
> > I notice that on node down, this agent makes disconnect_node and
> forget_cluster_node calls.
> > The other upstream agent does not, do you have any information about the
> bad things that might happen as a result?
> >
> > Basically I’m looking for what each option does differently/better with
> a view to converging on a single implementation.
> > I don’t much care in which location it lives.
> >
> > I’m CC’ing the other upstream maintainer, it would be good if you guys
> could have a chat :-)
> >
> > > All running rabbit nodes may process AMQP connections. The Master state is
> > > only an initial point for the cluster, at which the other slaves may join it.
> > > Note, here you can find event flow charts as well [0].
> > > [0] https://www.rabbitmq.com/pacemaker.html
> > > Regards,
> > > Bogdan
> > >
> >
> > --
> > Yours Faithfully,
> > Vladimir Kuklin,
> > Fuel Library Tech Lead,
> > Mirantis, Inc.
> > +7 (495) 640-49-04
> > +7 (926) 702-39-68
> > Skype kuklinvv
> > 35bk3, Vorontsovskaya Str.
> > Moscow, Russia,
> > www.mirantis.com
> > www.mirantis.ru
> > vkuklin at mirantis.com
> >



-- 
Yours Faithfully,
Vladimir Kuklin,
Fuel Library Tech Lead,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
35bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com
www.mirantis.ru
vkuklin at mirantis.com