[openstack-dev] [Oslo] [Oslo.messaging] RPC failover handling in rabbitmq driver

Bogdan Dobrelya bdobrelia at mirantis.com
Wed Jul 30 15:31:31 UTC 2014


On 07/28/2014 11:20 AM, Bogdan Dobrelya wrote:
> Hello.
> I'd like to bring your attention to a major RPC failover issue in
> impl_rabbit.py [0]. There are several *related* patches, and a number of
> concerns that should be considered as well:
> - Passive exchanges fix [1] (looks like the problem is much deeper than
> it seems though).
> - The first version of the fix [2], which makes the producer declare a
> queue and bind it to the exchange, as the consumer does.
> - Making all reply_* queues involved in RPC durable in order to preserve
> them in RabbitMQ after failover (there could be a TTL for such queues
> as well)
> - RPC throughput tuning patch [3]
> 
> I believe the issue [0] should be at least prioritized and assigned to
> some milestone.
> 
> [0] https://bugs.launchpad.net/oslo.messaging/+bug/1338732
> [1] https://review.openstack.org/#/c/109373/
> [2]
> https://github.com/noelbk/oslo.messaging/commit/960fc26ff050ca3073ad90eccbef1ca95712e82e
> [3] https://review.openstack.org/#/c/109143/
> 

Here is a small update on this RabbitMQ RPC failover research:
Stan Lagun submitted a patch [0] for the related bug [1].
Please don't hesitate to join the review process.

Basically, the idea of the patch is to address "step 3"
(rabbit dies and restarts) for *mirrored rabbit clusters*.
Obviously, it changes nothing for the single rabbit host case, because we
cannot "fail over" when we have no cluster.
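
To illustrate the difference between the clustered and the single-host
case, here is a minimal sketch of connection failover. It is not the
code from the patch [0] and it does not use the oslo.messaging or kombu
APIs: connect_with_failover, BrokerDown and the connect callable are
hypothetical names chosen for this example only.

```python
import itertools


class BrokerDown(Exception):
    """Raised by the (hypothetical) connect callable for an unreachable host."""


def connect_with_failover(hosts, connect, max_attempts=None):
    """Try broker hosts in round-robin order until one accepts the connection.

    With a single host there is nothing to fail over to: the loop can
    only retry that same host until the attempts run out.
    """
    if not hosts:
        raise ValueError("at least one broker host is required")
    if max_attempts is None:
        # Assumption for this sketch: two full passes over the host list.
        max_attempts = 2 * len(hosts)

    last_error = None
    for attempt, host in enumerate(itertools.cycle(hosts)):
        if attempt >= max_attempts:
            break
        try:
            return host, connect(host)
        except BrokerDown as exc:
            # Remember the failure and move on to the next cluster member.
            last_error = exc
    raise BrokerDown(
        "no broker reachable after %d attempts" % max_attempts
    ) from last_error
```

With hosts such as ["rabbit1", "rabbit2"], a dead rabbit1 makes the
caller end up connected to rabbit2; with only ["rabbit1"] the same call
can do nothing but raise.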

I agree the issue is broader than just impl_rabbit, but at least we
could start addressing it here.

Speaking in general, it looks like RPC should be standardized more
thoroughly, maybe as some new RFC, and it should provide rules on:
  a) how to handle AMQP connection HA failovers at the RPC layer, both for
drivers and applications, and both on the client and the server side
(speaking in terms of RPC);
  b) how to handle RPC retries in single AMQP host configurations and
in HA ones as well.
That would also allow AMQP driver developers to borrow some logic
from the app layers, if needed (and vice versa for app developers), without
causing the havoc and sorrow we have now in oslo.messaging :-)
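
As a rough illustration of what a rule for (b) could look like on the
client side, here is a minimal sketch. It is an assumption made for
discussion, not existing oslo.messaging behaviour: call_with_retries,
RpcTimeout, the send callable and the message layout are all
hypothetical.

```python
import uuid


class RpcTimeout(Exception):
    """Raised by the (hypothetical) send callable when no reply arrives."""


def call_with_retries(send, payload, retries=2):
    """Send an RPC request and re-send it on timeout.

    The same msg_id is reused for every attempt, so a server that did
    receive the first request before the failover can recognise the
    re-sent one as a duplicate instead of executing it twice.
    """
    msg_id = uuid.uuid4().hex
    for attempt in range(retries + 1):
        try:
            return send({"msg_id": msg_id, "payload": payload})
        except RpcTimeout:
            if attempt == retries:
                # Out of attempts: let the caller see the timeout.
                raise
```

Whether the retry belongs in the driver or in the application is
exactly the kind of question such a specification would have to answer;
the sketch only shows that the two sides need to agree on a message id.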

[0] https://review.openstack.org/110058
[1] https://bugs.launchpad.net/oslo.messaging/+bug/1349301

-- 
Best regards,
Bogdan Dobrelya,
Skype #bogdando_at_yahoo.com
Irc #bogdando


