[openstack-dev] [release] release critical oslo.messaging changes

Sean Dague sean at dague.net
Fri Apr 17 13:54:21 UTC 2015


It turns out a number of people are hitting
https://bugs.launchpad.net/oslo.messaging/+bug/1436769 (I tripped over
it this morning as well).

Under a currently unknown set of conditions you can get into a heartbeat
loop with oslo.messaging 1.8.1, which basically shuts down the RPC bus
because every service is heartbeat looping 100% of the time.

I had py-amqp < 1.4.0, and 1.4.0 seems to have a bug fix for one of the
issues here.

However, after chatting with silent in IRC this morning it sounded like
the safer option might be to disable the rabbit heartbeat by default,
because this sort of heartbeat storm can kill the entire OpenStack
environment, and it is not really clear how you recover from it.

All of which is recorded in the bug.

Proposed actions are to do both of:

- oslo.messaging release with heartbeats off by default (simulates 1.8.0
behavior before the heartbeat code landed)
- oslo.messaging requiring py-amqp >= 1.4.0, so that if you enable the
heartbeating, at least you are protected from the known bug
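
As a rough illustration of the second item, the >= 1.4.0 floor could be
checked along these lines. This is only a sketch: the helper names
(parse_version, heartbeat_safe) are made up for this example and are not
part of oslo.messaging or py-amqp, and the version string would have to
come from wherever your deployment records the installed py-amqp
version.

```python
# Hypothetical helpers; neither function exists in oslo.messaging or py-amqp.

def parse_version(text, width=3):
    """Turn a version string such as '1.4.0' into a tuple of ints.

    Parsing stops at the first component that does not start with a
    digit, and the result is zero-padded to `width` components so that
    '1.4' compares equal to '1.4.0'.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def heartbeat_safe(amqp_version, minimum="1.4.0"):
    """Return True if the given py-amqp version meets the proposed floor."""
    return parse_version(amqp_version) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("1.3.3", "1.4.0", "1.4.9"):
        print(candidate, heartbeat_safe(candidate))
```

This only covers the one known bug tied to the py-amqp version; it says
nothing about whether other heartbeat problems remain.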

This would still let operators use the feature, but we'd consider it
experimental until we're sure there aren't any other dragons hidden in
there. I think the goal would be to make it default on again for Marmoset.

	-Sean

-- 
Sean Dague
http://dague.net


