[openstack-dev] [Fuel][FFE] Disabling HA for RPC queues in RabbitMQ

Bogdan Dobrelya bdobrelia at mirantis.com
Wed Dec 2 10:11:31 UTC 2015


On 01.12.2015 23:34, Peter Lemenkov wrote:
> Hello All!
> 
> Well, side-effects (or any other effects) are quite obvious and
> predictable - this will decrease availability of RPC queues a bit.
> That's for sure.

And consistency. Without messages and queues being synced between all of
the rabbit_hosts, how exactly would dispatching RPC calls work when
workers are connected to different AMQP URLs?

Perhaps that change would only raise partition tolerance to a very high
degree? But this should be clearly shown by load tests that compare
behavior under network partitions with mirroring against network
partitions without mirroring. Rally could help a lot here.

> 
> However, Dmitry's guess is that the overall increase in messaging
> backplane stability (RabbitMQ won't fail as often in some cases) would
> compensate for this change. This issue is very much real - speaking of

Agreed, that should be proven by (Rally) tests for the specific case I
described in the spec [0]. Please correct me, as I may understand things
wrong, but here it is:
- client 1 submits RPC call request R to server 1, which is connected to
AMQP host X
- worker A listens on the jobs topic at AMQP host X
- worker B listens on the jobs topic at AMQP host Y
- the job for R is dispatched to worker B
Q: would B never receive its job message, because it simply cannot see
messages at X?
Q: would a timeout failure be the result?

And things may get even weirder in more complex scenarios.
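
To make the question concrete, here is a minimal toy sketch of the scenario
above. It is not RabbitMQ or oslo.messaging code: the Broker class and the
rpc_call helper are hypothetical names, and the model simply assumes the
worst case I am asking about, namely that a message published at one AMQP
host is visible only to consumers attached to that same host. Whether a
real cluster without mirroring behaves this way is exactly what the tests
should show.

```python
from collections import defaultdict, deque


class Broker:
    """Toy AMQP host: holds its own per-topic queues, nothing is mirrored."""

    def __init__(self, name):
        self.name = name
        self.queues = defaultdict(deque)

    def publish(self, topic, message):
        # The message is stored on this host only.
        self.queues[topic].append(message)

    def consume(self, topic):
        # A consumer sees only what was published to this host.
        queue = self.queues[topic]
        return queue.popleft() if queue else None


def rpc_call(publish_to, worker_listens_on, topic="jobs", request="R"):
    """Return the worker's reply, or "timeout" if it never sees the request."""
    publish_to.publish(topic, request)
    received = worker_listens_on.consume(topic)
    return f"done:{received}" if received is not None else "timeout"


x, y = Broker("X"), Broker("Y")
result_a = rpc_call(publish_to=x, worker_listens_on=x)  # worker A, same host
result_b = rpc_call(publish_to=x, worker_listens_on=y)  # worker B, other host
print(result_a, result_b)
```

Under that assumption the call served by worker A completes, while the one
dispatched to worker B times out and its request stays queued at X.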


[0] https://review.openstack.org/247517

> me, I've seen awful cluster performance degradation when a failing
> RabbitMQ node was killed by some watchdog application (or, even worse,
> wasn't killed at all). One of these issues occurred quite recently, and
> I'd love to see them less frequently.
> 
> That said, I'm uncertain about the stability impact of this change, yet
> I see reasoning behind it that is worth discussing.

I would support this for 8.0 only if it is proven by load tests for the
scenario I described, plus the standard destructive tests.

> 
> 2015-12-01 20:53 GMT+01:00 Sergii Golovatiuk <sgolovatiuk at mirantis.com>:
>> Hi,
>>
>> -1 for FFE for disabling HA for RPC queues, as we do not know all the
>> side effects in HA scenarios.
>>
>> On Tue, Dec 1, 2015 at 7:34 PM, Dmitry Mescheryakov
>> <dmescheryakov at mirantis.com> wrote:
>>>
>>> Folks,
>>>
>>> I would like to request a feature freeze exception for disabling HA for
>>> RPC queues in RabbitMQ [1].
>>>
>>> As I already wrote in another thread [2], I've conducted tests which
>>> clearly show the benefit we will get from that change. The change itself
>>> is a very small patch [3]. The only thing I want to do before proposing
>>> to merge this change is to conduct destructive tests against it in order
>>> to make sure that we do not have a regression here. That should take just
>>> several days, so if there are no other objections, we will be able to
>>> merge the change within a week or two.
>>>
>>> Thanks,
>>>
>>> Dmitry
>>>
>>> [1] https://review.openstack.org/247517
>>> [2]
>>> http://lists.openstack.org/pipermail/openstack-dev/2015-December/081006.html
>>> [3] https://review.openstack.org/249180
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>>
>>
> 
> 
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando


