[openstack-dev] [Fuel][FFE] Disabling HA for RPC queues in RabbitMQ

Sheena Gregson sgregson at mirantis.com
Wed Dec 2 15:40:54 UTC 2015


This seems like a totally reasonable solution, and it would enable us to
more thoroughly test the performance implications of this change between
the 8.0 and 9.0 releases.

+1

-----Original Message-----
From: Davanum Srinivas [mailto:davanum at gmail.com]
Sent: Wednesday, December 02, 2015 9:32 AM
To: OpenStack Development Mailing List (not for usage questions)
<openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [Fuel][FFE] Disabling HA for RPC queues in
RabbitMQ

Vova, Folks,

+1 to "set this option to false as an experimental feature"

Thanks,
Dims

On Wed, Dec 2, 2015 at 10:08 AM, Vladimir Kuklin <vkuklin at mirantis.com>
wrote:
> Dmitry
>
> Although I am a big fan of disabling replication for RPC, I think it
> is too late in the cycle to make it the default. I would suggest that
> we control this part of the OCF script with a specific parameter, e.g.
> 'enable RPC replication', and set it to 'true' by default. Then we can
> set this option to false as an experimental feature, run some tests,
> and decide whether it should be enabled by default or not. In this
> case, users who are interested in this will be able to enable it when
> they need it, while we still stick to our old and tested approach.
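
[Illustrative sketch only, not the actual OCF script or the patch under
review. It shows how a single boolean switch could select which queues get
a mirroring policy. The function name, the parameter name and the
notification queue pattern are assumptions made up for this example; only
the rabbitmqctl set_policy argument order and the ha-mode / ha-sync-mode
policy keys are standard RabbitMQ.]

```python
import json

# Pattern that mirrors every queue except the built-in amq.* ones.
ALL_QUEUES_PATTERN = r"^(?!amq\.).*"
# Hypothetical pattern: assumes notification queues share a common prefix.
NOTIFICATION_PATTERN = r"^notifications\..*"


def ha_policy_command(enable_rpc_replication=True):
    """Build the rabbitmqctl argv for the mirroring policy.

    With the (hypothetical) switch on, every queue is mirrored, which is
    the old and tested behaviour. With it off, only queues matching the
    notification pattern are mirrored, so RPC queues stay unreplicated.
    """
    if enable_rpc_replication:
        pattern = ALL_QUEUES_PATTERN
    else:
        pattern = NOTIFICATION_PATTERN
    definition = json.dumps({"ha-mode": "all", "ha-sync-mode": "automatic"})
    return ["rabbitmqctl", "set_policy", "ha-all", pattern, definition]


if __name__ == "__main__":
    print(" ".join(ha_policy_command(enable_rpc_replication=True)))
    print(" ".join(ha_policy_command(enable_rpc_replication=False)))
```

Keeping the default at True means an upgraded environment behaves exactly
as before unless an operator opts in to the experimental setting.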
>
> On Wed, Dec 2, 2015 at 5:52 PM, Konstantin Kalin <kkalin at mirantis.com>
> wrote:
>>
>> On top of what Dmitry said, I would add that HA queues also increase
>> the probability of message duplication under certain scenarios
>> (besides being ~10x slower). Would OpenStack services tolerate a
>> duplicated RPC request? From what I've learned so far: no. Also, with
>> cluster_partition_handling=autoheal (what we currently have), messages
>> may be lost during failover scenarios, just as with non-HA queues.
>> Honestly, I believe there is no difference between HA queues and
>> non-HA queues in RPC-layer fault tolerance in the way we use RabbitMQ.
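
[Illustrative sketch only: one common way for a consumer to tolerate a
duplicated request is to remember recently seen message ids and drop
repeats. The class and its names are hypothetical and do not describe how
any OpenStack service or library actually behaves.]

```python
from collections import OrderedDict


class DedupingHandler:
    """Wrap a handler so each message id is processed at most once.

    Only the most recent max_ids ids are remembered, so a duplicate that
    arrives after its id has been evicted would be processed again.
    """

    def __init__(self, handler, max_ids=1000):
        self._handler = handler
        self._max_ids = max_ids
        self._seen = OrderedDict()  # message id -> True, oldest first

    def handle(self, message_id, payload):
        """Return True if the payload was processed, False if dropped."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = True
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # evict the oldest id
        self._handler(payload)
        return True


if __name__ == "__main__":
    processed = []
    consumer = DedupingHandler(processed.append)
    consumer.handle("msg-1", "spawn vm")
    consumer.handle("msg-1", "spawn vm")  # duplicate delivery, dropped
    print(processed)
```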
>>
>> Thank you,
>> Konstantin.
>>
>> On Dec 2, 2015, at 4:05 AM, Dmitry Mescheryakov
>> <dmescheryakov at mirantis.com> wrote:
>>
>>
>>
>> 2015-12-02 12:48 GMT+03:00 Sergii Golovatiuk
>> <sgolovatiuk at mirantis.com>:
>>>
>>> Hi,
>>>
>>>
>>> On Tue, Dec 1, 2015 at 11:34 PM, Peter Lemenkov <lemenkov at gmail.com>
>>> wrote:
>>>>
>>>> Hello All!
>>>>
>>>> Well, side-effects (or any other effects) are quite obvious and
>>>> predictable - this will decrease availability of RPC queues a bit.
>>>> That's for sure.
>>>
>>>
>>> Imagine the case when a user creates a VM instance and some nova
>>> messages are lost. I am not sure we want half-created instances. Who
>>> is going to clean them up? Since we do not have results of
>>> destructive tests, I vote -2 for FFE for this feature.
>>
>>
>> Sergii, actually the messaging layer cannot guarantee that this will
>> not happen even if all messages are preserved. Assume the following
>> scenario:
>>
>>  * nova-scheduler (or conductor?) sends a request to nova-compute to
>> spawn a VM
>>  * nova-compute receives the message and spawns the VM
>>  * for some reason (RabbitMQ unavailable, nova-compute lagging)
>> nova-compute does not respond within the timeout (1 minute, I think)
>>  * nova-scheduler does not get a response within 1 minute and marks the
>> VM with Error status.
>>
>> In that scenario no message was lost, but we still have a half-spawned
>> VM, and it is up to Nova to handle the error and do the cleanup in
>> that case.
>>
>> Such issues already happen here and there when something glitches.
>> For instance, our favorite MessagingTimeout exception could be caused
>> by such a scenario. Specifically, in that example, when nova-scheduler
>> times out waiting for a reply, it will throw exactly that exception.
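
[Illustrative sketch only: a toy simulation of the scenario above using
plain threads, with no RabbitMQ or Nova code involved. All names, delays
and status strings are made up for the example. It shows that the caller
can record an error after its timeout expires even though the worker
finishes the job and nothing was lost in transit.]

```python
import concurrent.futures
import time


def spawn_vm(state, delay):
    """Stand-in for the worker: it is slow, but it does finish."""
    time.sleep(delay)
    state["vm"] = "spawned"
    return "ok"


def call_with_timeout(state, delay, timeout):
    """Stand-in for the caller: wait for a reply for at most `timeout` s."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(spawn_vm, state, delay)
    try:
        future.result(timeout=timeout)
        state["status"] = "ACTIVE"
    except concurrent.futures.TimeoutError:
        # The caller gives up, but the worker keeps running.
        state["status"] = "ERROR"
    pool.shutdown(wait=True)  # let the worker finish before inspecting state
    return state


if __name__ == "__main__":
    # The worker needs 0.2 s, the caller waits only 0.05 s.
    print(call_with_timeout({}, delay=0.2, timeout=0.05))
```

In the slow case the final state holds both an error status and a spawned
VM, which is the inconsistency the service itself has to clean up.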
>>
>> My point is simple: let's increase our architecture's scalability by
>> 2-3 times at the cost of _maybe_ causing more errors for users during
>> failover. The failover time itself should not get worse (to be tested
>> by me), and errors should be correctly handled by services anyway.
>>
>>>>
>>>> However, Dmitry's guess is that the overall increase in messaging
>>>> backplane stability (RabbitMQ won't fail as often in some cases)
>>>> would compensate for this change. This issue is very much real:
>>>> speaking for myself, I've seen awful cluster performance degradation
>>>> when a failing RabbitMQ node was killed by some watchdog
>>>> application (or, even worse, wasn't killed at all). One of these
>>>> issues occurred quite recently, and I'd love to see them less often.
>>>>
>>>> That said, I'm uncertain about the stability impact of this change,
>>>> yet I see reasoning worth discussing behind it.
>>>>
>>>> 2015-12-01 20:53 GMT+01:00 Sergii Golovatiuk
>>>> <sgolovatiuk at mirantis.com>:
>>>> > Hi,
>>>> >
>>>> > -1 for FFE for disabling HA for RPC queue as we do not know all
>>>> > side effects in HA scenarios.
>>>> >
>>>> > On Tue, Dec 1, 2015 at 7:34 PM, Dmitry Mescheryakov
>>>> > <dmescheryakov at mirantis.com> wrote:
>>>> >>
>>>> >> Folks,
>>>> >>
>>>> >> I would like to request a feature freeze exception for disabling
>>>> >> HA for RPC queues in RabbitMQ [1].
>>>> >>
>>>> >> As I already wrote in another thread [2], I've conducted tests
>>>> >> which clearly show the benefit we will get from that change. The
>>>> >> change itself is a very small patch [3]. The only thing I want
>>>> >> to do before proposing to merge this change is to conduct
>>>> >> destructive tests against it in order to make sure that we do
>>>> >> not have a regression here. That should take just several days,
>>>> >> so if there are no other objections, we will be able to merge
>>>> >> the change within a week or two.
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >> Dmitry
>>>> >>
>>>> >> [1] https://review.openstack.org/247517
>>>> >> [2] http://lists.openstack.org/pipermail/openstack-dev/2015-December/081006.html
>>>> >> [3] https://review.openstack.org/249180
>>>> >>
>>>> >>
>>>> >> __________________________________________________________________________
>>>> >> OpenStack Development Mailing List (not for usage questions)
>>>> >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> With best regards, Peter Lemenkov.
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 35bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com
> www.mirantis.ru
> vkuklin at mirantis.com
>
>



--
Davanum Srinivas :: https://twitter.com/dims



