[openstack-dev] [Fuel][FFE] Disabling HA for RPC queues in RabbitMQ

Davanum Srinivas davanum at gmail.com
Wed Dec 2 15:31:45 UTC 2015


Vova, Folks,

+1 to "set this option to false as an experimental feature"
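
A minimal sketch of what such a switch might look like, for illustration only.
The names ha_policy_pattern and enable_rpc_ha are hypothetical and are not taken
from the Fuel OCF script. The sketch also assumes that, with the option set to
false, only notifications.* queues stay mirrored, and that RPC queues (topic,
reply_* and *_fanout_* queues) simply fall outside the policy pattern:

```python
import re


def ha_policy_pattern(enable_rpc_ha: bool = True) -> str:
    """Return a regex selecting the queues that should be mirrored.

    Hypothetical helper: the real OCF script may build its policy differently.
    """
    if enable_rpc_ha:
        # Mirror everything except RabbitMQ's built-in amq.* queues.
        return r"^(?!amq\.).*"
    # Assumption: only notification queues stay mirrored, so RPC queues
    # are no longer covered by the HA policy.
    return r"^notifications\..*"


def is_mirrored(queue: str, enable_rpc_ha: bool = True) -> bool:
    """Check whether a queue name is matched by the mirroring pattern."""
    return re.match(ha_policy_pattern(enable_rpc_ha), queue) is not None
```

With the default (true) every non-amq queue is mirrored, which is the current
behaviour; with false, a queue such as reply_abc is left unmirrored while
notifications.info is still covered.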

Thanks,
Dims

On Wed, Dec 2, 2015 at 10:08 AM, Vladimir Kuklin <vkuklin at mirantis.com> wrote:
> Dmitry
>
> Although I am a big fan of disabling replication for RPC, I think it is too
> late in the cycle to introduce it by default. I would suggest that we control
> this part of the OCF script with a specific parameter, e.g. 'enable RPC
> replication', and set it to 'true' by default. Then we can set this option to
> false as an experimental feature, run some tests and decide whether it
> should be enabled by default or not. In this case, users who are interested
> in this will be able to enable it when they need it, while we still stick
> to our old and tested approach.
>
> On Wed, Dec 2, 2015 at 5:52 PM, Konstantin Kalin <kkalin at mirantis.com>
> wrote:
>>
>> I would add, on top of what Dmitry said, that HA queues also increase the
>> probability of message duplication under certain scenarios (besides being
>> ~10x slower). Would OpenStack services tolerate a duplicated RPC request?
>> From what I've learned so far: no. Also, with
>> cluster_partition_handling=autoheal (what we currently have), messages
>> may be lost during failover scenarios, just as with non-HA queues.
>> Honestly, I believe there is no difference in RPC-layer fault tolerance
>> between HA queues and non-HA queues, given the way we use RabbitMQ.
>>
>> Thank you,
>> Konstantin.
>>
>> On Dec 2, 2015, at 4:05 AM, Dmitry Mescheryakov
>> <dmescheryakov at mirantis.com> wrote:
>>
>>
>>
>> 2015-12-02 12:48 GMT+03:00 Sergii Golovatiuk <sgolovatiuk at mirantis.com>:
>>>
>>> Hi,
>>>
>>>
>>> On Tue, Dec 1, 2015 at 11:34 PM, Peter Lemenkov <lemenkov at gmail.com>
>>> wrote:
>>>>
>>>> Hello All!
>>>>
>>>> Well, side-effects (or any other effects) are quite obvious and
>>>> predictable - this will decrease availability of RPC queues a bit.
>>>> That's for sure.
>>>
>>>
>>> Imagine the case when a user creates a VM instance and some nova messages
>>> are lost. I am not sure we want half-created instances. Who is going to
>>> clean them up? Since we do not have results of destructive tests, I vote -2
>>> for FFE for this feature.
>>
>>
>> Sergii, actually the messaging layer cannot guarantee that this will not
>> happen, even if all messages are preserved. Assume the following
>> scenario:
>>
>>  * nova-scheduler (or conductor?) sends a request to nova-compute to spawn a
>> VM
>>  * nova-compute receives the message and spawns the VM
>>  * for some reason (rabbitmq unavailable, nova-compute lagged)
>> nova-compute does not respond within the timeout (1 minute, I think)
>>  * nova-scheduler does not get a response within 1 minute and marks the VM
>> with Error status.
>>
>> In that scenario no message was lost, but we still have a half-spawned VM,
>> and it is up to Nova to handle the error and do the cleanup in that case.
>>
>> Such issues already happen here and there when something glitches. For
>> instance, our favorite MessagingTimeout exception could be caused by such a
>> scenario. Specifically, in that example, when nova-scheduler times out
>> waiting for the reply, it will throw exactly that exception.
>>
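The scenario above can be reproduced with a toy model. This is only a sketch
under stated assumptions: call(), spawn_vm() and schedule() are hypothetical
stand-ins, the MessagingTimeout class here merely mimics the oslo.messaging
exception of the same name, and a thread plus an in-process queue replaces
RabbitMQ. No message is dropped, yet the caller ends up with an error while
the side effect has already happened:

```python
import queue
import threading
import time


class MessagingTimeout(Exception):
    """Stand-in for the oslo.messaging exception of the same name."""


def call(handler, timeout):
    """Toy RPC call: run handler in a worker and wait for its reply."""
    replies = queue.Queue()
    worker = threading.Thread(target=lambda: replies.put(handler()), daemon=True)
    worker.start()
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise MessagingTimeout("no reply within %.2fs" % timeout)


spawned = []  # VMs that actually exist on the "compute" side


def spawn_vm():
    """Plays nova-compute: does the work, then replies too late."""
    spawned.append("vm-1")  # the VM is created immediately
    time.sleep(0.5)         # the reply is delayed (lag, broker unavailable)
    return "ok"


def schedule():
    """Plays the caller: marks the VM as ERROR when the reply times out."""
    try:
        call(spawn_vm, timeout=0.2)
        return "ACTIVE"
    except MessagingTimeout:
        return "ERROR"


status = schedule()
```

Here status ends up as "ERROR" although spawned already contains the VM, so
the cleanup has to be done by the service itself, regardless of whether the
queue was mirrored.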
>> My point is simple: let's increase our architecture's scalability by 2-3
>> times at the cost of _maybe_ causing more errors for users during failover.
>> The failover time itself should not get worse (to be tested by me) and
>> errors should be correctly handled by services anyway.
>>
>>>>
>>>> However, Dmitry's guess is that the overall increase in messaging backplane
>>>> stability (RabbitMQ won't fail as often in some cases) would
>>>> compensate for this change. This issue is very much real: speaking for
>>>> myself, I've seen awful cluster performance degradation when a failing
>>>> RabbitMQ node was killed by some watchdog application (or, even worse,
>>>> wasn't killed at all). One of these issues happened quite recently, and I'd
>>>> love to see them less frequently.
>>>>
>>>> That said, I'm uncertain about the stability impact of this change, yet
>>>> I see reasoning behind it that is worth discussing.
>>>>
>>>> 2015-12-01 20:53 GMT+01:00 Sergii Golovatiuk <sgolovatiuk at mirantis.com>:
>>>> > Hi,
>>>> >
>>>> > -1 for FFE for disabling HA for RPC queue as we do not know all side
>>>> > effects in HA scenarios.
>>>> >
>>>> > On Tue, Dec 1, 2015 at 7:34 PM, Dmitry Mescheryakov
>>>> > <dmescheryakov at mirantis.com> wrote:
>>>> >>
>>>> >> Folks,
>>>> >>
>>>> >> I would like to request a feature freeze exception for disabling HA for
>>>> >> RPC queues in RabbitMQ [1].
>>>> >>
>>>> >> As I already wrote in another thread [2], I've conducted tests which
>>>> >> clearly show the benefit we will get from that change. The change itself
>>>> >> is a very small patch [3]. The only thing I want to do before proposing
>>>> >> to merge this change is to conduct destructive tests against it in order
>>>> >> to make sure that we do not have a regression here. That should take
>>>> >> just several days, so if there are no other objections, we will be
>>>> >> able to merge the change within a week or two.
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >> Dmitry
>>>> >>
>>>> >> [1] https://review.openstack.org/247517
>>>> >> [2] http://lists.openstack.org/pipermail/openstack-dev/2015-December/081006.html
>>>> >> [3] https://review.openstack.org/249180
>>>> >>
>>>> >>
>>>> >> __________________________________________________________________________
>>>> >> OpenStack Development Mailing List (not for usage questions)
>>>> >> Unsubscribe:
>>>> >> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>> >>
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> With best regards, Peter Lemenkov.
>>>
>>
>
>
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 35bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com
> www.mirantis.ru
> vkuklin at mirantis.com
>



-- 
Davanum Srinivas :: https://twitter.com/dims


