[openstack-dev] [Fuel][FFE] Disabling HA for RPC queues in RabbitMQ

Vladimir Kuklin vkuklin at mirantis.com
Wed Dec 2 15:08:52 UTC 2015


Dmitry

Although I am a big fan of disabling replication for RPC, I think it is
too late to introduce it as the default now. I would suggest that we
control this part of the OCF script with a specific parameter, e.g.
'enable RPC replication', and set it to 'true' by default. Then we can
set this option to 'false' as an experimental feature, run some tests and
decide whether it should be enabled by default or not. In this case,
users who are interested in this will be able to enable it when they need
it, while we still stick to our old and tested approach.
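
To illustrate the idea: the toggle essentially decides whether the HA
policy pattern keeps matching RPC queues or starts excluding them. The
OCF script itself is shell, so the Python sketch below is only an
illustration; the parameter name, policy name, credentials and the
queue-name pattern are assumptions, not what Fuel actually uses.

    # Sketch only: an 'enable_rpc_ha' toggle translated into a RabbitMQ HA
    # policy via the management HTTP API. All names/patterns are illustrative.
    import json
    import requests

    RABBIT_API = "http://localhost:15672/api/policies/%2F/ha-policy"
    AUTH = ("guest", "guest")  # placeholder credentials

    def apply_ha_policy(enable_rpc_ha=True):
        if enable_rpc_ha:
            # Old, tested behaviour: mirror every queue, including RPC ones.
            pattern = "^(?!amq\\.).*"
        else:
            # Experimental behaviour: mirror only non-RPC queues (here
            # assumed to be the 'notifications.*' ones; the real pattern
            # would have to match Fuel's queue naming).
            pattern = "^notifications\\."
        policy = {
            "pattern": pattern,
            "definition": {"ha-mode": "all"},
            "apply-to": "queues",
        }
        resp = requests.put(RABBIT_API, data=json.dumps(policy), auth=AUTH,
                            headers={"content-type": "application/json"})
        resp.raise_for_status()

    if __name__ == "__main__":
        apply_ha_policy(enable_rpc_ha=True)  # default keeps RPC replication on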

On Wed, Dec 2, 2015 at 5:52 PM, Konstantin Kalin <kkalin at mirantis.com>
wrote:

> I would add, on top of what Dmitry said, that HA queues also increase the
> probability of message duplication under certain scenarios (besides being
> ~10x slower). Would OpenStack services tolerate a duplicated RPC request?
> From what I've learned so far - no. Also, with
> cluster_partition_handling=autoheal (which is what we currently have),
> messages may be lost during failover scenarios just as with non-HA
> queues. Honestly, I believe there is no difference in RPC-layer fault
> tolerance between HA and non-HA queues in the way we use RabbitMQ.
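>
> To make the duplication point concrete, here is a purely illustrative
> sketch of the guard a consumer would need in order to tolerate a
> redelivered request; none of this is existing OpenStack or oslo.messaging
> code, and the msg_id bookkeeping is an assumption:
>
>     # Illustrative only: consumer-side guard against duplicated RPC requests.
>     seen_requests = set()  # would need expiry/persistence in real life
>
>     def handle_rpc(msg_id, handler, **kwargs):
>         if msg_id in seen_requests:
>             # A mirrored-queue failover redelivered the request; running
>             # the handler again would e.g. spawn a second VM.
>             return None
>         seen_requests.add(msg_id)
>         return handler(**kwargs)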
>
> Thank you,
> Konstantin.
>
> On Dec 2, 2015, at 4:05 AM, Dmitry Mescheryakov <dmescheryakov at mirantis.com> wrote:
>
>
>
> 2015-12-02 12:48 GMT+03:00 Sergii Golovatiuk <sgolovatiuk at mirantis.com>:
>
>> Hi,
>>
>>
>> On Tue, Dec 1, 2015 at 11:34 PM, Peter Lemenkov <lemenkov at gmail.com>
>> wrote:
>>
>>> Hello All!
>>>
>>> Well, the side effects (or any other effects) are quite obvious and
>>> predictable - this will decrease the availability of RPC queues a bit.
>>> That's for sure.
>>>
>>
>> Imagine the case when a user creates a VM instance and some nova
>> messages are lost. I am not sure we want half-created instances. Who is
>> going to clean them up? Since we do not have the results of destructive
>> tests, I vote -2 on the FFE for this feature.
>>
>
> Sergii, actually the messaging layer cannot guarantee that this will not
> happen even if all messages are preserved. Assume the following scenario:
>
>  * nova-scheduler (or conductor?) sends a request to nova-compute to
> spawn a VM
>  * nova-compute receives the message and spawns the VM
>  * for some reason (RabbitMQ unavailable, nova-compute lagging)
> nova-compute does not respond within the timeout (1 minute, I think)
>  * nova-scheduler does not get a response within 1 minute and marks the
> VM with Error status.
>
> In that scenario no message was lost, but we still have a half-spawned
> VM, and it is up to Nova to handle the error and do the cleanup in that
> case.
>
> Such issues already happen here and there when something glitches. For
> instance, our favorite MessagingTimeout exception can be caused by
> exactly this scenario: in the example above, when nova-scheduler times
> out waiting for the reply, it throws exactly that exception.
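>
> To make the timeout part concrete, here is a minimal sketch of the caller
> side, assuming oslo.messaging; the topic, method and argument names are
> made up for the illustration, not actual Nova code:
>
>     # Sketch only: an RPC caller hitting MessagingTimeout when the reply
>     # does not arrive in time. Names are illustrative.
>     import oslo_messaging as messaging
>     from oslo_config import cfg
>
>     transport = messaging.get_transport(cfg.CONF)  # assumes a configured backend
>     target = messaging.Target(topic='compute', version='1.0')
>     client = messaging.RPCClient(transport, target, timeout=60)
>
>     try:
>         # Blocks until the reply arrives or the 60-second timeout expires.
>         client.call({}, 'spawn_vm', instance_uuid='...')
>     except messaging.MessagingTimeout:
>         # No reply in time, yet the VM may still have been spawned on the
>         # compute node - the caller has to mark it as Error / clean up.
>         pass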
>
> My point is simple - let's increase our architecture's scalability by
> 2-3 times at the cost of _maybe_ causing more errors for users during
> failover. The failover time itself should not get worse (to be tested by
> me), and errors should be handled correctly by the services anyway.
>
>
>>> However, Dmitry's guess is that the overall increase in messaging
>>> backplane stability (RabbitMQ won't fail as often in some cases) would
>>> compensate for this change. This issue is very much real - speaking for
>>> myself, I've seen awful cluster performance degradation when a failing
>>> RabbitMQ node was killed by some watchdog application (or, even worse,
>>> wasn't killed at all). One of these issues happened quite recently, and
>>> I'd love to see them less frequently.
>>>
>>> That said, I'm uncertain about the stability impact of this change, yet
>>> I see reasoning worth discussing behind it.
>>>
>>> 2015-12-01 20:53 GMT+01:00 Sergii Golovatiuk <sgolovatiuk at mirantis.com>:
>>> > Hi,
>>> >
>>> > -1 for FFE for disabling HA for RPC queues, as we do not know all
>>> > the side effects in HA scenarios.
>>> >
>>> > On Tue, Dec 1, 2015 at 7:34 PM, Dmitry Mescheryakov
>>> > <dmescheryakov at mirantis.com> wrote:
>>> >>
>>> >> Folks,
>>> >>
>>> >> I would like to request a feature freeze exception for disabling HA
>>> >> for RPC queues in RabbitMQ [1].
>>> >>
>>> >> As I already wrote in another thread [2], I've conducted tests which
>>> >> clearly show the benefit we will get from that change. The change
>>> >> itself is a very small patch [3]. The only thing I want to do before
>>> >> proposing to merge this change is to run destructive tests against
>>> >> it, in order to make sure that we do not have a regression here. That
>>> >> should take just a few days, so if there are no other objections, we
>>> >> will be able to merge the change within a week or two.
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Dmitry
>>> >>
>>> >> [1] https://review.openstack.org/247517
>>> >> [2] http://lists.openstack.org/pipermail/openstack-dev/2015-December/081006.html
>>> >> [3] https://review.openstack.org/249180
>>> >>
>>> >>
>>>
>>>
>>>
>>> --
>>> With best regards, Peter Lemenkov.


-- 
Yours Faithfully,
Vladimir Kuklin,
Fuel Library Tech Lead,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
35bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com
www.mirantis.ru
vkuklin at mirantis.com