[openstack-dev] [Fuel][FFE] Disabling HA for RPC queues in RabbitMQ

Konstantin Kalin kkalin at mirantis.com
Wed Dec 2 14:52:22 UTC 2015


I would add, on top of what Dmitry said, that HA queues also increase the probability of message duplication under certain scenarios (besides being ~10x slower). Would OpenStack services tolerate a duplicated RPC request? From what I have learned so far - no. Also, with cluster_partition_handling=autoheal (which is what we currently have), messages may be lost during failover scenarios just as with non-HA queues. Honestly, I believe there is no difference between HA and non-HA queues in RPC-layer fault tolerance, given the way we use RabbitMQ.
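
For context, "disabling HA for RPC queues" essentially means narrowing
the ha-mode policy so that it no longer matches the RPC queues. Below is
a minimal illustrative sketch in Python using the RabbitMQ management
HTTP API - it is not the actual patch [3], and the queue-name pattern,
endpoint and credentials are assumptions for illustration only:

import json

import requests

# Mirror all queues except RPC reply and fanout queues. The pattern is
# a guess for illustration; the real patch [3] defines the exact scope.
policy = {
    "pattern": r"^(?!amq\.)(?!reply_)(?!.*_fanout_).*",
    "definition": {"ha-mode": "all"},
    "apply-to": "queues",
    "priority": 0,
}

# Default management endpoint and guest credentials; adjust for a real
# cluster. "%2F" is the URL-encoded default vhost "/".
resp = requests.put(
    "http://localhost:15672/api/policies/%2F/ha-all",
    auth=("guest", "guest"),
    headers={"Content-Type": "application/json"},
    data=json.dumps(policy),
)
resp.raise_for_status()

Note that cluster_partition_handling=autoheal is a separate broker-side
setting in rabbitmq.config and is not affected by queue policies.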

Thank you,
Konstantin. 

> On Dec 2, 2015, at 4:05 AM, Dmitry Mescheryakov <dmescheryakov at mirantis.com> wrote:
> 
> 
> 
> 2015-12-02 12:48 GMT+03:00 Sergii Golovatiuk <sgolovatiuk at mirantis.com>:
> Hi,
> 
> 
> On Tue, Dec 1, 2015 at 11:34 PM, Peter Lemenkov <lemenkov at gmail.com> wrote:
> Hello All!
> 
> Well, the side effects (or any other effects) are quite obvious and
> predictable - this will decrease the availability of RPC queues a bit.
> That's for sure.
> 
> Imagine the case when a user creates a VM instance and some nova messages are lost. I am not sure we want half-created instances. Who is going to clean them up? Since we do not have results of destructive tests, I vote -2 for FFE for this feature.
> 
> Sergii, actually the messaging layer cannot provide any guarantee that this will not happen, even if all messages are preserved. Assume the following scenario:
> 
>  * nova-scheduler (or conductor?) sends a request to nova-compute to spawn a VM
>  * nova-compute receives the message and spawns the VM
>  * for some reason (RabbitMQ unavailable, nova-compute lagging) nova-compute does not respond within the timeout (1 minute, I think)
>  * nova-scheduler does not get a response within 1 minute and marks the VM with Error status.
> 
> In that scenario no message was lost, but we still have a half-spawned VM, and it is up to Nova to handle the error and do the cleanup in that case.
> 
> Such issues already happen here and there when something glitches. For instance, our favorite MessagingTimeout exception can be caused by exactly this scenario: in the example above, when nova-scheduler times out waiting for the reply, it throws exactly that exception.
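> 
> To make that failure mode concrete, here is a minimal caller-side
> sketch, assuming oslo.messaging with its default settings (the topic,
> server and method names are made up for illustration):
> 
> import oslo_messaging as messaging
> from oslo_config import cfg
> 
> transport = messaging.get_transport(cfg.CONF)
> target = messaging.Target(topic='compute', server='compute-1')
> # 60 seconds is the default RPC response timeout in oslo.messaging.
> client = messaging.RPCClient(transport, target, timeout=60)
> 
> try:
>     # The request reaches nova-compute and the VM gets spawned...
>     client.call({}, 'spawn_instance', instance_id='abc')
> except messaging.MessagingTimeout:
>     # ...but if the reply does not arrive in time, the caller gets a
>     # timeout and marks the instance as Error, even though no message
>     # was lost.
>     pass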
> 
> My point is simple - let's increase our architecture's scalability 2-3 times at the cost of _maybe_ causing more errors for users during failover. The failover time itself should not get worse (to be tested by me), and errors should be correctly handled by the services anyway.
> 
> 
> However, Dmitry's guess is that the overall increase in messaging
> backplane stability (RabbitMQ won't fail as often in some cases) would
> compensate for this change. The issue is very much real - speaking for
> myself, I've seen awful cluster performance degradation when a failing
> RabbitMQ node was killed by some watchdog application (or, even worse,
> wasn't killed at all). One of these issues happened quite recently, and
> I'd love to see them less frequently.
> 
> That said, I'm uncertain about the stability impact of this change, yet
> I see reasoning worth discussing behind it.
> 
> 2015-12-01 20:53 GMT+01:00 Sergii Golovatiuk <sgolovatiuk at mirantis.com>:
> > Hi,
> >
> > -1 for FFE for disabling HA for RPC queues, as we do not know all the
> > side effects in HA scenarios.
> >
> > On Tue, Dec 1, 2015 at 7:34 PM, Dmitry Mescheryakov
> > <dmescheryakov at mirantis.com> wrote:
> >>
> >> Folks,
> >>
> >> I would like to request a feature freeze exception for disabling HA
> >> for RPC queues in RabbitMQ [1].
> >>
> >> As I already wrote in another thread [2], I've conducted tests which
> >> clearly show the benefit we will get from that change. The change
> >> itself is a very small patch [3]. The only thing I want to do before
> >> proposing to merge this change is to run destructive tests against it,
> >> in order to make sure that we do not introduce a regression. That
> >> should take just several days, so if there are no other objections, we
> >> will be able to merge the change within a week or two.
> >>
> >> Thanks,
> >>
> >> Dmitry
> >>
> >> [1] https://review.openstack.org/247517
> >> [2] http://lists.openstack.org/pipermail/openstack-dev/2015-December/081006.html
> >> [3] https://review.openstack.org/249180
> 
> --
> With best regards, Peter Lemenkov.
