[Openstack-operators] Way to check compute <-> rabbitmq connectivity

sridhar basam sridhar.basam at gmail.com
Thu Jan 15 23:55:07 UTC 2015


If you are using HA queues, use a version of RabbitMQ > 3.3.0. There was a
change in that version so that consumption on a queue is automatically
re-enabled when a master election for the queue happens. Previous versions
only informed clients that they had to re-consume on the queue; it was the
client's responsibility to start consuming again.

Make sure you enable TCP keepalives with a low enough interval in case you
have a firewall device in between your rabbit server and its consumers.
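
For a client where you control the socket yourself, the same idea can be
applied per connection instead of through host-wide settings. A minimal
Python sketch, assuming a Linux host; the helper name and the 60/10/5 values
are only illustrative and are not taken from any OpenStack code:

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive for one socket (hypothetical helper).

    idle: seconds of silence before the first probe
    interval: seconds between probes
    count: unanswered probes before the kernel drops the connection
    The defaults are illustrative; pick an idle time below the firewall's
    idle-connection timeout.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained knobs are platform specific (these names exist on
    # Linux), so only set the ones this platform exposes.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
    return sock
```

With these example values a dead peer should be noticed after roughly
60 + 10 * 5 = 110 seconds of silence.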

Monitor consumers on your rabbit infrastructure using 'rabbitmqctl
list_queues name messages consumers'. The number of consumers on fanout
queues depends on how many services of each type you have in your
environment.
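
If you want to script that check, something along these lines could flag
queues that have lost their consumers. This is only a sketch: it assumes the
command prints one tab-separated line per queue, and the function names are
made up for illustration. A queue with zero consumers is not always a
problem, so treat the result as a hint rather than an alarm.

```python
import subprocess


def queues_without_consumers(listing):
    """Return [(queue_name, messages)] for queues that have 0 consumers.

    `listing` is the text printed by
    'rabbitmqctl list_queues name messages consumers', assumed to be one
    tab-separated line per queue. Banner lines are skipped.
    """
    stale = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # banner or footer line, not a queue row
        name, messages, consumers = fields
        if not (messages.isdigit() and consumers.isdigit()):
            continue  # e.g. a column-header row
        if int(consumers) == 0:
            stale.append((name, int(messages)))
    return stale


def check_broker():
    """Run rabbitmqctl on the local broker host and report stale queues."""
    out = subprocess.check_output(
        ["rabbitmqctl", "list_queues", "name", "messages", "consumers"],
        universal_newlines=True,
    )
    return queues_without_consumers(out)
```

check_broker() has to run on a host where rabbitmqctl can reach the broker;
the parsing function can be fed saved output from anywhere.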

Sri
 On Jan 15, 2015 6:27 PM, "Michael Dorman" <mdorman at godaddy.com> wrote:

>   Here is the bug I’ve been tracking related to this for a while.  I
> haven’t really kept up to speed with it, so I don’t know the current status.
>
>  https://bugs.launchpad.net/nova/+bug/856764
>
>
>   From: Kris Lindgren <klindgren at godaddy.com>
> Date: Thursday, January 15, 2015 at 12:10 PM
> To: Gustavo Randich <gustavo.randich at gmail.com>, OpenStack Operators <
> openstack-operators at lists.openstack.org>
> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
> connectivity
>
>   During the Atlanta ops meeting this topic came up and I specifically
> mentioned adding a "no-op" or healthcheck ping to the rabbitmq stuff
> in both nova & neutron.  The devs in the room looked at me like I was
> crazy, but it was so that we could catch exactly the issues you described.
> I am also interested if anyone knows of a lightweight call that could be
> used to verify/confirm rabbitmq connectivity.  I haven't been able
> to devote time to dig into it, mainly because if one client is having
> issues you will notice other clients having similar/silent errors, and
> a restart of all the things is the easiest way to fix it, for us at least.
>  ____________________________________________
>
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy, LLC.
>
>
>   From: Gustavo Randich <gustavo.randich at gmail.com>
> Date: Thursday, January 15, 2015 at 11:53 AM
> To: "openstack-operators at lists.openstack.org" <
> openstack-operators at lists.openstack.org>
> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
> connectivity
>
>    Just to add one more background scenario, we also had similar problems
> trying to load balance rabbitmq via F5 Big IP LTM. For that reason we don't
> use it now. Our installation is a single rabbitmq instance with no
> intermediaries (apart from network switches). We use Folsom and Icehouse,
> and the problem is more noticeable on Icehouse nodes.
>
>  We are already monitoring message queue size, but we would like to
> pinpoint in semi-realtime the specific hosts/racks/network paths
> experiencing the "stale connection" before a user complains about an
> operation being stuck, or even hosts with no such pending operations but
> already "disconnected" -- we also could diagnose possible network causes
> and avoid massive service restarting.
>
>  So, for now, if someone knows about a cheap and quick openstack
> operation that triggers a message interchange between rabbitmq and
> nova-compute and a way of checking the result it would be great.
>
>
>
>
> On Thu, Jan 15, 2015 at 1:45 PM, Kris G. Lindgren <klindgren at godaddy.com>
> wrote:
>
>>  We did have an issue using celery  on an internal application that we
>> wrote - but I believe it was fixed after much failover testing and code
>> changes.  We also use logstash via rabbitmq and haven't noticed any issues
>> there either.
>>
>>  So this seems to be just openstack/oslo related.
>>
>>  We have tried a number of different configurations - all of them had
>> their issues.  We started out listing all the members in the cluster on the
>> rabbit_hosts line.  This worked most of the time without issue, until we
>> would restart one of the servers, then it seemed like the clients wouldn't
>> figure out they were disconnected and reconnect to the next host.
>>
>>  In an attempt to solve that we moved to using haproxy to present a vip
>> that we configured in the rabbit_hosts line.  This created issues with
>> long-lived connections being disconnected and a bunch of other issues.  In
>> our production environment we moved to load balanced rabbitmq, but using a
>> real loadbalancer, and don’t have the weird disconnect issues.  However,
>> anytime we reboot/take down a rabbitmq host or pull a member from the
>> cluster we have issues, or if there is a network disruption we also have
>> issues.
>>
>>  Thinking the best course of action is to move rabbitmq off on to its
>> own box and to leave it alone.
>>
>>  Does anyone have a rabbitmq setup that works well and doesn’t have
>> random issues when pulling nodes for maintenance?
>>   ____________________________________________
>>
>> Kris Lindgren
>> Senior Linux Systems Engineer
>> GoDaddy, LLC.
>>
>>
>>   From: Joe Topjian <joe at topjian.net>
>> Date: Thursday, January 15, 2015 at 9:29 AM
>> To: "Kris G. Lindgren" <klindgren at godaddy.com>
>> Cc: "openstack-operators at lists.openstack.org" <
>> openstack-operators at lists.openstack.org>
>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>> connectivity
>>
>>   Hi Kris,
>>
>>> Our experience is pretty much the same on anything that is using
>>> rabbitmq - not just nova-compute.
>>>
>>
>>  Just to clarify: have you experienced this outside of OpenStack (or
>> Oslo)?
>>
>>  We've seen similar issues with rabbitmq and OpenStack. We used to run
>> rabbit through haproxy and tried a myriad of options like setting no
>> timeouts, very very long timeouts, etc, but would always eventually see
>> similar issues as described.
>>
>>  Last month, we reconfigured all OpenStack components to use the
>> `rabbit_hosts` option with all nodes in our cluster listed. So far this has
>> worked well, though I probably just jinxed myself. :)
>>
>>  We still have other services (like Sensu) using the same rabbitmq
>> cluster and accessing it through haproxy. We've never had any issues there.
>>
>>  What's also strange is that I have another OpenStack deployment (from
>> Folsom to Icehouse) with just a single rabbitmq server installed directly
>> on the cloud controller (meaning: no nova-compute). I never have any rabbit
>> issues in that cloud.
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>