[Openstack-operators] Way to check compute <-> rabbitmq connectivity

Belmiro Moreira moreira.belmiro.email.lists at gmail.com
Fri Jan 16 09:10:37 UTC 2015


Hi,
we had similar issues.
In our case, sometimes (no real pattern to it!) nova-compute didn't
consume messages even though everything looked healthy.
We started monitoring the queue sizes and restarting nova-compute.

We are still using "python-oslo-messaging-1.3.0.2"; however, the problem
disappeared when we upgraded to "python-kombu-2.5.15" and
"rabbitmq-server-3.3.5".

Belmiro
---
CERN

On Fri, Jan 16, 2015 at 1:48 AM, Sam Morrison <sorrison at gmail.com> wrote:

> We’ve had a lot of issues with Icehouse related to RabbitMQ. Basically, the
> change from openstack.rpc to oslo.messaging broke things. These things are
> now fixed in oslo.messaging version 1.5.1; there is still an issue with
> heartbeats, and that patch is making its way through the review process now.
>
> https://review.openstack.org/#/c/146047/
>
> Cheers,
> Sam
>
>
> On 16 Jan 2015, at 10:55 am, sridhar basam <sridhar.basam at gmail.com>
> wrote:
>
>
> If you are using HA queues, use a version of rabbitmq > 3.3.0. There was a
> change in that version whereby consumption on queues was automatically
> re-enabled when a master election for a queue happened. Previous versions only
> informed clients that they had to reconsume on a queue; it was the client's
> responsibility to restart consumption.
>
> Make sure you set TCP keepalives to a low enough value in case you have
> a firewall device in between your rabbit server and its consumers.
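For reference, keepalive tuning happens at the OS level on both the RabbitMQ server and its clients; a sketch of the sysctl values involved (the numbers here are illustrative -- tune them below your firewall's idle timeout):

```
# /etc/sysctl.d/99-rabbitmq-keepalive.conf (illustrative values)
# Start probing after 30s idle instead of the default 7200s,
# probe every 10s, give up after 3 missed probes.
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
```

Note these only take effect on a socket if keepalive is actually enabled on it; on the broker side that means having `{keepalive, true}` in RabbitMQ's `tcp_listen_options`.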
>
> Monitor consumers on your rabbit infrastructure using 'rabbitmqctl
> list_queues name messages consumers'. The number of consumers on fanout
> queues will depend on the number of services of each type you have in your
> environment.
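A minimal sketch of wrapping that check in a script -- it reads the `rabbitmqctl list_queues name messages consumers` output on stdin (so on a real broker you pipe `rabbitmqctl` into it) and flags queues that have backed-up messages but no consumers, which is exactly the "stale connection" symptom described in this thread:

```shell
# stale_queues: read `rabbitmqctl list_queues name messages consumers`
# output on stdin; print queues with pending messages but zero consumers.
stale_queues() {
  awk '
    /^Listing/ { next }                 # skip the rabbitmqctl header line
    $2 > 0 && $3 == 0 { print $1 }      # messages pending, no consumers
  '
}

# On a live broker you would run (assuming the default vhost):
#   rabbitmqctl list_queues name messages consumers | stale_queues
```

Any queue this prints is a candidate for the "consumer silently gone" problem and worth alerting on.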
>
> Sri
>  On Jan 15, 2015 6:27 PM, "Michael Dorman" <mdorman at godaddy.com> wrote:
>
>>   Here is the bug I’ve been tracking related to this for a while.  I
>> haven’t really kept up to speed with it, so I don’t know the current status.
>>
>>  https://bugs.launchpad.net/nova/+bug/856764
>>
>>
>>   From: Kris Lindgren <klindgren at godaddy.com>
>> Date: Thursday, January 15, 2015 at 12:10 PM
>> To: Gustavo Randich <gustavo.randich at gmail.com>, OpenStack Operators <
>> openstack-operators at lists.openstack.org>
>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>> connectivity
>>
>>   During the Atlanta ops meeting this topic came up, and I specifically
>> mentioned adding a "no-op" or healthcheck ping to the rabbitmq machinery
>> in both nova & neutron.  The devs in the room looked at me like I was
>> crazy, but it was precisely so that we could catch issues like the ones you
>> describe.  I am also interested if anyone knows of a lightweight call that
>> could be used to verify/confirm rabbitmq connectivity.  I haven't been able
>> to devote time to dig into it, mainly because if one client is having
>> issues, you will notice other clients having similar/silent errors, and
>> restarting all the things is the easiest fix, for us at least.
>>  ____________________________________________
>>
>> Kris Lindgren
>> Senior Linux Systems Engineer
>> GoDaddy, LLC.
>>
>>
>>   From: Gustavo Randich <gustavo.randich at gmail.com>
>> Date: Thursday, January 15, 2015 at 11:53 AM
>> To: "openstack-operators at lists.openstack.org" <
>> openstack-operators at lists.openstack.org>
>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>> connectivity
>>
>>    Just to add one more background scenario: we also had similar
>> problems trying to load-balance rabbitmq via an F5 BIG-IP LTM, and for that
>> reason we don't use it now. Our installation is a single rabbitmq instance
>> with no intermediaries (apart from network switches). We run Folsom and
>> Icehouse, with the problem perceived more on Icehouse nodes.
>>
>>  We are already monitoring message queue sizes, but we would like to
>> pinpoint in near-real-time the specific hosts/racks/network paths
>> experiencing the "stale connection" before a user complains about an
>> operation being stuck -- and even catch hosts with no pending operations
>> that are already "disconnected". That would also let us diagnose possible
>> network causes and avoid mass service restarts.
>>
>>  So, for now, if someone knows of a cheap and quick OpenStack
>> operation that triggers a message interchange between rabbitmq and
>> nova-compute, and a way of checking the result, that would be great.
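Not a true no-op ping, but one cheap signal nova already gives you for free: nova-compute periodically reports its state over RPC, so a stale heartbeat in `nova-manage service list` (services shown as XXX instead of :-)) usually means that compute node's rabbit connection is dead. A sketch that filters the down computes out of that output -- assuming the classic smiley-face column format of that era's nova-manage (`Binary Host Zone Status State Updated_At`):

```shell
# down_computes: read `nova-manage service list` output on stdin and
# print hosts whose nova-compute service has a stale RPC heartbeat.
# Assumed columns: Binary Host Zone Status State Updated_At
down_computes() {
  awk '$1 == "nova-compute" && $5 == "XXX" { print $2 }'
}

# On a controller you would run (assuming admin credentials):
#   nova-manage service list | down_computes
```

Since the heartbeat itself travels over rabbit, a host this prints has not completed an RPC round trip recently, which is close to the connectivity check being asked for -- though it won't catch a consumer that is connected but silently not consuming.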
>>
>>
>>
>>
>> On Thu, Jan 15, 2015 at 1:45 PM, Kris G. Lindgren <klindgren at godaddy.com>
>> wrote:
>>
>>>  We did have an issue using celery  on an internal application that we
>>> wrote - but I believe it was fixed after much failover testing and code
>>> changes.  We also use logstash via rabbitmq and haven't noticed any issues
>>> there either.
>>>
>>>  So this seems to be just openstack/oslo related.
>>>
>>>  We have tried a number of different configurations - all of them had
>>> their issues.  We started out listing all the members of the cluster on the
>>> rabbit_hosts line.  This worked most of the time without issue, until we
>>> restarted one of the servers; then it seemed like the clients wouldn't
>>> figure out they were disconnected and reconnect to the next host.
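For context, this is the shape of that configuration in nova.conf (hostnames illustrative; on Icehouse these options live under `[DEFAULT]`):

```
[DEFAULT]
# All cluster members listed directly; clients handle failover themselves.
rabbit_hosts = rabbit1:5672,rabbit2:5672,rabbit3:5672
# Mirror queues across the cluster so a master election can pick a survivor.
rabbit_ha_queues = true
# How aggressively clients retry a dropped connection (seconds).
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
```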
>>>
>>>  In an attempt to solve that, we moved to using haproxy to present a VIP
>>> that we configured in the rabbit_hosts line.  This created disconnect
>>> issues with long-lived connections, among other problems.  In our
>>> production environment we moved to load-balanced rabbitmq, but using a real
>>> load balancer, and don’t have the weird disconnect issues.  However, anytime
>>> we reboot/take down a rabbitmq host or pull a member from the cluster we
>>> have issues, and if there is a network disruption we also have issues.
>>>
>>>  Thinking the best course of action is to move rabbitmq onto its
>>> own box and leave it alone.
>>>
>>>  Does anyone have a rabbitmq setup that works well and doesn’t have
>>> random issues when pulling nodes for maintenance?
>>>   ____________________________________________
>>>
>>> Kris Lindgren
>>> Senior Linux Systems Engineer
>>> GoDaddy, LLC.
>>>
>>>
>>>   From: Joe Topjian <joe at topjian.net>
>>> Date: Thursday, January 15, 2015 at 9:29 AM
>>> To: "Kris G. Lindgren" <klindgren at godaddy.com>
>>> Cc: "openstack-operators at lists.openstack.org" <
>>> openstack-operators at lists.openstack.org>
>>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>>> connectivity
>>>
>>>   Hi Kris,
>>>
>>>     Our experience is pretty much the same on anything that is using
>>>> rabbitmq - not just nova-compute.
>>>>
>>>
>>>  Just to clarify: have you experienced this outside of OpenStack (or
>>> Oslo)?
>>>
>>>  We've seen similar issues with rabbitmq and OpenStack. We used to run
>>> rabbit through haproxy and tried a myriad of options like setting no
>>> timeouts, very very long timeouts, etc., but would always eventually see
>>> issues similar to those described.
>>>
>>>  Last month, we reconfigured all OpenStack components to use the
>>> `rabbit_hosts` option with all nodes in our cluster listed. So far this has
>>> worked well, though I probably just jinxed myself. :)
>>>
>>>  We still have other services (like Sensu) using the same rabbitmq
>>> cluster and accessing it through haproxy. We've never had any issues there.
>>>
>>>  What's also strange is that I have another OpenStack deployment (from
>>> Folsom to Icehouse) with just a single rabbitmq server installed directly
>>> on the cloud controller (meaning: no nova-compute). I never have any rabbit
>>> issues in that cloud.
>>>
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>
>>>
>>
>>
>
>
>
>
>

