[Openstack-operators] Way to check compute <-> rabbitmq connectivity
Gustavo Randich
gustavo.randich at gmail.com
Mon Jan 19 22:55:11 UTC 2015
In the meantime, I'm using this horrendous script inside compute nodes to
check for rabbitmq connectivity. It uses the 'set_host_enabled' rpc call,
which in my case is innocuous.
#!/bin/bash
# Random marker used to recognise our own message in the nova-compute log.
UUID=$(cat /proc/sys/kernel/random/uuid)
RABBIT=$(grep -Po '(?<=rabbit_host = ).+' /etc/nova/nova.conf)
HOSTX=$(hostname)
# Publish a 'set_host_enabled' RPC message addressed to this compute host.
python -c "
import pika
connection = pika.BlockingConnection(pika.ConnectionParameters(\"$RABBIT\"))
channel = connection.channel()
channel.basic_publish(exchange='nova', routing_key=\"compute.$HOSTX\",
    properties=pika.BasicProperties(content_type = 'application/json'),
    body = '{ \"version\": \"3.0\", \"_context_request_id\": \"$UUID\", \\
    \"_context_roles\": [\"KeystoneAdmin\", \"KeystoneServiceAdmin\", \"admin\"], \\
    \"_context_user_id\": \"XXX\", \\
    \"_context_project_id\": \"XXX\", \\
    \"method\": \"set_host_enabled\", \\
    \"args\": {\"enabled\": true} \\
    }'
)
connection.close()"
sleep 2
# Look for the marker in the most recent nova-compute log lines.
tail -1000 /var/log/nova/nova-compute.log | grep -q "$UUID" || {
    echo "WARNING: nova-compute not consuming RabbitMQ messages. Last message: $UUID"
    exit 1
}
echo "OK"
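
The same check can be sketched in plain Python, which avoids hand-escaping the JSON body inside a shell string. This is only a sketch under assumptions: it targets Python 3, the helper names (build_rpc_body, publish, lines_mention, check) are made up for illustration, and the message fields, exchange, routing key and log path are simply copied from the script above. As in that script, it relies on the request id showing up in nova-compute.log once the message is consumed.

```python
import json
import socket
import time
import uuid
from collections import deque


def build_rpc_body(request_id, method="set_host_enabled", args=None):
    """Return the JSON body from the script above, built with json.dumps."""
    message = {
        "version": "3.0",
        "_context_request_id": request_id,
        "_context_roles": ["KeystoneAdmin", "KeystoneServiceAdmin", "admin"],
        "_context_user_id": "XXX",
        "_context_project_id": "XXX",
        "method": method,
        "args": {"enabled": True} if args is None else args,
    }
    return json.dumps(message)


def publish(rabbit_host, compute_host, body):
    """Publish body to the 'nova' exchange, addressed to one compute host."""
    import pika  # imported here so the other helpers work without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(rabbit_host))
    try:
        connection.channel().basic_publish(
            exchange="nova",
            routing_key="compute.%s" % compute_host,
            properties=pika.BasicProperties(content_type="application/json"),
            body=body,
        )
    finally:
        connection.close()


def lines_mention(lines, marker, last_lines=1000):
    """Return True if marker occurs in the last `last_lines` items of lines."""
    return any(marker in line for line in deque(lines, maxlen=last_lines))


def check(rabbit_host, log_path="/var/log/nova/nova-compute.log", wait=2):
    """Send a marked message, wait, then look for the marker in the log."""
    marker = str(uuid.uuid4())
    publish(rabbit_host, socket.gethostname(), build_rpc_body(marker))
    time.sleep(wait)
    with open(log_path, errors="replace") as handle:
        return lines_mention(handle, marker)
```

Calling check() with the value of rabbit_host from nova.conf returns True when the marker was logged and False otherwise, which maps onto the OK and WARNING outcomes of the shell version.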
On Thu, Jan 15, 2015 at 9:48 PM, Sam Morrison <sorrison at gmail.com> wrote:
> We've had a lot of issues with Icehouse related to RabbitMQ. Basically the
> change from openstack.rpc to oslo.messaging broke things. These things are
> now fixed in oslo.messaging version 1.5.1. There is still an issue with
> heartbeats, and that patch is making its way through the review process now.
>
> https://review.openstack.org/#/c/146047/
>
> Cheers,
> Sam
>
>
> On 16 Jan 2015, at 10:55 am, sridhar basam <sridhar.basam at gmail.com>
> wrote:
>
>
> If you are using HA queues, use a version of rabbitmq > 3.3.0. There was a
> change in that version where consumption on queues was automatically
> enabled when a master election for a queue happened. Previous versions only
> informed clients that they had to reconsume on a queue. It was the client's
> responsibility to start consumption on a queue.
>
> Make sure you set tcp keepalives to a low enough value in case you have
> a firewall device in between your rabbit server and its consumers.
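
As an illustration of what the keepalive settings control, here is a minimal Python sketch that turns on TCP keepalive for a single socket. The function name and the numbers (60 s idle, 10 s interval, 5 probes) are made up for the example; what counts as "low enough" depends on the idle timeout of the firewall in question. The per-socket TCP_KEEP* options are not available on every platform, so they are applied only when the socket module exposes them. How the equivalent settings are applied to the RabbitMQ server or to the OpenStack services is deployment-specific and not shown here.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive on sock and tune it where the platform allows.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)  # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With these example values a dead peer would be detected after roughly idle + interval * count seconds, so the idle value in particular should stay below the firewall's connection timeout.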
>
> Monitor consumers on your rabbit infrastructure using 'rabbitmqctl
> list_queues name messages consumers'. The number of consumers on fanout
> queues is going to depend on the number of services of any type you have
> in your environment.
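
A rough sketch of automating that check in Python: parse the command's output and report compute queues that have no consumer. It assumes the output is one queue per line with tab-separated name, messages and consumers columns, and that per-host compute queues are named "compute.<hostname>" (matching the routing key used in the script at the top of this message). The function names are hypothetical.

```python
def parse_list_queues(output):
    """Parse 'rabbitmqctl list_queues name messages consumers' output.

    Returns a list of (name, messages, consumers) tuples. Lines that are not
    three tab-separated fields ending in two integers (banners, column
    headers, '...done.') are skipped.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        name, messages, consumers = fields
        if messages.isdigit() and consumers.isdigit():
            rows.append((name, int(messages), int(consumers)))
    return rows


def queues_without_consumers(rows, prefix="compute."):
    """Return names of queues starting with prefix that have no consumers."""
    return [
        name
        for name, _messages, consumers in rows
        if name.startswith(prefix) and consumers == 0
    ]
```

The output text could be captured with subprocess.check_output on the rabbit server and passed to parse_list_queues. A compute queue with zero consumers is a candidate for the stale-connection problem discussed in this thread.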
>
> Sri
> On Jan 15, 2015 6:27 PM, "Michael Dorman" <mdorman at godaddy.com> wrote:
>
>> Here is the bug I've been tracking related to this for a while. I
>> haven't really kept up to speed with it, so I don't know the current status.
>>
>> https://bugs.launchpad.net/nova/+bug/856764
>>
>>
>> From: Kris Lindgren <klindgren at godaddy.com>
>> Date: Thursday, January 15, 2015 at 12:10 PM
>> To: Gustavo Randich <gustavo.randich at gmail.com>, OpenStack Operators <
>> openstack-operators at lists.openstack.org>
>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>> connectivity
>>
>> During the Atlanta ops meeting this topic came up and I specifically
>> mentioned adding a "no-op" or healthcheck ping to the rabbitmq stuff
>> in both nova & neutron. The devs in the room looked at me like I was
>> crazy, but it was so that we could catch exactly the kind of issues you
>> described. I am also interested if anyone knows of a lightweight call that
>> could be used to verify/confirm rabbitmq connectivity. I haven't been
>> able to devote time to dig into it, mainly because if one client is having
>> issues, you will notice other clients are having similar/silent errors, and
>> a restart of all the things is the easiest way to fix it, for us at least.
>> ____________________________________________
>>
>> Kris Lindgren
>> Senior Linux Systems Engineer
>> GoDaddy, LLC.
>>
>>
>> From: Gustavo Randich <gustavo.randich at gmail.com>
>> Date: Thursday, January 15, 2015 at 11:53 AM
>> To: "openstack-operators at lists.openstack.org" <
>> openstack-operators at lists.openstack.org>
>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>> connectivity
>>
>> Just to add one more background scenario, we also had similar
>> problems trying to load balance rabbitmq via F5 Big IP LTM. For that reason
>> we don't use it now. Our installation is a single rabbitmq instance with no
>> intermediaries (other than network switches). We use Folsom and Icehouse,
>> and the problem is more noticeable on Icehouse nodes.
>>
>> We are already monitoring message queue size, but we would like to
>> pinpoint in semi-realtime the specific hosts/racks/network paths
>> experiencing the "stale connection" before a user complains about an
>> operation being stuck, or even hosts with no such pending operations that
>> are already "disconnected". We could then also diagnose possible network
>> causes and avoid massive service restarts.
>>
>> So, for now, if someone knows about a cheap and quick openstack
>> operation that triggers a message exchange between rabbitmq and
>> nova-compute, and a way of checking the result, it would be great.
>>
>>
>>
>>
>> On Thu, Jan 15, 2015 at 1:45 PM, Kris G. Lindgren <klindgren at godaddy.com>
>> wrote:
>>
>>> We did have an issue using celery on an internal application that we
>>> wrote - but I believe it was fixed after much failover testing and code
>>> changes. We also use logstash via rabbitmq and haven't noticed any issues
>>> there either.
>>>
>>> So this seems to be just openstack/oslo related.
>>>
>>> We have tried a number of different configurations - all of them had
>>> their issues. We started out listing all the members in the cluster on the
>>> rabbit_hosts line. This worked most of the time without issue, until we
>>> would restart one of the servers, then it seemed like the clients wouldn't
>>> figure out they were disconnected and reconnect to the next host.
>>>
>>> In an attempt to solve that we moved to using haproxy to present a vip
>>> that we configured in the rabbit_hosts line. This created issues with
>>> long-lived connections being disconnected and a bunch of other issues. In
>>> our production environment we moved to load-balanced rabbitmq, but using a
>>> real load balancer, and don't have the weird disconnect issues. However,
>>> anytime we reboot/take down a rabbitmq host or pull a member from the
>>> cluster we have issues, and if there is a network disruption we also have
>>> issues.
>>>
>>> I'm thinking the best course of action is to move rabbitmq onto its
>>> own box and leave it alone.
>>>
>>> Does anyone have a rabbitmq setup that works well and doesn't have
>>> random issues when pulling nodes for maintenance?
>>> ____________________________________________
>>>
>>> Kris Lindgren
>>> Senior Linux Systems Engineer
>>> GoDaddy, LLC.
>>>
>>>
>>> From: Joe Topjian <joe at topjian.net>
>>> Date: Thursday, January 15, 2015 at 9:29 AM
>>> To: "Kris G. Lindgren" <klindgren at godaddy.com>
>>> Cc: "openstack-operators at lists.openstack.org" <
>>> openstack-operators at lists.openstack.org>
>>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>>> connectivity
>>>
>>> Hi Kris,
>>>
>>>> Our experience is pretty much the same on anything that is using
>>>> rabbitmq - not just nova-compute.
>>>
>>> Just to clarify: have you experienced this outside of OpenStack (or
>>> Oslo)?
>>>
>>> We've seen similar issues with rabbitmq and OpenStack. We used to run
>>> rabbit through haproxy and tried a myriad of options like setting no
>>> timeouts, very very long timeouts, etc, but would always eventually see
>>> similar issues as described.
>>>
>>> Last month, we reconfigured all OpenStack components to use the
>>> `rabbit_hosts` option with all nodes in our cluster listed. So far this has
>>> worked well, though I probably just jinxed myself. :)
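
For reference, a small Python sketch of reading such a `rabbit_hosts` value and probing each listed broker over TCP. It assumes the value is a comma-separated list of host or host:port entries with 5672 as the default port; the function names are hypothetical and IPv6 literals are not handled. A successful TCP connect only shows that the broker port is reachable. It says nothing about whether a service is still consuming from its queue, which is the failure mode discussed in this thread.

```python
import socket


def parse_rabbit_hosts(value, default_port=5672):
    """Split 'host1:5672,host2' into a list of (host, port) tuples."""
    hosts = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            hosts.append((host, int(port)))
        else:
            hosts.append((entry, default_port))
    return hosts


def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Looping over parse_rabbit_hosts(...) and calling reachable() on each pair gives a quick view of which cluster members a given node can reach at the network level.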
>>>
>>> We still have other services (like Sensu) using the same rabbitmq
>>> cluster and accessing it through haproxy. We've never had any issues there.
>>>
>>> What's also strange is that I have another OpenStack deployment (from
>>> Folsom to Icehouse) with just a single rabbitmq server installed directly
>>> on the cloud controller (meaning: no nova-compute). I never have any rabbit
>>> issues in that cloud.
>>>
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators