[Openstack-operators] RabbitMQ in cluster mode - high cpu usage

Antonio Messina antonio.s.messina at gmail.com
Thu Jul 16 11:21:32 UTC 2015


Hi all,

We are deploying Kilo on Ubuntu Trusty. We run all services on 5
controller nodes, including RabbitMQ in cluster mode with HA queues. We
set "rabbit_hosts" in all services to point to the 5 RabbitMQ
nodes.

On each controller node the beam.smp process takes ~150-250% of
CPU and around 2GB of resident memory, even when no VM is running.
Note that this is all CPU time, not time spent waiting on intensive
IO. We can't figure out why.

One of the (probably unrelated) things we found is that although the
"heartbeat" option for rabbitmq in nova is marked as EXPERIMENTAL,
it's enabled by default. Indeed, we found many errors in the logs
like:

<11>Jul 15 20:19:27 node-k5-01-10 2015-07-15 20:19:27.625 128786 ERROR
oslo_messaging._drivers.impl_rabbit [-] AMQP server on
cloud-l2-41.os.s3it.uzh.ch:5672 is unreachable: Too many heartbeats
missed. Trying again in 1 seconds.

Note that the rabbitmq servers were all up and running. On the rabbitmq
server, the error was something like:

=ERROR REPORT==== 15-Jul-2015::13:13:29 ===
closing AMQP connection <0.18550.0> (10.129.16.173:55330 -> 10.129.31.229:5672):
{heartbeat_timeout,running}

We disabled heartbeat for nova, in section [oslo_messaging_rabbit]. We
don't see these errors on the compute node anymore, but the CPU usage
for RabbitMQ is still high, so it's probably unrelated.
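
For reference, a sketch of the change in nova.conf. The option name
heartbeat_timeout_threshold is an assumption based on the oslo.messaging
rabbit driver in Kilo, where a value of 0 is documented to disable
heartbeats; check the sample config shipped with your release before
copying it:

```ini
[oslo_messaging_rabbit]
# Assumed option name (oslo.messaging, Kilo); 0 disables AMQP heartbeats.
heartbeat_timeout_threshold = 0
```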

I wonder if anyone can answer our questions:

* is anyone else experiencing the same behavior? Do you have a solution?
* why is the heartbeat option in nova enabled by default, and can it be
safely disabled?
* is anyone experiencing similar issues with qpid? (we are not
especially attached to any amqp implementation)
* are the default values for timeout/backoff/retry in nova.conf sane,
even in a not-so-small installation? (64 compute nodes right now for
"testing", 128 soon)

Thank you in advance for your help,
    Antonio Messina

Package versions:

rabbitmq-server                      3.4.3-2~cloud0
python-amqp                   1.4.6-0ubuntu1~cloud0
python-amqplib                1.0.2-1
python-kombu                  3.0.24-0ubuntu2~cloud0

-- 
antonio.s.messina at gmail.com
antonio.messina at uzh.ch                     +41 (0)44 635 42 22
S3IT: Service and Support for Science IT   http://www.s3it.uzh.ch/
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich Switzerland


