[openstack-dev] [Fuel][RabbitMQ] nova-compute stuck for a while (AMQP)

Roman Sokolkov rsokolkov at mirantis.com
Tue May 6 19:42:24 UTC 2014


Hello, fuelers.

I'm using Fuel 4.1A + Havana in HA mode.

I consistently observe (on other deployments as well) an issue with a stuck
"nova-compute" service. But I think the problem is more fundamental and relates
to HA RabbitMQ and the OpenStack AMQP driver implementation.

Symptoms:

   - From time to time a random nova-compute is marked as "XXX" for a while.
   - The service itself appears to work properly. In the logs I see that it
   sends status updates to the conductor, but actually nothing is sent.
   - "netstat" shows that all connections to/from rabbit are "ESTABLISHED".
   - rabbitmqctl shows that the "compute.node-x" queue is synced to all slaves.
   - Nothing had broken beforehand, I mean the rabbitmq cluster, etc.

Axe-style solution:

   - /etc/init.d/openstack-nova-compute restart

I've found a lot of interesting material (and solutions) here:

https://bugs.launchpad.net/oslo.messaging/+bug/856764


My questions are:

   - Are there any Fuel-specific thoughts on how to solve or work around this
   issue?
   - Is there any quick fix for this in 4.1? For example, adjusting TCP
   keep-alive timeouts?
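
To make the keep-alive idea concrete, here is a minimal sketch of per-socket TCP keep-alive tuning in Python, assuming Linux. It is not how nova-compute or the AMQP driver sets up its connections; the helper names and the timeout values are hypothetical and only illustrate the mechanism. Note that the system-wide sysctls (net.ipv4.tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes) only affect sockets that have SO_KEEPALIVE enabled.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Enable TCP keep-alive on one socket (hypothetical helper).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    probes:   unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timing options below are Linux-specific, so guard them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


def worst_case_detection_seconds(idle, interval, probes):
    """Approximate time until a silently dead peer is detected."""
    return idle + interval * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keep-alive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    # Usual Linux defaults: 7200 s idle, 75 s interval, 9 probes.
    print("default detection time (s):", worst_case_detection_seconds(7200, 75, 9))
    s.close()
```

With the usual Linux defaults a dead connection can sit in "ESTABLISHED" for more than two hours before the kernel gives up on it, which would be consistent with the symptoms above if keep-alive is enabled on those sockets at all.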


-- 
Roman Sokolkov,
Deployment Engineer,
Mirantis, Inc.
Skype rsokolkov,
rsokolkov at mirantis.com