[openstack-dev] [Fuel-dev] [Fuel][RabbitMQ] nova-compute stuck for a while (AMQP)

Andrew Woodward xarses at gmail.com
Wed May 7 06:48:42 UTC 2014


Roman,

The current stable/4.1 branch has some fixes that make this less likely
to occur, and it is the version most likely to recover when it does.

That said, I've done some tracing and there are some issues with
nova-conductor processing those messages. Sometimes I've seen the
compute node be the issue, other times nova-conductor. As of stable/4.1
I've been able to track it down to nova-conductor. AFAICT it receives
the message from nova-compute, takes it from the queue, acks it, and
selects the object from the DB. However, after changing the log/trace
level for amqp and sqlalchemy in nova-compute and nova-conductor, the
issue appears to stop. I've yet to confirm whether the cluster state of
rabbit changed, the change in logging level changed it, or something
else did.
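
For anyone who wants to repeat the logging experiment in a standalone
Python process, here is a minimal sketch using only the standard logging
module. The logger names are assumptions: "sqlalchemy.engine" is the
logger SQLAlchemy documents for SQL statement logging, while "amqp"
stands in for whichever AMQP client library the deployment actually uses
and may need to be changed. It is not how the nova services themselves
were reconfigured.

```python
import logging

# Assumed logger names; adjust "amqp" to match the AMQP client in use.
VERBOSE_LOGGERS = ("amqp", "sqlalchemy.engine")


def raise_log_levels(names=VERBOSE_LOGGERS, level=logging.DEBUG):
    """Set each named logger to `level` and return the loggers touched."""
    touched = []
    for name in names:
        logger = logging.getLogger(name)
        logger.setLevel(level)
        touched.append(logger)
    return touched


if __name__ == "__main__":
    # A root handler is needed so the extra records are actually emitted.
    logging.basicConfig(
        format="%(asctime)s %(name)s %(levelname)s %(message)s"
    )
    for lg in raise_log_levels():
        print(lg.name, logging.getLevelName(lg.level))
```

Since more verbose logging also changes timing, it is worth reverting the
levels afterwards to check whether the stall comes back.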

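On the TCP keep-alive question below: this is not a confirmed fix for
the issue, only a sketch of what per-socket keep-alive tuning looks like
in Python so the idea is concrete. The helper name and the timeout
values are hypothetical, and the TCP_KEEP* options are platform-specific
(present on Linux), so the code skips any that the platform lacks.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keep-alive for `sock` (values are illustrative only).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for opt_name, value in tuning:
        opt = getattr(socket, opt_name, None)  # absent on some platforms
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With these example values a dead peer would be noticed after roughly
idle + interval * count seconds. The system-wide defaults are much
longer, which fits connections still showing as "ESTABLISHED" while
nothing gets through.
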


On Tue, May 6, 2014 at 12:42 PM, Roman Sokolkov <rsokolkov at mirantis.com> wrote:
> Hello, fuelers.
>
> I'm using Fuel 4.1A + Havana in HA mode.
>
> I constantly observe (on other deployments too) an issue with a stuck
> "nova-compute" service. But I think the problem is more fundamental and relates
> to HA RabbitMQ and the OpenStack AMQP driver implementation.
>
> Symptoms:
>
> A random nova-compute is marked as "XXX" from time to time for a while.
> I see that the service itself works properly. In the logs I see that it sends status
> updates to conductor, but actually nothing is sent.
> "netstat" shows that all connections to/from rabbit are "ESTABLISHED".
> rabbitmqctl shows that the "compute.node-x" queue is synced to all slaves.
> Nothing had been broken beforehand, I mean the rabbitmq cluster, etc.
>
> Axe style solution:
>
> /etc/init.d/openstack-nova-compute restart
>
> So here I've found a lot of interesting stuff (and solutions):
>
> https://bugs.launchpad.net/oslo.messaging/+bug/856764
>
>
> My questions are:
>
> Are there any thoughts specific to Fuel on how to solve or work around this issue?
> Any fast solution for this in 4.1? Like adjusting TCP keep-alive timeouts?
>
>
> --
> Roman Sokolkov,
> Deployment Engineer,
> Mirantis, Inc.
> Skype rsokolkov,
> rsokolkov at mirantis.com
>



-- 
Andrew
Mirantis
Ceph community


