[openstack-dev] How to debug OpenStack RabbitMQ message not consumed issues?

Jian Wen wenjian at canonical.com
Wed Dec 19 12:41:40 UTC 2012


On 2012-12-19 20:05, unicell wrote:
> Hi,
>
> I'm running into an AMQP messaging issue that causes the 'run_instance'
> RPC to never be invoked on the nova-compute side. It happens very rarely,
> and I hope someone can shed some light on how to follow up and debug it.
>
> SYMPTOMS
> --
> * 10.81.44.230 is the controller node, which runs RabbitMQ, MySQL and
> Nova-API
> * 10.46.178.20 is the compute node, which runs nova-compute
> * After "nova boot --image <imageid> --flavor <flavorid> test-server",
> nova-compute never receives the message
>
> * Messages cast (from the scheduler) to this nova-compute host are never
> consumed (2 messages left in the queue)
> * and RabbitMQ lists '0' consumers for the queue (it should be '1' in the
> consumers column)
>
>     root@10.81.44.230:~# rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers memory
>     ...
>     compute.10.46.178.20  2       0       0       34504
>     ...
>
>
> * The connections to the RabbitMQ server are still in ESTABLISHED state
> [root@10.46.178.20 log]# lsof -i | grep nova
> nova-comp  4498   stack   13u  IPv4 180448      0t0  TCP 10.46.178.20:42974->10.81.44.230:mysql (ESTABLISHED)
> nova-comp  4498   stack   14u  IPv4  21119      0t0  TCP 10.46.178.20:51564->10.81.44.230:amqp (ESTABLISHED)
> nova-comp  4498   stack   15u  IPv4  21721      0t0  TCP 10.46.178.20:51570->10.81.44.230:amqp (ESTABLISHED)
Could you also paste the output of "netstat -ant | grep -E 'Recv-Q|5672'"?
Maybe its Recv-Q is full, i.e. the user-space program (nova-compute here)
is no longer reading data out of the TCP receive buffer.
While the Recv-Q is full, the TCP connection stays ESTABLISHED but no data
can be transferred.
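If this needs checking on several compute nodes, or sampling repeatedly, the netstat output can be filtered in a script. Below is a minimal Python sketch, assuming the Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and the default AMQP port 5672; the function name `stuck_amqp_sockets` and its `threshold` parameter are hypothetical, for illustration only. Recv-Q can be briefly non-zero on a healthy connection, so the signal to look for is a value that stays high across several samples.

```python
def stuck_amqp_sockets(netstat_output: str, port: int = 5672, threshold: int = 1):
    """Return (local, remote, recv_q) for TCP sockets on `port` whose Recv-Q >= threshold.

    Hypothetical helper: expects the text printed by Linux `netstat -ant`.
    """
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: Proto Recv-Q Send-Q Local Foreign State.
        # Banner and header lines ("Active Internet ...", "Proto ...") are skipped.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        recv_q = int(fields[1])
        local, remote = fields[3], fields[4]
        # Match either end, so this works on the compute node (remote port 5672)
        # and on the broker (local port 5672). rsplit handles IPv6 like ":::5672".
        local_port = local.rsplit(":", 1)[-1]
        remote_port = remote.rsplit(":", 1)[-1]
        if str(port) in (local_port, remote_port) and recv_q >= threshold:
            results.append((local, remote, recv_q))
    return results
```

For example, feed it `subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout` a few times, a few seconds apart, and compare the results.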
>
> * A RabbitMQ port check from the compute node ("nc -vz 10.81.44.230 5672")
> succeeds
> * The scheduler (10.81.44.230) can still receive compute service updates
> from the compute node (10.46.178.20) via the message queue
>
> * Restarting nova-compute resolves the issue.
>
> QUESTIONS
> --
> It happens very rarely and is hard to reproduce. Once it happens,
> 1. Which component should I check or look into?
I think it's nova-compute; we need to debug it while it is running.
> 2. How can I check whether the _consumer_thread eventlet is still trying to
> consume messages? After all, "rabbitmqctl list_queues consumers"
> prints 0 for this compute.host queue.
tcpdump?
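Alongside tcpdump on the compute node, the broker-side symptom described above (ready messages on a compute.<host> queue but zero consumers) can be polled for, so the moment a queue loses its consumer can be matched against the nova-compute logs. A minimal Python sketch, assuming the five-column order of the rabbitmqctl command quoted above; the function name `orphaned_queues` and the `compute.` prefix filter are hypothetical choices, not part of nova or RabbitMQ.

```python
def orphaned_queues(list_queues_output: str, prefix: str = "compute."):
    """Return (queue_name, messages_ready) for queues with ready messages but no consumers.

    Hypothetical helper: expects the output of
      rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers memory
    """
    stuck = []
    for line in list_queues_output.splitlines():
        fields = line.split()
        # Keep only 5-column rows whose last four columns are integers;
        # this skips banners such as "Listing queues ..." and "...done.".
        if len(fields) != 5 or not all(f.isdigit() for f in fields[1:]):
            continue
        name = fields[0]
        ready, _unacked, consumers, _memory = map(int, fields[1:])
        if name.startswith(prefix) and ready > 0 and consumers == 0:
            stuck.append((name, ready))
    return stuck
```

Run periodically on the controller (for example from cron), a non-empty result flags a host whose queue is filling up with nobody consuming it.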
> 3. Is there any way to restore the message consumption without
> restarting nova-compute service?
Not sure until we know how to reproduce the bug.
>
> Thanks!
>
> Best Regards,
> --
> Qiu Yu
> http://www.unicell.info


-- 
Jian Wen
Software Engineer, Services and Support Team
Canonical, Ltd


