Hi,<div><br></div><div>I'm running into an AMQP messaging issue that causes the 'run_instance' RPC to never be invoked on the nova-compute side. It happens very rarely, and I hope someone can shed some light on how to follow up and debug it.</div>
<div><br></div><div>SYMPTOMS</div><div>--</div><div>* 10.81.44.230 is the controller node, which runs RabbitMQ, MySQL and Nova-API</div><div>* 10.46.178.20 is the compute node, which runs nova-compute</div><div><div>* After running nova boot --image &lt;imageid&gt; --flavor &lt;flavorid&gt; test-server, the running nova-compute service never receives the message</div>
</div><div><br></div><div>* Messages cast (from the scheduler) to this nova-compute host never get consumed (2 messages are still left in the queue)</div><div><div>* and '0' consumers are listed from RabbitMQ's perspective (the consumers column should show '1')</div>
</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">root@10.81.44.230:~# rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers memory<br>
...<br>compute.10.46.178.20 2 0 0 34504<br>...</blockquote><div><div><br></div><div><div>* The connections to the RabbitMQ server are still in the ESTABLISHED state</div><div></div></div><div>[<a href="mailto:root@10.46.178.20">root@10.46.178.20</a> log]# lsof -i | grep nova</div>
</div><div><div>nova-comp 4498 stack 13u IPv4 180448 0t0 TCP 10.46.178.20:42974->10.81.44.230:mysql (ESTABLISHED)</div><div>nova-comp 4498 stack 14u IPv4 21119 0t0 TCP 10.46.178.20:51564->10.81.44.230:amqp (ESTABLISHED)</div>
<div>nova-comp 4498 stack 15u IPv4 21721 0t0 TCP 10.46.178.20:51570->10.81.44.230:amqp (ESTABLISHED)</div><div><br></div><div>* A RabbitMQ port check from the compute node, "nc -vz 10.81.44.230 5672", succeeds</div>
<div>* The scheduler (10.81.44.230) can still receive compute service updates from the compute node (10.46.178.20) via the message queue</div><div><br></div><div>* Restarting nova-compute resolves the issue.</div><div><br></div><div>QUESTIONS</div>
<div>--</div><div>The issue happens very rarely and is hard to reproduce. Once it happens:</div><div>1. Which part should I check or look into?</div><div>2. How can I check whether the _consumer_thread eventlet is still trying to consume messages? After all, "rabbitmqctl list_queues consumers" prints 0 for this compute.host queue.</div>
<div>3. Is there any way to restore message consumption without restarting the nova-compute service?</div><div><br></div><div>Thanks!</div><div><br></div><div>Best Regards,</div><div>--</div></div><div><div>Qiu Yu<br><a href="http://www.unicell.info">http://www.unicell.info</a></div>
<br>
</div>
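<div><br></div><div>P.S. A minimal sketch, in Python, of how the symptom above (ready messages but zero consumers on a queue) could be detected automatically from the rabbitmqctl output. The helper names (parse_list_queues, stuck_queues, list_queues_output) are hypothetical, and the sketch assumes the same five columns as the rabbitmqctl command shown above and queue names without whitespace.</div>

```python
import subprocess

# Assumption: the same columns, in the same order, as the command in the post.
COLUMNS = ["name", "messages_ready", "messages_unacknowledged",
           "consumers", "memory"]


def parse_list_queues(output):
    """Parse whitespace-separated `rabbitmqctl list_queues` rows into dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip banner lines such as "Listing queues ..." and "...done.":
        # a data row has one name followed by four integer columns.
        if len(fields) != len(COLUMNS):
            continue
        if not all(f.isdigit() for f in fields[1:]):
            continue
        row = {"name": fields[0]}
        for col, value in zip(COLUMNS[1:], fields[1:]):
            row[col] = int(value)
        rows.append(row)
    return rows


def stuck_queues(rows):
    """Return names of queues holding ready messages with no consumer."""
    return [r["name"] for r in rows
            if r["messages_ready"] > 0 and r["consumers"] == 0]


def list_queues_output():
    """Run rabbitmqctl on the broker host (typically requires root)."""
    return subprocess.check_output(
        ["rabbitmqctl", "list_queues"] + COLUMNS,
        universal_newlines=True,
    )
```

<div>On the controller node, stuck_queues(parse_list_queues(list_queues_output())) would list every queue in that state. This only detects the condition from the broker side; it does not explain why the consumer disappeared.</div>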