<div dir="ltr"><div>Hi,</div><div><br></div><div>I have confirmed that the issue with local queueing of resource</div><div>requests, which I highlighted at the design summit, currently exists. I</div><div>have also confirmed that the issue is solved in oslo.messaging version</div><div>5.0.0.</div><div><br></div><div>The issue is in the oslo.messaging library below version 5.0.0. The</div><div>messages carrying RPC call/cast requests are drained from the</div><div>messaging server (RabbitMQ) and submitted to the thread pool executor</div><div>(GreenThreadPoolExecutor from the futurist library). Before a</div><div>message is submitted to the executor, it is acknowledged, which means it</div><div>is deleted from the messaging server. The thread pool executor</div><div>queues the messages locally when no eventlets are available to</div><div>process them. This is bad because, if the process goes down, the locally</div><div>queued messages are lost, and they are very difficult to recover as they</div><div>are no longer available in the messaging server. The mail thread</div><div><a href="http://lists.openstack.org/pipermail/openstack-dev/2015-July/068742.html">http://lists.openstack.org/pipermail/openstack-dev/2015-July/068742.html</a></div><div>gives more context.</div><div><br></div><div>In convergence, the heat engine casts the requests to process the</div><div>resources, and we don't want heat engine failures to result in the loss</div><div>of those resource requests, as there is no easy way to recover them.</div><div><br></div><div>The issue is fixed by <a href="https://review.openstack.org/#/c/297988">https://review.openstack.org/#/c/297988</a>. I</div><div>installed and tested with version 5.0.0, which is the latest version of</div><div>oslo.messaging and has the fix. In the new version, a message is</div><div>acknowledged only after it gets an eventlet. 
It is not ideal in</div><div>the sense that it doesn't give the service/client the freedom to</div><div>acknowledge when it wants to, but it is better than the older versions. So, if</div><div>the engine process cannot get an eventlet/thread to process the message,</div><div>the message is not acknowledged and remains in the messaging server.</div><div><br></div><div>I tested with two engine processes, each with the executor thread pool size set to</div><div>2. This means at most 4 resources should be processed at a time, and</div><div>the rest should remain in the messaging server. I created a stack</div><div>of 8 test resources, each with 20 seconds of waiting time, and saw that 4</div><div>messages were available in the messaging server while the other 4 were being</div><div>processed. I restarted the engine processes and the remaining messages</div><div>were again taken up for processing.</div><div><br></div><div>I am glad that the issue is fixed in the new version, and we should move</div><div>to it before enabling convergence by default.</div><div><br></div><div>-- Anant</div></div>
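<div><br></div><div>P.S. To make the difference between the two acknowledgement behaviours concrete, here is a toy model in plain Python. It is only a sketch and does not use the oslo.messaging or futurist APIs: the function name drain, its parameters, and the use of a deque as the "messaging server" are all made up for illustration, and the two engine processes are modelled as a single pool of 4 workers.</div>

```python
from collections import deque


def drain(broker, workers, ack_before_dispatch):
    """Toy model of a consumer pulling messages from a broker.

    Returns (in_progress, queued_locally). Anything left in `broker`
    is still unacknowledged and would survive a consumer crash.
    """
    in_progress, queued_locally = [], []
    while broker:
        if len(in_progress) < workers:
            # A worker is free: acknowledge (remove from broker) and run.
            in_progress.append(broker.popleft())
        elif ack_before_dispatch:
            # Old behaviour: acknowledge anyway and hold the message in
            # the process's own memory until a worker frees up.
            queued_locally.append(broker.popleft())
        else:
            # New behaviour: leave the message unacknowledged on the broker.
            break
    return in_progress, queued_locally


# Numbers from the test above: 8 resource messages, 2 engines x 2 workers.
old_broker = deque(range(8))
old_running, old_local = drain(old_broker, workers=4, ack_before_dispatch=True)

new_broker = deque(range(8))
new_running, new_local = drain(new_broker, workers=4, ack_before_dispatch=False)

print("old: running", old_running, "lost on crash", old_local,
      "on broker", list(old_broker))
print("new: running", new_running, "lost on crash", new_local,
      "on broker", list(new_broker))
```

<div>With the old behaviour, the 4 messages that did not get a worker sit only in process memory, so a crash loses them. With the new behaviour, they stay on the broker and can be picked up after a restart, which matches what I saw in the test.</div>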