[openstack-dev] [heat] RPC messaging issue and convergence

Anant Patil anant.techie at gmail.com
Wed May 11 17:52:05 UTC 2016


Hi,

I have confirmed that the issue of resource requests being queued
locally, which I highlighted at the design summit, currently exists. I
have also confirmed that it is solved in oslo.messaging version 5.0.0.

The issue affects oslo.messaging versions below 5.0.0. Messages
carrying RPC call/cast requests are drained from the messaging server
(RabbitMQ) and submitted to the thread pool executor
(GreenThreadPoolExecutor from the futurist library). Before a message
is submitted to the executor, it is acknowledged, which means it is
deleted from the messaging server. The thread pool executor queues
messages locally when there are no eventlets available to process
them. This is bad: if the process goes down, the locally queued
messages are lost, and they are very difficult to recover because they
are no longer available in the messaging server. The mail thread
http://lists.openstack.org/pipermail/openstack-dev/2015-July/068742.html
gives more context, and I cried and wept when I read it.
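
To make the ordering concrete, here is a minimal sketch of the
pre-5.0.0 behaviour. This is not the actual oslo.messaging code: the
message class and dispatch function are stand-ins I made up for
illustration; only GreenThreadPoolExecutor comes from the real
futurist library.

    import futurist

    # Stand-in for an incoming AMQP message (illustrative only).
    class FakeMessage(object):
        def __init__(self, body):
            self.body = body

        def acknowledge(self):
            print('acked: %s' % self.body)

    # Stand-in for running the RPC endpoint method.
    def dispatch(message):
        print('processing: %s' % message.body)

    executor = futurist.GreenThreadPoolExecutor(max_workers=2)

    def on_message(message):
        # Pre-5.0.0 ordering: ack first, then hand off to the pool.
        # The ack deletes the message from RabbitMQ, but submit() may
        # only queue it locally if no green thread is free -- a crash
        # between these two points loses the request for good.
        message.acknowledge()
        executor.submit(dispatch, message)

    on_message(FakeMessage('create resource r1'))
    executor.shutdown()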

In convergence, the heat engine casts requests to process resources,
and we don't want heat engine failures to result in the loss of those
resource requests, as there is no easy way to recover them.

The issue is fixed by https://review.openstack.org/#/c/297988 . I
installed and tested version 5.0.0, the latest version of
oslo.messaging, which has the fix. In the new version, a message is
acknowledged only after it gets an eventlet. This is not ideal, in the
sense that it doesn't give the service/client the freedom to
acknowledge whenever it wants to, but it is better than the older
versions. So, if the engine process cannot get an eventlet/thread to
process a message, the message is not acknowledged and remains in the
messaging server.
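
For contrast, a sketch of the post-5.0.0 ordering, reusing the
illustrative names from the sketch above. Again, this is a conceptual
rendering of the behaviour, not the actual fix:

    def on_message_fixed(message):
        # Post-5.0.0 ordering (conceptually): nothing is acknowledged
        # until a green thread actually starts on the message, so an
        # engine crash leaves unstarted messages unacked in RabbitMQ
        # and the server will redeliver them.
        def run():
            message.acknowledge()  # safe: we hold a green thread now
            dispatch(message)
        executor.submit(run)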

I tested with two engine processes, each with the executor thread pool
size set to 2. This means at most 4 resources should be processed at a
time, with the rest remaining available in the messaging server. I
created a stack of 8 test resources, each with 20 seconds of waiting
time, and saw that 4 messages were available in the messaging server
while the other 4 were being processed. I restarted the engine
processes and the remaining messages were again taken up for
processing.
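
For anyone wanting to reproduce this, the knob I used is the
oslo.messaging executor thread pool option in heat.conf (I believe it
is named executor_thread_pool_size; check your release for the exact
name), and the queue depths can be watched with rabbitmqctl:

    [DEFAULT]
    executor_thread_pool_size = 2

    $ rabbitmqctl list_queues name messages_ready messages_unacknowledged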

I am glad that the issue is fixed in the new version, and we should
move to it before enabling convergence by default.

-- Anant
