[openstack-dev] [Heat] Using Job Queues for timeout ops

Murugan, Visnusaran visnusaran.murugan at hp.com
Thu Nov 13 08:29:49 UTC 2014

Hi all,

Convergence-POC distributes stack operations by sending resource actions over RPC for any heat-engine to execute. The entire stack lifecycle will be controlled by worker/observer notifications. This distributed model has its own advantages and disadvantages.

Every stack operation has a timeout, and a single engine is responsible for it. If that engine goes down, the timeout is lost along with it, so the traditional approach is for the other engines to recreate the timeout from scratch. Also, a missed resource action notification is detected only when the stack operation itself times out.

To overcome this, we need the following capabilities:

1. Resource timeouts (which can also be used for retries)

2. Recovery from engine failure (loss of the stack timeout or of a resource action notification)
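
A minimal sketch of capability 1, assuming each resource action gets its own deadline stored as an absolute time, so that any engine (not just the one that started the action) can evaluate it. `ResourceTimeout`, `handle_expired`, `max_attempts` and the 60-second `resource_timeout` default are hypothetical names and values chosen for illustration, not Heat APIs. The point is that a per-resource timeout lets a missed notification be noticed, and the action retried, well before the stack operation as a whole times out.

```python
from dataclasses import dataclass


@dataclass
class ResourceTimeout:
    # Hypothetical record; a real engine would persist this somewhere shared.
    stack_id: str
    resource_name: str
    deadline: float  # absolute time, so any engine can evaluate it
    attempt: int = 1


def handle_expired(timeout, now, stack_deadline, max_attempts=3, resource_timeout=60.0):
    """Decide what to do about a resource action that may not have reported back.

    Returns an (action, timeout) pair where action is one of
    "wait", "retry", "fail_resource" or "fail_stack".
    """
    if now < timeout.deadline:
        # Resource deadline not reached yet: keep waiting for the notification.
        return "wait", timeout
    if now >= stack_deadline:
        # The overall stack timeout has passed; retrying is pointless.
        return "fail_stack", timeout
    if timeout.attempt >= max_attempts:
        # Hypothetical retry budget exhausted for this resource.
        return "fail_resource", timeout
    # Re-trigger the resource action with a fresh deadline, never past the stack's.
    retry = ResourceTimeout(
        timeout.stack_id,
        timeout.resource_name,
        deadline=min(now + resource_timeout, stack_deadline),
        attempt=timeout.attempt + 1,
    )
    return "retry", retry
```

Capping the retry deadline at the stack deadline keeps the resource-level timers consistent with the stack-level one.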

Possible approaches:

1. Use a task queue such as Celery to host the timeouts for both stacks and resources.

2. Poll the database for engine failures and restart timers / retrigger resource retries (IMHO, this is the traditional approach and is heavyweight).

3. Migrate Heat to use TaskFlow (too many code changes).

I am not suggesting we use TaskFlow. Using Celery would require only minimal code changes (decorating the appropriate functions).
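
A minimal sketch of why option 1 helps with recovery, assuming a broker that supports delayed delivery plus late acknowledgement (roughly what Celery offers through `apply_async(countdown=...)` together with the `acks_late=True` task option). `DelayedQueue` and `claim_ttl` are hypothetical: this is an in-memory stand-in for the broker, not Celery or Heat code, and the current time is passed in explicitly so the behaviour is deterministic.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    due: float
    seq: int
    payload: dict = field(compare=False)  # ordering uses only (due, seq)


class DelayedQueue:
    """Hypothetical in-memory stand-in for a broker-hosted delayed task queue.

    A timer can be claimed only once it is due, and it stays "in flight"
    until acknowledged. If the claiming engine dies before acknowledging,
    the claim lapses after `claim_ttl` and another engine can take it.
    """

    def __init__(self, claim_ttl=30.0):
        self._heap = []
        self._in_flight = {}  # seq -> (claim_expiry, entry)
        self._seq = itertools.count()
        self._claim_ttl = claim_ttl

    def schedule(self, payload, due):
        # Timers live in the queue, not in any single engine's memory.
        heapq.heappush(self._heap, _Entry(due, next(self._seq), payload))

    def claim(self, now):
        # Return lapsed claims (from crashed engines) to the queue first.
        for seq, (expiry, entry) in list(self._in_flight.items()):
            if expiry <= now:
                del self._in_flight[seq]
                heapq.heappush(self._heap, entry)
        if self._heap and self._heap[0].due <= now:
            entry = heapq.heappop(self._heap)
            self._in_flight[entry.seq] = (now + self._claim_ttl, entry)
            return entry.seq, entry.payload
        return None

    def ack(self, seq):
        # Called only after the timeout has been fully handled.
        self._in_flight.pop(seq, None)
```

Because the stack and resource timers are held outside every heat-engine, losing an engine no longer loses its timeouts: an unacknowledged timer simply becomes available to the next engine that polls. The trade-off is an extra dependency on a broker.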

Your thoughts?

IRC: ckmvishnu