[openstack-dev] [Heat] Using Job Queues for timeout ops

Murugan, Visnusaran visnusaran.murugan at hp.com
Thu Nov 13 09:27:42 UTC 2014


Hi,

The intention is not to transfer the workload of a failed engine onto an active one. The convergence implementation that we are working on will be able to recover from a failure, provided a timeout notification reaches heat-engine. All I want is a safe holding area for my timeout tasks. A timeout can be a stack timeout or a resource timeout.

By code change :) I meant that posting to a job queue would be a matter of decorating the timeout method and firing it for delayed execution. I felt that we need not use TaskFlow just for posting a delayed execution (a timer, in our case).
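
To illustrate the "decorate and fire for delayed execution" idea, here is a minimal sketch using only the Python standard library. It is not celery and not Heat code: DelayedJobQueue, delayed and the .delay() helper are hypothetical names made up for this example, and the queue lives in process memory, so unlike a real job queue it would not survive an engine going down.

```python
import heapq
import itertools
import time


class DelayedJobQueue:
    """In-memory stand-in for a job queue that supports delayed execution.

    Hypothetical helper for illustration only; a real deployment would
    need a durable, shared queue instead of a process-local heap.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal due times

    def post(self, delay, func, *args, **kwargs):
        """Schedule func(*args, **kwargs) to become due after `delay` seconds."""
        due = self._clock() + delay
        heapq.heappush(self._heap, (due, next(self._counter), func, args, kwargs))

    def run_due(self):
        """Run every job whose due time has passed; return how many ran."""
        ran = 0
        while self._heap and self._heap[0][0] <= self._clock():
            _, _, func, args, kwargs = heapq.heappop(self._heap)
            func(*args, **kwargs)
            ran += 1
        return ran


def delayed(queue):
    """Decorator that adds a .delay(seconds, ...) helper posting to `queue`."""
    def decorator(func):
        def delay(seconds, *args, **kwargs):
            queue.post(seconds, func, *args, **kwargs)
        func.delay = delay
        return func
    return decorator
```

With this in place, a timeout handler decorated with @delayed(queue) can be scheduled with something like handler.delay(3600, stack_id), and whichever worker drains the queue calls run_due() periodically. The call site changes by one decorator and one method call, which is the kind of small code change I have in mind.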

Correct me if I'm wrong.

-Vishnu

From: Joshua Harlow [mailto:harlowja at outlook.com]
Sent: Thursday, November 13, 2014 2:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

A question;

How is using something like celery in heat vs taskflow in heat (or at least concept [1]) 'too many code change'?

Both seem like changes of a similar level ;-)

What was your metric for determining the code change either would have (out of curiosity)?

Perhaps u should look at [2], although I'm unclear on what the desired functionality is here.

Do u want the single engine to transfer its work to another engine when it 'goes down'? If so then the jobboard model + zookeeper inherently does this.

Or maybe u want something else? I'm probably confused because u seem to be asking for resource timeouts + recover from engine failure (which seems like a liveness issue and not a resource timeout one), those 2 things seem separable.

[1] http://docs.openstack.org/developer/taskflow/jobs.html

[2] http://docs.openstack.org/developer/taskflow/examples.html#jobboard-producer-consumer-simple

On Nov 13, 2014, at 12:29 AM, Murugan, Visnusaran <visnusaran.murugan at hp.com> wrote:


Hi all,

Convergence-POC distributes stack operations by sending resource actions over RPC for any heat-engine to execute. The entire stack lifecycle will be controlled by worker/observer notifications. This distributed model has its own advantages and disadvantages.

Any stack operation has a timeout, and a single engine is responsible for it. If that engine goes down, the timeout is lost along with it. The traditional way to handle this is for other engines to recreate the timeout from scratch. Also, a missed resource action notification will be detected only when the stack operation timeout happens.

To overcome this, we will need the following capabilities:
1. Resource timeout (can be used for retry)
2. Recover from engine failure (loss of stack timeout, resource action notification)


Suggestions:
1. Use a task queue like celery to host timeouts for both stack and resource.
2. Poll the database for engine failures and restart timers / retrigger resource retry (IMHO: this would be the traditional approach and is heavyweight)
3. Migrate heat to use TaskFlow. (Too many code changes)
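
For comparison, a rough sketch of what option 2 could look like: timeouts are stored as rows with a deadline, and any engine can poll for expired ones and claim them, so a timeout outlives the engine that registered it. This uses sqlite3 purely so the example is self-contained; the table layout and the functions open_store, register_timeout, claim_expired and complete are assumptions made up for illustration, not Heat's schema or API.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) timeout table; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS timeouts ("
        " target TEXT PRIMARY KEY,"   # stack or resource identifier
        " deadline REAL NOT NULL,"    # absolute time at which it expires
        " claimed_by TEXT)"           # engine currently handling it, if any
    )
    return conn


def register_timeout(conn, target, deadline):
    """Record (or reset) the timeout for a stack or resource."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO timeouts (target, deadline, claimed_by)"
            " VALUES (?, ?, NULL)",
            (target, deadline),
        )


def claim_expired(conn, engine_id, now):
    """Claim unclaimed expired timeouts; return all targets this engine holds."""
    with conn:
        conn.execute(
            "UPDATE timeouts SET claimed_by = ?"
            " WHERE deadline <= ? AND claimed_by IS NULL",
            (engine_id, now),
        )
        rows = conn.execute(
            "SELECT target FROM timeouts WHERE claimed_by = ? ORDER BY target",
            (engine_id,),
        ).fetchall()
    return [row[0] for row in rows]


def complete(conn, target):
    """Remove a timeout once its handler has finished."""
    with conn:
        conn.execute("DELETE FROM timeouts WHERE target = ?", (target,))
```

Every engine would have to run claim_expired on a timer, and claims held by an engine that dies after claiming would still need separate cleanup, which is part of why this option feels heavy.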

I am not suggesting we use TaskFlow. Using celery will need only minimal code change (decorating the appropriate functions).


Your thoughts.

-Vishnu
IRC: ckmvishnu