[openstack-dev] [Heat] Using Job Queues for timeout ops
Zane Bitter
zbitter at redhat.com
Thu Nov 13 13:54:03 UTC 2014

On 13/11/14 03:29, Murugan, Visnusaran wrote:
> Hi all,
>
> Convergence-POC distributes stack operations by sending resource actions
> over RPC for any heat-engine to execute. Entire stack lifecycle will be
> controlled by worker/observer notifications. This distributed model has
> its own advantages and disadvantages.
>
> Any stack operation has a timeout and a single engine will be
> responsible for it. If that engine goes down, timeout is lost along with
> it. So a traditional way is for other engines to recreate timeout from
> scratch. Also a missed resource action notification will be detected
> only when stack operation timeout happens.
>
> To overcome this, we will need the following capability:
>
> 1.Resource timeout (can be used for retry)

I don't believe this is strictly needed for phase 1 (essentially we 
don't have it now, so nothing gets worse).

For phase 2, yes, we'll want it. One thing we haven't discussed much is 
that if we used Zaqar for this then the observer could claim a message 
but not acknowledge it until it had processed it, so we could have 
guaranteed delivery.

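
To make the claim-then-acknowledge idea concrete, here is a rough sketch 
using a toy in-memory queue. It is not the Zaqar API; the ClaimQueue 
class and its post/claim/ack methods are names I made up purely for 
illustration. The point is only that a claimed message becomes visible 
again if the claim expires before it is acknowledged.

```python
import time
import uuid


class ClaimQueue:
    """Toy in-memory queue with claim/ack semantics (hypothetical, not Zaqar)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._messages = {}  # message id -> body
        self._claims = {}    # message id -> time at which the claim expires

    def post(self, body):
        """Add a message and return its id."""
        msg_id = uuid.uuid4().hex
        self._messages[msg_id] = body
        return msg_id

    def claim(self, ttl):
        """Claim one message that is unclaimed or whose claim has expired.

        Returns (msg_id, body), or None if nothing is available.
        """
        now = self._clock()
        for msg_id, body in self._messages.items():
            expiry = self._claims.get(msg_id)
            if expiry is None or expiry <= now:
                self._claims[msg_id] = now + ttl
                return msg_id, body
        return None

    def ack(self, msg_id):
        """Delete the message; only at this point is it treated as delivered."""
        self._messages.pop(msg_id, None)
        self._claims.pop(msg_id, None)
```

If an observer claims a message and then dies before calling ack(), the 
claim times out and another observer can pick the same message up, 
which is where the delivery guarantee would come from.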
> 2.Recover from engine failure (loss of stack timeout, resource action
> notification)
>
> Suggestion:
>
> 1.Use task queue like celery to host timeouts for both stack and resource.

I believe Celery is more or less a non-starter as an OpenStack 
dependency because it uses Kombu directly to talk to the queue, vs. 
oslo.messaging which is an abstraction layer over Kombu, Qpid, ZeroMQ 
and maybe others in the future. i.e. requiring Celery means that some 
users would be forced to install Rabbit for the first time.

One option would be to fork Celery and replace Kombu with oslo.messaging 
as its abstraction layer. Good luck getting that maintained though, 
since Celery _invented_ Kombu to be its abstraction layer.

> 2.Poll database for engine failures and restart timers/ retrigger
> resource retry (IMHO: This would be a traditional and weighs heavy)
>
> 3.Migrate heat to use TaskFlow. (Too many code change)
If it's just handling timed triggers (maybe this is closer to #2) and 
not migrating the whole code base, then I don't see why it would be a 
big change (or even a change at all - it's basically new functionality). 
I'm not sure if TaskFlow has something like this already. If not we 
could also look at what Mistral is doing with timed tasks and see if we 
could spin some of it out into an Oslo library.
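
As a very rough sketch of what "just handling timed triggers" could 
look like: deadlines are kept in shared storage instead of in one 
engine's memory, and whichever engine polls next picks up the expired 
ones. Everything here is an assumption for illustration (the 
timed_trigger table, the function names, and SQLite standing in for the 
real database); it is not how TaskFlow or Mistral do it.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) trigger store and create its table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS timed_trigger ("
        " stack_id TEXT PRIMARY KEY, deadline REAL NOT NULL)"
    )
    return conn


def schedule(conn, stack_id, deadline):
    """Record (or reset) the deadline for a stack operation."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO timed_trigger VALUES (?, ?)",
            (stack_id, deadline),
        )


def cancel(conn, stack_id):
    """Drop the trigger, e.g. when the stack operation completes in time."""
    with conn:
        conn.execute(
            "DELETE FROM timed_trigger WHERE stack_id = ?", (stack_id,)
        )


def pop_expired(conn, now):
    """Return the stack ids whose deadline has passed and remove them."""
    rows = conn.execute(
        "SELECT stack_id FROM timed_trigger"
        " WHERE deadline <= ? ORDER BY deadline",
        (now,),
    ).fetchall()
    with conn:  # commit all the deletes together
        conn.executemany(
            "DELETE FROM timed_trigger WHERE stack_id = ?", rows
        )
    return [row[0] for row in rows]
```

Because the deadline lives in the store, losing the engine that 
scheduled it does not lose the timeout. With several engines polling 
concurrently, the select-then-delete in pop_expired() would need row 
locking or a claim step to avoid firing the same trigger twice; this 
sketch ignores that.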
cheers,
Zane.
> I am not suggesting we use Task Flow. Using celery will have very
> minimum code change. (decorate appropriate functions)
>
> Your thoughts.
>
> -Vishnu
>
> IRC: ckmvishnu
    
    