<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Nov 13, 2014 at 6:29 PM, Murugan, Visnusaran <span dir="ltr"><<a href="mailto:visnusaran.murugan@hp.com" target="_blank">visnusaran.murugan@hp.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div link="#0563C1" vlink="#954F72" lang="EN-US">

<div>

<p class="MsoNormal">Hi all,<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Convergence-POC distributes stack operations by sending resource actions over RPC for any heat-engine to execute. The entire stack lifecycle will be controlled by worker/observer notifications. This distributed model has its own advantages and disadvantages.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Every stack operation has a timeout, and a single engine is responsible for it. If that engine goes down, the timeout is lost along with it. The traditional fix is for another engine to recreate the timeout from scratch. In addition, a missed resource action notification will be detected only when the stack operation times out. <u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">To overcome this, we will need the following capability:<u></u><u></u></p>

<p><u></u><span>1.<span style="font:7.0pt "Times New Roman"">      

</span></span><u></u>Resource timeout (can be used for retry)</p></div></div></blockquote><div>We will shortly have a worker job. Can't we start a second job in parallel with the one doing the work, one that just sleeps?<br></div><div>When it reaches the end of the sleep, it runs a check. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div link="#0563C1" vlink="#954F72" lang="EN-US"><div><p><u></u><u></u></p>

<p><u></u><span>2.<span style="font:7.0pt "Times New Roman"">      

</span></span><u></u>Recover from engine failure (loss of stack timeout, resource action notification)<u></u><u></u></p>

<p class="MsoNormal"><u></u> </p></div></div></blockquote><div><br></div><div>My suggestion above could catch failures as long as it was run in a different process.<br><br></div><div>-Angus<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div link="#0563C1" vlink="#954F72" lang="EN-US"><div><p class="MsoNormal"><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Suggestion:<u></u><u></u></p>

<p><u></u><span>1.<span style="font:7.0pt "Times New Roman"">      

</span></span><u></u>Use a task queue such as Celery to host timeouts for both stacks and resources.<u></u><u></u></p>

<p><u></u><span>2.<span style="font:7.0pt "Times New Roman"">      

</span></span><u></u>Poll the database for engine failures, then restart timers and retrigger resource retries (IMHO this is the traditional approach and is heavyweight).<u></u><u></u></p>

<p><u></u><span>3.<span style="font:7.0pt "Times New Roman"">      

</span></span><u></u>Migrate Heat to use TaskFlow (too many code changes).<u></u><u></u></p>
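<p class="MsoNormal">To make the polling option concrete, here is a minimal sketch of how stale engines could be detected and how much of a stack timeout is left. It is only an illustration: <code>StackLock</code>, <code>find_orphaned</code>, <code>remaining_timeout</code> and the heartbeat dictionary are hypothetical names, not existing Heat code, and storing an absolute deadline (so the timer need not restart from scratch) is an assumption that goes beyond what the thread proposes.</p>

```python
from dataclasses import dataclass


@dataclass
class StackLock:
    """Hypothetical record of which engine owns a stack operation."""
    stack_id: str
    engine_id: str
    deadline: float  # assumed: absolute time at which the operation times out


def find_orphaned(locks, heartbeats, now, max_silence=30.0):
    """Return locks whose owning engine has been silent for too long.

    `heartbeats` maps engine_id -> time of the last heartbeat seen.
    An engine with no heartbeat at all is treated as failed.
    """
    orphaned = []
    for lock in locks:
        last_seen = heartbeats.get(lock.engine_id)
        if last_seen is None or now - last_seen > max_silence:
            orphaned.append(lock)
    return orphaned


def remaining_timeout(lock, now):
    """Seconds left before the stored deadline, never negative."""
    return max(0.0, lock.deadline - now)
```

<p class="MsoNormal">A surviving engine could run <code>find_orphaned</code> periodically against rows read from the database and re-arm a timer with <code>remaining_timeout</code> for each orphaned stack. The polling interval and <code>max_silence</code> would need tuning, which is part of why this option weighs heavy.</p>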

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">I am not suggesting we use TaskFlow. Using Celery would require only minimal code changes (decorating the appropriate functions).<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Your thoughts?<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">-Vishnu<u></u><u></u></p>

<p class="MsoNormal">IRC: ckmvishnu<u></u><u></u></p>

</div>

</div>


<br></blockquote></div><br></div></div>
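<div>A rough sketch of the sleeping-job idea from the reply above, in case it helps the discussion. It uses a <code>threading.Timer</code> purely as a stand-in for the parallel job; to actually catch an engine failure the check would have to run in a different process, as noted above. <code>start_watchdog</code>, <code>is_done</code> and <code>on_timeout</code> are made-up names for illustration, not Heat APIs.</div>

```python
import threading


def start_watchdog(timeout, is_done, on_timeout):
    """Start a job that sleeps for `timeout` seconds and then runs a check.

    is_done:    callable returning True once the real work has finished
    on_timeout: callable invoked if the work is still unfinished
    """
    def check():
        # Runs only after the sleep has elapsed.
        if not is_done():
            on_timeout()

    timer = threading.Timer(timeout, check)
    timer.daemon = True  # do not keep the interpreter alive just for the watchdog
    timer.start()
    return timer


if __name__ == "__main__":
    finished = threading.Event()
    watchdog = start_watchdog(
        0.1, finished.is_set, lambda: print("resource timed out, retrying")
    )
    # The worker never sets `finished` here, so the check reports a timeout.
    watchdog.join()
```

<div>If the worker finishes first it can either set the flag or call <code>cancel()</code> on the returned timer, in which case the check never fires.</div>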