<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body dir="auto">
<div>I think I will try to hold an unconference session at the HK summit about the ideas the cinder developers (and the taskflow developers, since the concept is not unique to, or applicable only to, cinder) are having about said state machine (and its potential usage).</div>
<div><br>
</div>
<div>So look out for that; it would be interesting to have some nova folks involved there too :-)</div>
<div><br>
Sent from my really tiny device...</div>
<div><br>
On Oct 26, 2013, at 3:14 AM, "Alex Glikson" <<a href="mailto:GLIKSON@il.ibm.com">GLIKSON@il.ibm.com</a>> wrote:<br>
<br>
</div>
<blockquote type="cite">
<div><font size="2" face="sans-serif">+1</font> <br>
<br>
<font size="2" face="sans-serif">Regards,</font> <br>
<font size="2" face="sans-serif">Alex</font> <br>
<font size="2" face="sans-serif"><br>
</font><br>
<tt><font size="2">Joshua Harlow <<a href="mailto:harlowja@yahoo-inc.com">harlowja@yahoo-inc.com</a>> wrote on 26/10/2013 09:29:03 AM:<br>
> <br>
> An idea that others and I are having for a similar use case in <br>
> cinder (or what appears to be a similar one).<br>
> <br>
> If there were well-defined state machine(s) in nova with well-defined <br>
> and managed transitions between states, then it seems like this <br>
> state machine could resume on failure as well as be interrupted <br>
> when a "dueling" or preemptible operation arrives (a delete during <br>
> a create, for example). This way, not only would the set of states <br>
> and transitions be very clear, but it would also be clear how <br>
> preemption occurs (and in which cases). <br>
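<br>
A minimal Python sketch of such a transition table, in which a delete preempts an in-progress create. The state names, the PREEMPTING set, InstanceStateMachine and build() are hypothetical illustrations, not nova's actual vm_state/task_state model or the taskflow API:<br>
<pre>
```python
import threading

# Hypothetical states and allowed transitions (illustrative names only).
TRANSITIONS = {
    "building": {"active", "deleting", "error"},
    "active": {"deleting"},
    "error": {"deleting"},
    "deleting": {"deleted"},
    "deleted": set(),
}

# Target states that interrupt whatever operation is in flight.
PREEMPTING = {"deleting"}


class InstanceStateMachine:
    """Allows only the state changes listed in TRANSITIONS."""

    def __init__(self, state="building"):
        self.state = state
        self.cancel = threading.Event()  # polled by long-running operations
        self._lock = threading.Lock()    # held only while changing state

    def transition(self, new_state):
        with self._lock:
            if new_state not in TRANSITIONS[self.state]:
                raise ValueError(f"illegal transition {self.state} -> {new_state}")
            if new_state in PREEMPTING:
                # Ask any in-flight operation to stop at its next checkpoint.
                self.cancel.set()
            self.state = new_state


def build(sm, steps):
    """Hypothetical create flow that checks for preemption between steps."""
    for step in steps:
        if sm.cancel.is_set():
            return "preempted before " + step
        # ... the real work for this step would go here ...
    try:
        sm.transition("active")
    except ValueError:
        # A delete won the race after the last checkpoint.
        return "preempted at completion"
    return "built"
```
</pre>
Here the create polls a cancellation flag between steps, so a delete never has to wait for it, and any late attempt by the create to move to "active" is rejected by the transition table rather than silently overwriting the delete.<br>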
> <br>
> Right now nova has a distributed, ad-hoc state machine; if it were <br>
> more formalized, it could inherit some of the useful capabilities <br>
> described above. It would also be much more resilient to the types <br>
> of locking problems that you described. <br>
> <br>
> IMHO that's the only way these types of problems will be fully <br>
> fixed: not by more queues or more periodic tasks, but by solidifying<br>
> and formalizing the state machines that compose the work nova does.<br>
> <br>
> > On Oct 25, 2013, at 3:52 AM, "Day, Phil" <<a href="mailto:philip.day@hp.com">philip.day@hp.com</a>> wrote:<br>
> > <br>
> > Hi Folks,<br>
> > <br>
> > We're very occasionally seeing problems where a thread processing <br>
> a create hangs (we've seen this when talking to Cinder and Glance). <br>
> Whilst those issues need to be hunted down in their own right, they<br>
> do expose what seems to me to be a weakness in the processing of <br>
> delete requests that I'd like to get some feedback on.<br>
> > <br>
> > Delete is the one operation that is allowed regardless of the <br>
> instance state (since it's a one-way operation, and users should <br>
> always be able to free up their quota). However, when a create <br>
> thread hangs in one of these states, delete requests that reach the <br>
> manager also block, because they are synchronized on the instance <br>
> uuid. Because the user making the delete request doesn't see <br>
> anything happen, they tend to submit more delete requests. The <br>
> service is still up, so these also go to the compute manager, and <br>
> eventually all of the threads are waiting for the lock and the <br>
> compute manager stops consuming new messages.<br>
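<br>
The hang described above can be reproduced with a small Python model. This is a minimal sketch, assuming a fixed-size ThreadPoolExecutor stands in for the compute manager's worker pool and a single threading.Lock stands in for the per-uuid lock; run_starvation_demo() and the pool size of 4 are hypothetical:<br>
<pre>
```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def run_starvation_demo(pool_size=4):
    lock = threading.Lock()                 # stands in for the per-uuid lock
    create_holds_lock = threading.Event()
    release_hang = threading.Event()

    def create():
        with lock:
            create_holds_lock.set()
            release_hang.wait()             # stands in for a hung call to Cinder or Glance

    def delete():
        with lock:                          # synchronized on the same uuid, so it blocks
            return "deleted"

    def unrelated_request():
        return "ok"                         # e.g. a request for a different instance

    pool = ThreadPoolExecutor(max_workers=pool_size)
    pool.submit(create)
    create_holds_lock.wait()
    # Each impatient retry by the user occupies another worker thread.
    retries = [pool.submit(delete) for _ in range(pool_size - 1)]
    other = pool.submit(unrelated_request)

    done, _ = wait([other], timeout=0.5)
    starved = other not in done             # no worker was free to run it

    release_hang.set()                      # let the demo clean up
    pool.shutdown(wait=True)
    return starved, other.result(), [r.result() for r in retries]
```
</pre>
With the defaults, run_starvation_demo() returns (True, "ok", ["deleted", "deleted", "deleted"]): while the create holds the lock, the retried deletes occupy every remaining worker, so even a request for a different instance cannot be scheduled until the hang is released.<br>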
> > <br>
> > The problem isn't limited to deletes, although in most cases the <br>
> change of state in the API means that you would have to keep making <br>
> different calls to get past the state-checker logic with an <br>
> instance stuck in another state. Users also seem to be more <br>
> impatient with deletes, as they are trying to free up quota for other things. <br>
> > <br>
> > So while I know that we should never get a thread into a hung <br>
> state in the first place, I was wondering about one of the <br>
> following approaches to address just the delete case:<br>
> > <br>
> > i) Change the delete call on the manager so it doesn't wait for <br>
> the uuid lock. Deletes should be coded so that they work regardless<br>
> of the state of the VM, and other actions should be able to cope <br>
> with a delete being performed from under them. There is of course <br>
> no guarantee that the delete itself won't block as well. <br>
> > <br>
> > ii) Record in the API server that a delete has been started (it may be<br>
> enough to use the task state being set to DELETING in the API, if <br>
> we're sure this doesn't get cleared), and add a periodic task in the<br>
> compute manager to check for and delete instances that have been in a <br>
> "DELETING" state for more than some timeout. Then the API, knowing <br>
> that the delete will be processed eventually, can just no-op any <br>
> further delete requests.<br>
> > <br>
> > iii) Add some hook into the ServiceGroup API so that the timer <br>
> could depend on getting a free thread from the compute manager pool <br>
> (i.e. running some no-op task), so that if there are no free threads<br>
> the service is reported as down. That would (eventually) stop the scheduler<br>
> from sending new requests to it, and make deletes be processed in <br>
> the API server, but of course it won't help with commands for other <br>
> instances on the same host.<br>
> > <br>
> > iv) Move away from having a general topic and thread pool for all <br>
> requests, and start a listener on an instance-specific topic for <br>
> each running instance on a host (leaving the general topic and pool <br>
> just for creates and other non-instance calls like the hypervisor <br>
> API). Then a blocked task would only affect requests for a specific instance.<br>
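<br>
Option iv) can be modelled in-process with one queue and worker thread per instance uuid, standing in for an instance-specific topic and listener. A minimal sketch; PerInstanceDispatcher and run_dispatch_demo() are hypothetical names, and real RPC topics would differ in detail:<br>
<pre>
```python
import queue
import threading


class PerInstanceDispatcher:
    """One FIFO queue and worker thread per instance uuid (hypothetical)."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def _worker(self, q):
        while True:
            task = q.get()
            if task is None:              # shutdown sentinel from close()
                return
            fn, result_q = task
            result_q.put(fn())

    def submit(self, uuid, fn):
        """Queue fn for uuid and return a queue that will receive its result."""
        with self._lock:
            q = self._queues.get(uuid)
            if q is None:
                # First request for this instance: start its dedicated listener.
                q = queue.Queue()
                self._queues[uuid] = q
                threading.Thread(target=self._worker, args=(q,), daemon=True).start()
        result_q = queue.Queue(maxsize=1)
        q.put((fn, result_q))
        return result_q

    def close(self):
        with self._lock:
            for q in self._queues.values():
                q.put(None)


def run_dispatch_demo():
    d = PerInstanceDispatcher()
    hang = threading.Event()
    d.submit("vm-1", hang.wait)                       # a hung create for vm-1
    delete_vm1 = d.submit("vm-1", lambda: "deleted")  # queued behind the hang
    other = d.submit("vm-2", lambda: "ok")            # a different instance
    other_result = other.get(timeout=1.0)             # runs despite vm-1 being stuck
    delete_waiting = delete_vm1.empty()
    hang.set()
    delete_result = delete_vm1.get(timeout=1.0)
    d.close()
    return other_result, delete_waiting, delete_result
```
</pre>
run_dispatch_demo() returns ("ok", True, "deleted"): with a task for vm-1 hung, a later delete for vm-1 still waits behind it, but work for vm-2 runs immediately. So iv) isolates instances from each other, while the delete case for the stuck instance still needs something like i) or ii).<br>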
> > <br>
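A minimal sketch of option ii), assuming the task state really is left at "deleting" once the API sets it. FakeDB, api_delete(), reap_stuck_deletes() and DELETE_TIMEOUT are hypothetical names; timestamps are passed in explicitly so the logic is easy to test:<br>
<pre>
```python
DELETE_TIMEOUT = 300.0   # hypothetical: seconds before the periodic task takes over


class FakeDB:
    """Hypothetical in-memory stand-in for the instances table."""

    def __init__(self):
        self.instances = {}  # uuid -> {"task_state": ..., "delete_requested_at": ...}


def api_delete(db, uuid, now):
    """API side: record the delete once; repeated requests become no-ops."""
    inst = db.instances[uuid]
    if inst.get("task_state") == "deleting":
        return "already deleting"           # nothing new is sent to the compute manager
    inst["task_state"] = "deleting"
    inst["delete_requested_at"] = now
    return "delete sent to compute"


def reap_stuck_deletes(db, now, do_delete):
    """Compute-side periodic task: finish deletes pending longer than the timeout."""
    reaped = []
    for uuid, inst in list(db.instances.items()):
        if inst.get("task_state") != "deleting":
            continue
        if now - inst["delete_requested_at"] > DELETE_TIMEOUT:
            do_delete(uuid)                 # must not wait on the uuid lock
            del db.instances[uuid]
            reaped.append(uuid)
    return reaped
```
</pre>
The do_delete callback used by the periodic task would also need to avoid the uuid lock (as in option i); otherwise the periodic task could block behind the same hung create.<br>
<br>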
> > I'm tending towards ii) as a simple and pragmatic solution in the <br>
> near term, although I like both iii) and iv) as generally <br>
> good enhancements; iv) in particular, though, feels like a pretty seismic change.<br>
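<br>
For option iii), the health check could be a no-op submitted to the same worker pool, with the service reported down if the no-op doesn't run in time. A minimal sketch, using ThreadPoolExecutor as a stand-in for the compute manager's pool; pool_has_free_worker() and run_probe_demo() are hypothetical and not part of the ServiceGroup API:<br>
<pre>
```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def pool_has_free_worker(pool, timeout):
    """Report the service as up only if a no-op task runs within timeout seconds."""
    probe = pool.submit(lambda: None)   # the no-op task
    try:
        probe.result(timeout=timeout)
        return True
    except FutureTimeout:
        probe.cancel()                  # don't leave the probe queued behind stuck work
        return False


def run_probe_demo(workers=2):
    pool = ThreadPoolExecutor(max_workers=workers)
    healthy = pool_has_free_worker(pool, timeout=1.0)
    hang = threading.Event()
    for _ in range(workers):
        pool.submit(hang.wait)          # every worker is now stuck, as with the uuid lock
    starved = pool_has_free_worker(pool, timeout=0.2)
    hang.set()                          # release the workers so the pool can shut down
    pool.shutdown(wait=True)
    return healthy, starved
```
</pre>
run_probe_demo() returns (True, False): the probe succeeds while a worker is idle and times out once every worker is stuck, which is the signal that would mark the service as down.<br>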
> > <br>
> > Thoughts please,<br>
> > <br>
> > Phil <br>
> > <br>
> > _______________________________________________<br>
> > OpenStack-dev mailing list<br>
> > <a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>
> > </font></tt><a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"><tt><font size="2">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</font></tt></a><tt><font size="2"><br>
> <br>
</font></tt><tt><font size="2">
</font></tt></div>
</blockquote>
</body>
</html>