[openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem

Abhishek Lahiri aviostack at gmail.com
Sat Oct 26 16:09:43 UTC 2013


Deletes should only be allowed when the VM is in a powered-off state. This will allow consistent state transitions.

Thanks
Al


On Oct 26, 2013, at 8:55 AM, Joshua Harlow <harlowja at yahoo-inc.com> wrote:

> I think I will try to hold an unconference session at the HK summit about the ideas the cinder developers (and the taskflow developers, since it's not a concept that is unique or applicable to just cinder) are having about said state machine (and its potential usage).
> 
> So look out for that, be interesting to have some nova folks involved there also :-)
> 
> Sent from my really tiny device...
> 
> On Oct 26, 2013, at 3:14 AM, "Alex Glikson" <GLIKSON at il.ibm.com> wrote:
> 
>> +1 
>> 
>> Regards, 
>> Alex 
>> 
>> 
>> Joshua Harlow <harlowja at yahoo-inc.com> wrote on 26/10/2013 09:29:03 AM:
>> > 
>> > An idea that others and I are having for a similar use case in 
>> > cinder (or it appears to be similar).
>> > 
>> > If there were one or more well-defined state machines in nova, with
>> > well-defined and managed transitions between states, then it seems
>> > like such a state machine could resume on failure as well as be
>> > interrupted when a "dueling" or preemptable operation arrives (a
>> > delete while being created, for example). This way not only would
>> > the set of states and transitions be very clear, but it would also
>> > be clear how preemption occurs (and in which cases).
>> > 
>> > Right now in nova there is a distributed and ad-hoc state machine
>> > which, if it were more formalized, could inherit some of the
>> > useful capabilities described above. It would also be much more
>> > resilient to the types of locking problems that you described.
>> > 
>> > IMHO that's the only way these types of problems will be fully
>> > fixed: not by more queues or more periodic tasks, but by solidifying
>> > & formalizing the state machines that compose the work nova does.
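
A minimal Python sketch of the kind of explicit transition table described above. This is not nova or taskflow code; the state names, event names and the `InstanceStateMachine` class are all hypothetical, chosen only to show how a delete could be declared as a legal preemption of a build instead of being handled ad hoc.

```python
from enum import Enum


class State(Enum):
    BUILDING = "building"
    ACTIVE = "active"
    ERROR = "error"
    DELETING = "deleting"
    DELETED = "deleted"


# Allowed (state, event) -> next state. Anything not listed is rejected.
# All names here are illustrative assumptions, not nova's real states.
TRANSITIONS = {
    (State.BUILDING, "build_done"): State.ACTIVE,
    (State.BUILDING, "build_failed"): State.ERROR,
    # "delete" is listed for every live state, so it can preempt a build.
    (State.BUILDING, "delete"): State.DELETING,
    (State.ACTIVE, "delete"): State.DELETING,
    (State.ERROR, "delete"): State.DELETING,
    (State.DELETING, "delete_done"): State.DELETED,
}


class InvalidTransition(Exception):
    pass


class InstanceStateMachine:
    def __init__(self, state=State.BUILDING):
        self.state = state
        self.history = [state]  # record of every state visited

    def fire(self, event):
        """Apply an event, or raise if the table does not allow it."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise InvalidTransition(
                f"{event!r} not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state
```

With a table like this, "delete while being created" is just one more row, and a late `build_done` arriving after the delete is rejected by the table rather than by scattered checks.
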
>> > 
>> > Sent from my really tiny device...
>> > 
>> > > On Oct 25, 2013, at 3:52 AM, "Day, Phil" <philip.day at hp.com> wrote:
>> > > 
>> > > Hi Folks,
>> > > 
>> > > We're very occasionally seeing problems where a thread processing
>> > > a create hangs (we've seen this when talking to Cinder and Glance).
>> > > Whilst those issues need to be hunted down in their own right, they
>> > > do show up what seems to me to be a weakness in the processing of
>> > > delete requests that I'd like to get some feedback on.
>> > > 
>> > > Delete is the one operation that is allowed regardless of the
>> > > instance state (since it's a one-way operation, and users should
>> > > always be able to free up their quota). However, when we get a
>> > > create thread hung in one of these states, the delete requests will
>> > > also block when they hit the manager, as they are synchronized on
>> > > the uuid. Because the user making the delete request doesn't see
>> > > anything happen, they tend to submit more delete requests. The
>> > > service is still up, so these go to the compute manager as well,
>> > > and eventually all of the threads will be waiting for the lock, and
>> > > the compute manager will stop consuming new messages.
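
To make the failure mode above concrete, here is a small self-contained Python sketch. It is not nova code: the pool size, the `create`/`delete`/`reboot` functions and the per-uuid lock helper are hypothetical stand-ins, and a standard-library thread pool is used in place of the compute manager's real pool. It only shows how one hung create plus a few repeated deletes can starve a request for an unrelated instance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

POOL_SIZE = 4  # hypothetical; stands in for the compute manager's pool size

_locks = {}
_locks_guard = threading.Lock()


def lock_for(uuid):
    """Return the single lock shared by all operations on one instance."""
    with _locks_guard:
        return _locks.setdefault(uuid, threading.Lock())


def create(uuid, started, unblock):
    with lock_for(uuid):
        started.set()
        unblock.wait()  # simulates the external call that hangs
    return "created"


def delete(uuid):
    with lock_for(uuid):  # queues up behind the hung create
        return "deleted"


def reboot(uuid):
    with lock_for(uuid):
        return "rebooted"


def demo():
    started, unblock = threading.Event(), threading.Event()
    pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
    pool.submit(create, "vm-1", started, unblock)
    started.wait()
    # The impatient user retries until every remaining worker is taken.
    deletes = [pool.submit(delete, "vm-1") for _ in range(POOL_SIZE - 1)]
    other = pool.submit(reboot, "vm-2")  # a different, healthy instance
    try:
        other.result(timeout=0.5)
        starved = False
    except TimeoutError:
        starved = True  # no free worker was left to pick it up
    unblock.set()  # let the create finish so the demo can exit
    results = [f.result(timeout=5) for f in deletes]
    results.append(other.result(timeout=5))
    pool.shutdown()
    return starved, results
```

Calling `demo()` shows the request for `vm-2` timing out while the create is hung, even though nothing is wrong with `vm-2` itself.
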
>> > > 
>> > > The problem isn't limited to deletes - although in most cases the
>> > > change of state in the API means that you have to keep making
>> > > different calls to get past the state checker logic to do it with
>> > > an instance stuck in another state. Users also seem to be more
>> > > impatient with deletes, as they are trying to free up quota for
>> > > other things.
>> > > 
>> > > So while I know that we should never get a thread into a hung
>> > > state in the first place, I was wondering about one of the
>> > > following approaches to address just the delete case:
>> > > 
>> > > i) Change the delete call on the manager so it doesn't wait for
>> > > the uuid lock. Deletes should be coded so that they work regardless
>> > > of the state of the VM, and other actions should be able to cope
>> > > with a delete being performed from under them. There is of course
>> > > no guarantee that the delete itself won't block as well.
>> > > 
>> > > ii) Record in the API server that a delete has been started (maybe
>> > > it's enough to use the task state being set to DELETING in the API,
>> > > if we're sure this doesn't get cleared), and add a periodic task in
>> > > the compute manager to check for and delete instances that have
>> > > been in a "DELETING" state for more than some timeout. Then the
>> > > API, knowing that the delete will be processed eventually, can just
>> > > no-op any further delete requests.
>> > > 
>> > > iii) Add some hook into the ServiceGroup API so that the timer
>> > > could depend on getting a free thread from the compute manager pool
>> > > (i.e. run some no-op task) - so that if there are no free threads
>> > > then the service becomes down. That would (eventually) stop the
>> > > scheduler from sending new requests to it, and make deletes be
>> > > processed in the API server, but won't of course help with commands
>> > > for other instances on the same host.
>> > > 
>> > > iv) Move away from having a general topic and thread pool for all
>> > > requests, and start a listener on an instance-specific topic for
>> > > each running instance on a host (leaving the general topic and pool
>> > > just for creates and other non-instance calls like the hypervisor
>> > > API). Then a blocked task would only affect requests for a specific
>> > > instance.
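
As a rough illustration of the isolation that option iv) is after, the sketch below gives each instance its own queue and worker thread, so a hung task for one uuid cannot hold up another. It is an in-process analogy only: `PerInstanceDispatcher` is a hypothetical name, and real per-instance topics would live in the messaging layer, not in a Python dict.

```python
import queue
import threading


class PerInstanceDispatcher:
    """One queue and one worker thread per instance uuid (illustrative)."""

    def __init__(self):
        self._queues = {}
        self._guard = threading.Lock()

    def _worker(self, q):
        while True:
            item = q.get()
            if item is None:  # sentinel: stop this worker
                break
            func, done = item
            try:
                func()
            finally:
                done.set()  # signal completion even if func raised

    def submit(self, uuid, func):
        """Queue func for this instance; returns an Event set when it ends."""
        with self._guard:
            q = self._queues.get(uuid)
            if q is None:
                q = self._queues[uuid] = queue.Queue()
                threading.Thread(
                    target=self._worker, args=(q,), daemon=True).start()
        done = threading.Event()
        q.put((func, done))
        return done

    def stop(self):
        with self._guard:
            for q in self._queues.values():
                q.put(None)
```

The cost is the one noted in the text: a listener per running instance is a much larger structural change than a single shared pool.
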
>> > > 
>> > > I'm tending towards ii) as a simple and pragmatic solution in the
>> > > near term, although I like both iii) and iv) as generally good
>> > > enhancements - but iv) in particular feels like a pretty seismic
>> > > change.
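
Option ii) could look roughly like the following Python sketch. Everything here is an assumption made for illustration: instances are plain dicts, `cast_delete` and `force_delete` are hypothetical callbacks standing in for the RPC cast and the real cleanup, and the 600-second timeout is arbitrary. It is not how nova's API or periodic tasks are actually written.

```python
import time

DELETING = "deleting"
DELETE_TIMEOUT = 600  # seconds; an arbitrary illustrative value


def api_delete(instance, cast_delete, now=time.time):
    """API-side handler: forward the first delete, no-op the repeats."""
    if instance.get("task_state") == DELETING:
        return False  # already recorded; nothing new is sent to compute
    instance["task_state"] = DELETING
    instance["delete_requested_at"] = now()
    cast_delete(instance["uuid"])
    return True


def reap_stuck_deletes(instances, force_delete, now=time.time,
                       timeout=DELETE_TIMEOUT):
    """Periodic-task body: finish deletes that have been pending too long."""
    reaped = []
    for inst in instances:
        if inst.get("task_state") != DELETING:
            continue
        if now() - inst["delete_requested_at"] > timeout:
            force_delete(inst["uuid"])
            reaped.append(inst["uuid"])
    return reaped
```

Because repeated deletes never reach the compute manager, they can no longer pile up behind the uuid lock; the periodic task is what guarantees the delete still happens eventually.
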
>> > > 
>> > > Thoughts please,
>> > > 
>> > > Phil        
>> > > 
>> > > _______________________________________________
>> > > OpenStack-dev mailing list
>> > > OpenStack-dev at lists.openstack.org
>> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> > 