[openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem
Day, Phil
philip.day at hp.com
Fri Oct 25 20:28:28 UTC 2013
> -----Original Message-----
> From: Clint Byrum [mailto:clint at fewbar.com]
> Sent: 25 October 2013 17:05
> To: openstack-dev
> Subject: Re: [openstack-dev] [nova] Thoughts please on how to address a
> problem with multiple deletes leading to a nova-compute thread pool
> problem
>
> Excerpts from Day, Phil's message of 2013-10-25 03:46:01 -0700:
> > Hi Folks,
> >
> > We're very occasionally seeing problems where a thread processing a
> > create hangs (and we've seen this when talking to Cinder and Glance).
> > Whilst those issues need to be hunted down in their own right, they do
> > highlight what seems to me to be a weakness in the processing of delete
> > requests that I'd like to get some feedback on.
> >
> > Delete is the one operation that is allowed regardless of the instance
> > state (since it's a one-way operation, and users should always be able
> > to free up their quota). However, when we get a create thread hung in
> > one of these states, the delete requests will also block when they hit
> > the manager, as they are synchronized on the uuid. Because the user
> > making the delete request doesn't see anything happen, they tend to
> > submit more delete requests. The service is still up, so these go to
> > the compute manager as well, and eventually all of the threads will be
> > waiting for the lock, and the compute manager will stop consuming new
> > messages.
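(For anyone not familiar with the manager code, the serialization that
causes the pile-up looks roughly like this - illustrative, but based on
the real @utils.synchronized pattern in the compute manager:)

    # Every handler for a given instance funnels through the same named
    # lock, so one hung create holds up every later request, and each
    # waiting request occupies a thread from the shared pool.
    @utils.synchronized(instance['uuid'])
    def do_run_instance():
        self._run_instance(context, instance)      # hung talking to Glance

    @utils.synchronized(instance['uuid'])
    def do_terminate_instance():
        self._delete_instance(context, instance)   # blocks on the lock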
> >
> > The problem isn't limited to deletes, although in most cases the
> > change of state in the API means you have to keep making different
> > calls to get past the state-checker logic when an instance is stuck in
> > another state. Users also seem to be more impatient with deletes, as
> > they are trying to free up quota for other things.
> >
> > So while I know that we should never get a thread into a hung state in
> > the first place, I was wondering about one of the following approaches
> > to address just the delete case:
> >
> > i) Change the delete call on the manager so it doesn't wait for the
> > uuid lock. Deletes should be coded so that they work regardless of the
> > state of the VM, and other actions should be able to cope with a
> > delete being performed from under them. There is of course no
> > guarantee that the delete itself won't block as well.
> >
>
> Almost anything unexpected that isn't "start the creation" results in
> just marking an instance as an ERROR, right? So this approach is
> actually pretty straightforward to implement. You don't really have to
> make other operations any more intelligent than they already should be
> in cleaning up half-done operations when they encounter an error. It
> might be helpful to suppress or de-prioritize logging of these errors
> when it is obvious that this result was intended.
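To make (i) concrete, a rough sketch might look something like the
following (illustrative only - in the real manager the lock is taken via
the @utils.synchronized decorator; the point is just that terminate
skips it):

    def terminate_instance(self, context, instance, bdms, reservations):
        # Deliberately no @utils.synchronized(instance['uuid']) here:
        # the delete runs even if a hung create still holds the lock.
        # That means _delete_instance must cope with any instance state,
        # and other operations must cope with the instance disappearing
        # from under them (marking it ERROR, or logging and bailing out).
        try:
            self._delete_instance(context, instance, bdms,
                                  reservations=reservations)
        except exception.InstanceNotFound:
            # A concurrent delete already won the race - nothing to do.
            pass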
>
> > ii) Record in the API server that a delete has been started (maybe
> > enough to use the task state being set to DELETING in the API if we're
> > sure this doesn't get cleared), and add a periodic task in the compute
> > manager to check for and delete instances that have been in a
> > "DELETING" state for more than some timeout. Then the API, knowing
> > that the delete will be processed eventually, can just no-op any
> > further delete requests.
> >
>
> s/API server/database/ right? I like the coalescing approach where you no
> longer take up more resources for repeated requests.
Yep, the state is saved in the DB, but it's set by the API server - that's
what I meant. So it's not dependent on the manager getting the delete.
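For illustration, the API-side coalescing could look roughly like this
(sketch only - task_states.DELETING is a real state, but the method and
the update/cast calls here are simplified, made-up versions of the
compute API):

    from nova.compute import task_states

    def delete(self, context, instance):
        if instance['task_state'] == task_states.DELETING:
            # A delete is already recorded and will be processed (or
            # garbage-collected) eventually - no-op the repeat request.
            return
        # Record the delete in the DB before casting to the manager, so
        # repeats can be coalesced even if the manager is wedged.
        self.update(context, instance, task_state=task_states.DELETING)
        self.compute_rpcapi.terminate_instance(context, instance)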
>
> I don't like the garbage collection aspect of this plan though. Garbage
> collection is a trade-off of user experience for resources. If your GC
> thread gets too far behind, your resources will be exhausted. If you
> make it too active, it wastes resources doing the actual GC. Add in
> that you have a timeout before things can be garbage collected, and I
> think this becomes a very tricky thing to tune, and it may not be
> obvious it needs to be tuned until you have a user who does a lot of
> rapid create/delete cycles.
>
The GC is just a backstop here - you always let the first delete message
through, so normally things work as they do now. It's only if the delete
message doesn't get processed for some reason that the GC would kick in.
There are already examples of this kind of clean-up in other periodic tasks.
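As a sketch of that backstop (the periodic_task decorator and timeutils
helper are real common-code pieces; the query helper and the hard-coded
timeout are invented here):

    from nova.openstack.common import periodic_task
    from nova.openstack.common import timeutils

    @periodic_task.periodic_task(spacing=600)
    def _reap_stuck_deletes(self, context):
        # _instances_stuck_deleting() is a made-up helper that returns
        # this host's instances whose task_state is DELETING.
        for instance in self._instances_stuck_deleting(context):
            # Only act once the delete has clearly stalled, so the first
            # delete message always gets a chance to do the work itself.
            if timeutils.is_older_than(instance['updated_at'], 3600):
                self._delete_instance(context, instance, bdms=None,
                                      reservations=None)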
> > iii) Add some hook into the ServiceGroup API so that the timer could
> > depend on getting a free thread from the compute manager pool (i.e.
> > run some no-op task) - so that if there are no free threads then the
> > service becomes down. That would (eventually) stop the scheduler from
> > sending new requests to it, and make deletes be processed in the API
> > server, but won't of course help with commands for other instances on
> > the same host.
> >
>
> I'm not sure I understand this one.
>
At the moment the "liveness" of a service is determined by a separate thread
in the ServiceGroup class - all it really shows is that something in the manager
is still running. What I was thinking of is extending that so that it shows that
the manager is still capable of doing something useful. Doing some sort of
heartbeat message via the message bus would probably be even better, as
it shows that the manager is still capable of getting messages.
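Something like the following is what I have in mind for the pool check
(pure sketch - the eventlet calls are real, but the hook back into the
ServiceGroup driver isn't shown):

    import eventlet
    from eventlet import event

    def _pool_is_responsive(pool, timeout=10):
        # Submit a trivial task and wait (bounded) for it to run. If
        # every worker is stuck behind a uuid lock, this times out and
        # the service should stop reporting itself as up.
        done = event.Event()
        try:
            with eventlet.Timeout(timeout):
                # spawn_n itself blocks when the pool is full, so it has
                # to sit inside the timeout as well.
                pool.spawn_n(done.send, True)
                return done.wait()
        except eventlet.Timeout:
            return False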
> > iv) Move away from having a general topic and thread pool for all
> > requests, and start a listener on an instance-specific topic for each
> > running instance on a host (leaving the general topic and pool just
> > for creates and other non-instance calls like the hypervisor API).
> > Then a blocked task would only affect requests for that specific
> > instance.
> >
>
> A topic per record will get out of hand rapidly. If you think of the instance
> record in the DB as the topic though, then (i) and (iv) are actually quite
> similar.
>
Do you think? Message queue systems can handle huge numbers of topics,
so I'm not sure that's a real issue. What all the uuid locking and state
management tries to do is effectively serialize actions onto a specific
instance - this would do the same, but use the message queue to, well,
queue things, rather than doing it in the thread pool, which is the
problem here. It would also guarantee the order of delivery of actions,
and mean that you could, for example, send an action to an instance that
was being migrated.
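For illustration (invented glue code - the real rpc layer doesn't set
things up quite like this), each running instance would get its own
consumer:

    # Sketch of (iv): one topic per running instance, so the queue itself
    # serializes that instance's actions. create_consumer() is the
    # openstack.common.rpc call; the topic naming here is invented.
    def _start_instance_consumer(self, instance):
        topic = 'compute.%s.%s' % (self.host, instance['uuid'])
        # A dedicated consumer preserves per-instance ordering and
        # leaves the shared pool free for creates and host-level calls.
        self.conn.create_consumer(topic, self, fanout=False)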
Another way to get the same effect without having separate topics would
be for methods that are going to be synchronized to first test whether
they would block, and if so just put the message back on the queue. So
the terminate call would be something like:
    @utils.synchronized(instance['uuid'])
    def do_terminate_instance(instance, bdms):
        try:
            self._delete_instance(context, instance, bdms,
                                  reservations=reservations)
        except exception.InstanceTerminationFailure as error:
            LOG.exception(_('Setting instance vm_state to ERROR'),
                          instance=instance)
            self._set_instance_error_state(context, instance['uuid'])
        except exception.InstanceNotFound as e:
            LOG.warn(e, instance=instance)

    # would_block() is a new helper that peeks at the uuid lock without
    # taking it; if the lock is held, re-cast the message to ourselves
    # instead of tying up a thread from the pool.
    if utils.would_block(instance['uuid']):
        self.compute_rpcapi.terminate_instance(context, instance, bdms,
                                               reservations)
    else:
        do_terminate_instance(instance, bdms)
That way, rather than sitting on a thread in the pool, the requests would
keep circulating in the message queue. It's a bit of an odd way to use a
queuing system though.