[openstack-dev] Question about locking
vishvananda at gmail.com
Mon Jul 1 17:11:47 UTC 2013
On Jul 1, 2013, at 2:27 AM, "Rosa, Andrea (HP Cloud Services)" <andrea.rosa at hp.com> wrote:
> Hi Ben,
> Thank you very much for your reply.
>> That function is using the synchronized decorator, which means that it's
>> wrapped by a semaphore context. As I understand it (and someone correct
>> me if I'm wrong), if an error happens and an exception is thrown the context
>> would be exited and the semaphore released. Of course, I suppose there are
>> situations where a thread could be terminated without being able to do that
>> cleanup, but I suspect most of those cases would kill the entire process,
>> making the lock irrelevant (since you specify not external).
> Ok, that is my understanding. Thanks for confirming it.
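
To illustrate the point above about the semaphore being released when an exception is thrown, here is a minimal sketch. It is not nova's actual implementation: the `synchronized` decorator, the `_locks` registry, `delete_instance` and `lock_is_free` are all hypothetical names, and plain `threading` is assumed rather than whatever threading model the service really uses.

```python
import functools
import threading

# Hypothetical registry of named in-process semaphores.
_locks = {}
_locks_guard = threading.Lock()


def synchronized(name):
    """Hypothetical stand-in for a synchronized decorator (not nova's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with _locks_guard:
                sem = _locks.setdefault(name, threading.Semaphore(1))
            # The "with" block releases the semaphore on normal return
            # and also when func raises.
            with sem:
                return func(*args, **kwargs)
        return wrapper
    return decorator


@synchronized("instance-1234")
def delete_instance(fail):
    if fail:
        raise RuntimeError("simulated failure during delete")
    return "deleted"


def lock_is_free(name):
    """Return True if the named semaphore can be acquired right now."""
    sem = _locks[name]
    if sem.acquire(blocking=False):
        sem.release()
        return True
    return False


try:
    delete_instance(fail=True)
except RuntimeError:
    pass

# The exception did not leave the lock held, so a retry can proceed.
released_after_error = lock_is_free("instance-1234")
second_attempt = delete_instance(fail=False)
```

If the lock were leaked by a failed call, the second `delete_instance` call would block forever; with a context manager it does not. This only covers exceptions raised inside the wrapped function, not a thread that hangs while holding the lock.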
>>> If not I think that all other actions for that instance are blocked
>>> waiting for the lock, is that correct?
>> That is a potential pitfall of synchronization, but I think it shouldn't happen in
>> this case. Are you seeing this behavior?
> I am seeing some odd behaviour: occasionally (not often) I find instances in DELETED status (vm_state) which are not marked as deleted.
> Here is what I found while debugging it:
> I found an instance in that odd state. Looking at the compute node's log file, I didn't find any errors and the service was running; the only thing I spotted was a gap of several minutes in the log, which is very unusual.
> I tried to delete the same instance again, but the operation never completed. Maybe the thread handling the first deletion died while the lock was still held, so all the other attempts to delete the same instance failed.
Were other commands working on the compute node? It seems much more likely that the node had a hung connection to RabbitMQ. If you are not using TCP keepalives, a network hiccup (or failover) can leave a half-open connection: the server thinks the connection is still active, so it sends the message, but the compute node never receives it.
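
As a rough sketch of what enabling TCP keepalives looks like at the socket level (this is not how nova or its messaging layer configures its connections; `enable_keepalive` and the timing values are assumptions made for illustration, and the per-socket tuning options are only set where the platform exposes them):

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for sock (hypothetical helper).

    idle:     seconds of silence before the first probe
    interval: seconds between probes
    count:    unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tuning options are platform-specific, so apply only those
    # that this build of Python exposes.
    for opt_name, value in (("TCP_KEEPIDLE", idle),
                            ("TCP_KEEPINTVL", interval),
                            ("TCP_KEEPCNT", count)):
        opt = getattr(socket, opt_name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With probes enabled, a peer that has silently gone away is eventually detected and the socket reports an error, instead of the connection sitting half-open indefinitely.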
> To "fix" the issue I had to restart the nova-compute service (so all locks were released) and then I was able to complete the deletion.
> Does that make sense to you?
> PS: Since you are on this topic, I submitted a fix to complete the "pending" deletion when the compute service starts; it would be great if you could have a look at it: https://review.openstack.org/33265
> Andrea Rosa