[openstack-dev] [nova] How to fix the race condition issue between deleting and soft reboot?

Wangpan hzwangpan at corp.netease.com
Tue Nov 12 02:51:42 UTC 2013


Hi John,

I agree with you that 'terminate should be able to happen at any time.'

I have also checked the terminate_instance and reboot_instance methods in the compute manager,
but they just log a warning when catching the InstanceNotFound exception, nothing else.

Last but not least, I believe the InstanceNotFound exception is not raised when this race condition occurs,
because bug https://bugs.launchpad.net/nova/+bug/1246181
just results in the instance becoming a 'running deleted' one after termination.

I want to re-explain the race condition here:
1. Soft reboot an instance.
2. The 'soft reboot' thread waits for the instance to reach the 'shutdown' state (this may take a long time if the acpid service is not installed in the guest).
3. Terminate the instance during step #2.
4. The 'terminate' thread also waits for the instance to reach the 'shutdown' state, through the endless loopingcall '_wait_for_destroy'.
5. If the 'soft reboot' thread sees the 'shutdown' state first and re-creates/restarts the instance before the 'terminate' thread acts, the instance gets stuck in the 'deleting' status and cannot be deleted again: the 'terminate' thread is lost in the endless loopingcall '_wait_for_destroy' (the instance will never reach the 'shutdown' state) while still holding the lock in 'terminate_instance'. This is bug https://bugs.launchpad.net/nova/+bug/1111213, which has already been fixed.
6. On the other hand, if the 'terminate' thread sees the 'shutdown' state first in the loopingcall '_wait_for_destroy', and the 'soft reboot' thread re-creates/restarts the instance just before the 'terminate' thread deletes the files in the instance dir (disk, disk.local, libvirt.xml, console.log and so on), then the 'terminate' thread finishes successfully and the instance is deleted in the nova db, but it is still running in the hypervisor. This is the bug I want to fix this time, https://bugs.launchpad.net/nova/+bug/1246181; you can find my FIXME comment here: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L924
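
The outcome of step 6 can be reproduced deterministically with two threads. This is a minimal sketch, assuming hypothetical FakeDomain/FakeDB objects in place of libvirt and the nova database (they are not nova or libvirt APIs). The threading.Event objects force the bad interleaving (shutdown observed, then the reboot restarts the guest before terminate removes the files), which in the real system only happens by unlucky timing.

```python
import threading


class FakeDomain:
    """Hypothetical stand-in for a hypervisor domain."""
    def __init__(self):
        self.state = "running"
        self.files_present = True


class FakeDB:
    """Hypothetical stand-in for the instance row in the nova DB."""
    def __init__(self):
        self.deleted = False


def soft_reboot(dom, shut_down, restarted):
    # Step 2: the guest eventually shuts down (e.g. after an ACPI request).
    dom.state = "shutdown"
    shut_down.set()
    # Step 6 window: the reboot thread restarts the guest from its files.
    dom.state = "running"
    restarted.set()


def terminate(dom, db, shut_down, restarted):
    # Step 4: wait, like _wait_for_destroy, until the guest looks shut down.
    shut_down.wait()
    # Force the bad ordering: the restart happens before the files are removed.
    restarted.wait()
    dom.files_present = False  # disk, disk.local, libvirt.xml, console.log, ...
    db.deleted = True          # terminate "succeeds" and marks the row deleted


def run_race():
    dom, db = FakeDomain(), FakeDB()
    shut_down, restarted = threading.Event(), threading.Event()
    t_term = threading.Thread(target=terminate, args=(dom, db, shut_down, restarted))
    t_boot = threading.Thread(target=soft_reboot, args=(dom, shut_down, restarted))
    t_term.start()
    t_boot.start()
    t_term.join()
    t_boot.join()
    return dom, db
```

After `dom, db = run_race()`, `dom.state` is `"running"` while `db.deleted` is `True`: the 'running deleted' instance of bug 1246181.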


2013-11-12



Wangpan



From: John Garbutt <john at johngarbutt.com>
Sent: 2013-11-11 20:11
Subject: Re: [openstack-dev] [nova] How to fix the race condition issue between deleting and soft reboot?
To: "OpenStack Development Mailing List (not for usage questions)"<openstack-dev at lists.openstack.org>
Cc:

It seems we still agreed that terminate should be able to happen at any time. 

I thought I remembered some code in the manager that treats 
InstanceNotFound errors differently. 

I would rather we ensure InstanceNotFound is raised to indicate we 
have hit this race condition, and let the compute manager unify how we 
deal with that across all sorts of operations. 
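
John's suggestion (the driver raises InstanceNotFound when it detects the race, and the compute manager handles it in one place for every operation) could look roughly like the sketch below. The decorator `handle_instance_not_found`, `FakeDriver`, `ComputeManagerSketch` and the local `InstanceNotFound` class are all hypothetical illustrations, not nova's actual code or signatures.

```python
import functools
import logging

LOG = logging.getLogger(__name__)


class InstanceNotFound(Exception):
    """Hypothetical stand-in for nova's InstanceNotFound exception."""


def handle_instance_not_found(func):
    """Hypothetical decorator: one shared reaction to 'deleted underneath us'."""
    @functools.wraps(func)
    def wrapper(self, context, instance, *args, **kwargs):
        try:
            return func(self, context, instance, *args, **kwargs)
        except InstanceNotFound:
            # The instance was deleted while this operation ran; treat it as
            # a lost race with terminate rather than as an operation failure.
            LOG.warning("Instance %s disappeared during %s", instance, func.__name__)
            return None
    return wrapper


class FakeDriver:
    def reboot(self, instance):
        # The driver notices the instance was deleted mid-reboot.
        raise InstanceNotFound(instance)


class ComputeManagerSketch:
    def __init__(self, driver):
        self.driver = driver

    @handle_instance_not_found
    def reboot_instance(self, context, instance):
        self.driver.reboot(instance)
        return "rebooted"
```

The point of the decorator form is that reboot, resize, snapshot and so on can all share the same handling instead of each catching the exception differently.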

John 

On 11 November 2013 02:57, Wangpan <hzwangpan at corp.netease.com> wrote: 
> Hi all, 
> 
> I want to raise this problem again now that the Hong Kong summit is over; 
> you may have time to discuss this issue now. 
> Thanks a lot! 
> 
> 2013-11-11 
> ________________________________ 
> Wangpan 
> ________________________________ 
> From: "Wangpan"<hzwangpan at corp.netease.com> 
> Sent: 2013-11-04 12:08 
> Subject: [openstack-dev] [nova] How to fix the race condition issue between 
> deleting and soft reboot? 
> To: "OpenStack Development Mailing List (not for usage 
> questions)"<openstack-dev at lists.openstack.org> 
> Cc: 
> 
> Hi all, 
> 
> I have a question about fixing a race condition between deleting and 
> soft rebooting an instance. 
> The issues are: 
> 1. If we soft reboot an instance and then delete it, the instance may not 
> be deleted and may stay stuck in the 'deleting' task state, because of this bug: 
> https://bugs.launchpad.net/nova/+bug/1111213 
> I already fixed this bug several months ago (just for the libvirt driver). 
> 2. The other issue is that if the instance is rebooted just before the 
> files under the instance dir are deleted, it may become a 'running deleted' 
> instance. This bug is here: 
> https://bugs.launchpad.net/nova/+bug/1246181 
> I want to fix it now, and I need your advice. 
> The commit is here: https://review.openstack.org/#/c/54477/ ; you can post 
> your advice on gerrit or mail it to me. 
> 
> The ways to fix bug #2 may be these (just for the libvirt driver, in my mind): 
> 1. Add a lock to the reboot operation, as the delete operation has, so that the 
> reboot and delete operations are done in sequence. 
> On the other hand, a soft reboot may take 120s if the instance doesn't 
> support graceful shutdown; I think that is too long for a user to wait to 
> delete an instance, so this may not be the best way. 
> 2. Check the instance state at the end of the _cleanup method in the libvirt 
> driver, and if it is still running, destroy it again. 
> This works, but neither Nikola Dipanov nor I like this 'ugly' way. 
> 3. Check the instance vm state and task state in the nova db before booting in 
> reboot, and if it is deleted/deleting, stop the reboot process. This accesses 
> the db at the driver level, so it is an 'ugly' way, too. 
> 
> Nikola suggests that 'maybe we can leverage task/vm states and refactor how 
> reboot is done so we can back out of a reboot on a delete', but I think we 
> should let users delete an instance at any time and in any state, so a delete 
> during 'soft reboot' should not be forbidden. 
> 
> Thanks, and I look forward to your feedback! 
> 
> 2013-11-04 
> ________________________________ 
> Wangpan 
> 
> _______________________________________________ 
> OpenStack-dev mailing list 
> OpenStack-dev at lists.openstack.org 
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 
> 
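
The first two candidate fixes from the 2013-11-04 message (a lock shared by reboot and delete, and a re-check at the end of the libvirt driver's _cleanup) can be sketched as below. `FakeDomain`, `instance_lock`, `soft_reboot_locked`, `delete_locked` and `cleanup_with_recheck` are hypothetical names, not nova or libvirt APIs; real code would go through the libvirt driver and nova's existing locking.

```python
import threading
from collections import defaultdict


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain plus its nova DB row."""
    def __init__(self):
        self.running = True
        self.deleted = False

    def destroy(self):
        self.running = False


# Option 1 (hypothetical): one lock per instance UUID, taken by both
# reboot and delete so the two operations run strictly in sequence.
_instance_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def instance_lock(uuid):
    # Guard the dict so two threads never get different locks for one UUID.
    with _locks_guard:
        return _instance_locks[uuid]


def soft_reboot_locked(uuid, dom):
    with instance_lock(uuid):
        if dom.deleted:
            # Delete won the lock first; a real driver would give up here
            # (e.g. by raising InstanceNotFound) instead of restarting.
            return
        dom.running = False  # wait for guest shutdown (up to ~120s in reality)
        dom.running = True   # start the guest again


def delete_locked(uuid, dom, remove_files):
    with instance_lock(uuid):
        dom.destroy()
        remove_files()       # disk, libvirt.xml, console.log, ...
        dom.deleted = True


# Option 2 (hypothetical): after removing the files, re-check the domain
# and destroy it again if a concurrent reboot restarted it meanwhile.
def cleanup_with_recheck(dom, remove_files):
    dom.destroy()
    remove_files()
    if dom.running:
        dom.destroy()
    dom.deleted = True
```

In this sketch option 1 is correct in either ordering but makes a delete wait behind a slow soft reboot, while option 2 keeps delete fast but only narrows the window: a restart that lands after the re-check would still leave a running guest.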


