[openstack-dev] [nova] race condition with resize

Moshe Levi moshele at mellanox.com
Thu May 19 15:40:16 UTC 2016


Hi all,



While working on a fix for resize with PCI passthrough [1], I noticed the following issue in resize.



If you use a small image and run resize-confirm very quickly, the old resources are not freed.



After debugging this issue, I found the root cause.



A good run of resize goes as follows:



During a resize, _update_usage_from_migration in the resource tracker is called twice.

1.       On the first call it returns the instance type of the new flavor and enters this case:

                     https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L718

2.       It then stores the migration and the new instance_type in tracked_migrations:

                     https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

3.       On the second call it returns the old instance_type and enters this case:

                     https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L725

4.       It then overwrites the existing entry in tracked_migrations with the migration and the old instance type:

                     https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

5.       On resize-confirm, drop_move_claim is called with the old instance type:

                     https://github.com/openstack/nova/blob/9a05d38f48ef0f630c5e49e332075b273cee38b9/nova/compute/manager.py#L3369

6.       drop_move_claim compares instance_type[id] from tracked_migrations with instance_type.id (which is the old one).

7.       Because they are equal, it removes the old resource usage:

                     https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L328



But with a small image such as CirrOS, if resize-confirm is done quickly, the second call to _update_usage_from_migration does not get executed before the confirm.

The result is that when we enter drop_move_claim, the old instance type is compared with the new instance_type still stored in tracked_migrations, and this expression is false: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L314

This means that this code block is not executed: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L326 and therefore the old resources are not freed.
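
To make the two runs easier to compare, here is a minimal toy model of the bookkeeping described above. It is only a sketch and not nova's actual code: ToyTracker, Flavor, run_resize and the single used_vcpus counter are hypothetical names made up for illustration, and the assumption that both flavors are held during the resize is a simplification. Only the tracked_migrations overwrite and the id comparison in drop_move_claim follow the description in this mail.

```python
# Toy model of the race described above. NOT nova's real code:
# ToyTracker, Flavor and run_resize are hypothetical illustration names.
from collections import namedtuple

Flavor = namedtuple("Flavor", ["id", "vcpus"])


class ToyTracker:
    def __init__(self, held_vcpus):
        # instance uuid -> (migration, instance type dict)
        self.tracked_migrations = {}
        # Assumption: one counter stands in for all tracked resources.
        self.used_vcpus = held_vcpus

    def update_usage_from_migration(self, uuid, migration, flavor):
        # Every call overwrites the entry stored for this instance.
        self.tracked_migrations[uuid] = (migration, {"id": flavor.id})

    def drop_move_claim(self, uuid, flavor):
        if uuid not in self.tracked_migrations:
            return False
        _migration, itype = self.tracked_migrations.pop(uuid)
        # Usage is released only when the ids match.
        if flavor.id == itype["id"]:
            self.used_vcpus -= flavor.vcpus
            return True
        return False


def run_resize(second_update_runs):
    old, new = Flavor(id=1, vcpus=1), Flavor(id=2, vcpus=2)
    # Assumption: old and new flavor are both held while resizing.
    tracker = ToyTracker(held_vcpus=old.vcpus + new.vcpus)
    # First call stores the new instance type.
    tracker.update_usage_from_migration("uuid-1", "migration-1", new)
    if second_update_runs:
        # Second call overwrites it with the old instance type.
        tracker.update_usage_from_migration("uuid-1", "migration-1", old)
    # resize-confirm drops the claim using the old instance type.
    tracker.drop_move_claim("uuid-1", old)
    return tracker.used_vcpus


print(run_resize(second_update_runs=True))   # 2: old usage freed
print(run_resize(second_update_runs=False))  # 3: old usage leaked
```

In this model the only difference between the two runs is whether the second update happens before the confirm, which is the timing window a small image makes easy to hit.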



Any thoughts on the matter?





[1] - https://review.openstack.org/#/c/307124/