[openstack-dev] [nova] theoretical race between live migration and resource audit?

Chris Friesen chris.friesen at windriver.com
Thu Jun 9 21:41:20 UTC 2016


Hi,

I'm wondering if we might have a race between live migration and the resource 
audit.  I've included a few people on the recipient list who have worked 
directly with this code in the past.

In _update_available_resource() we have code that looks like this:

instances = objects.InstanceList.get_by_host_and_node()
self._update_usage_from_instances()
migrations = objects.MigrationList.get_in_progress_by_host_and_node()
self._update_usage_from_migrations()


In post_live_migration_at_destination() we do this (updating the host and node 
as well as the task state):
             instance.host = self.host
             instance.task_state = None
             instance.node = node_name
             instance.save(expected_task_state=task_states.MIGRATING)


And in _post_live_migration() we update the migration status to "completed":
         if migrate_data and migrate_data.get('migration'):
             migrate_data['migration'].status = 'completed'
             migrate_data['migration'].save()


Neither of the latter two routines is serialized by the 
COMPUTE_RESOURCE_SEMAPHORE, so both can race with the code in 
_update_available_resource().
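
To make "not serialized" concrete, here is a minimal single-threaded sketch.
It is not nova code: resource_lock is only a stand-in for a shared lock such as
COMPUTE_RESOURCE_SEMAPHORE, and audit(), guarded_update() and
unguarded_update() are hypothetical names. An update that takes the lock cannot
land between the audit's two queries; one that does not take it can.

```python
import threading

# Stand-in for a shared lock such as COMPUTE_RESOURCE_SEMAPHORE (assumption).
resource_lock = threading.Lock()
events = []


def guarded_update():
    """Hypothetical update that takes the lock before touching state."""
    if not resource_lock.acquire(blocking=False):
        # The audit holds the lock, so this update has to wait its turn.
        events.append("guarded update deferred")
        return False
    try:
        events.append("guarded update applied")
        return True
    finally:
        resource_lock.release()


def unguarded_update():
    """Hypothetical update that ignores the lock, like the two routines above."""
    events.append("unguarded update applied")
    return True


def audit(update):
    """Toy audit: two queries under the lock, with an update attempted between."""
    with resource_lock:
        events.append("audit: query instances")
        applied = update()
        events.append("audit: query migrations")
    return applied


guarded_interleaved = audit(guarded_update)      # False: kept out by the lock
unguarded_interleaved = audit(unguarded_update)  # True: lands between queries
```
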


I'm wondering if we can have a situation like this:

1) migration in progress
2) We start running _update_available_resource() on destination, and we call 
instances = objects.InstanceList.get_by_host_and_node().  This will not return 
the migrating instance, because it is not yet on the destination host.
3) The migration completes and we call post_live_migration_at_destination(), 
which sets the host/node/task_state on the instance.
4) In _update_available_resource() on destination, we call migrations = 
objects.MigrationList.get_in_progress_by_host_and_node().  This will return the 
migration for the instance in question (assuming its status has not yet been 
set to "completed"), but when we run self._update_usage_from_migrations() the 
instance's uuid will not be in "instances" and so we will use the instance from 
the newly-queried migration.  We will then ignore the instance because it is 
no longer in a "migrating" state.
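
The interleaving above can be replayed deterministically with a toy model. This
is only a sketch under stated assumptions, not the nova implementation:
Instance, Migration, audit(), fresh_state() and finish_on_destination() are
hypothetical stand-ins, the "running" status and the vcpus field are made up
for illustration, and the between_queries hook exists only to force step 3 to
happen between the two queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Instance:
    uuid: str
    host: str
    task_state: Optional[str]
    vcpus: int


@dataclass
class Migration:
    instance_uuid: str
    dest_host: str
    status: str


def audit(instances_db: Dict[str, Instance], migrations_db: List[Migration],
          host: str,
          between_queries: Optional[Callable[[], None]] = None) -> int:
    """Toy resource audit: return the vcpus accounted for on `host`."""
    # Query 1: instances that already live on this host.
    instances = {u: i for u, i in instances_db.items() if i.host == host}
    used = sum(i.vcpus for i in instances.values())

    # Test hook: lets another "routine" run between the two queries.
    if between_queries is not None:
        between_queries()

    # Query 2: in-progress migrations targeting this host.
    for mig in migrations_db:
        if mig.dest_host != host or mig.status != "running":
            continue
        if mig.instance_uuid in instances:
            continue  # assumed already counted by query 1
        inst = instances_db[mig.instance_uuid]  # freshly looked-up instance
        if inst.task_state != "migrating":
            continue  # ignored: no longer in a migrating state
        used += inst.vcpus
    return used


def fresh_state():
    """One 4-vcpu instance on 'src', mid-migration to 'dest'."""
    inst = Instance("uuid-1", "src", "migrating", 4)
    mig = Migration("uuid-1", "dest", "running")
    return {inst.uuid: inst}, [mig]


def finish_on_destination(inst: Instance) -> None:
    """Mimics step 3: host and task_state updated, migration row untouched."""
    inst.host = "dest"
    inst.task_state = None


# Audit runs entirely before the migration completes.
instances_db, migrations_db = fresh_state()
no_race = audit(instances_db, migrations_db, "dest")

# Step 3 lands between the two queries of the audit.
instances_db, migrations_db = fresh_state()
raced = audit(instances_db, migrations_db, "dest",
              between_queries=lambda: finish_on_destination(
                  instances_db["uuid-1"]))

print(no_race, raced)
```

In this model the undisturbed audit accounts for the instance through its
migration record, while the interleaved run accounts for nothing on the
destination, which is the "lost" resources case described below.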

Am I imagining things, or is there a race here?  If so, the negative effects 
would be that the resources of the migrating instance would be "lost", allowing 
a newly-scheduled instance to claim the same resources (PCI devices, pinned 
CPUs, etc.)

Chris


