[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

Chris Friesen chris.friesen at windriver.com
Thu Jun 21 02:26:13 UTC 2018


On 06/20/2018 10:00 AM, Sylvain Bauza wrote:

> When we reviewed the spec, we agreed as a community to say that we should still
> get race conditions once the series is implemented, but at least it helps operators.
> Quoting :
> "It would also be possible for another instance to steal NUMA resources from a
> live migrated instance before the latter’s destination compute host has a chance
> to claim them. Until NUMA resource providers are implemented [3]
> <https://review.openstack.org/#/c/552924/> and allow for an essentially atomic
> schedule+claim operation, scheduling and claiming will keep being done at
> different times on different nodes. Thus, the potential for races will continue
> to exist."
> https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/numa-aware-live-migration.html#proposed-change

My understanding of that quote was that we were acknowledging that, when using
the ResourceTracker, there is an unavoidable race window between the time the
scheduler selects a compute node and the time the resources are claimed on that
compute node in check_can_live_migrate_destination().  In this model, however,
no resources are actually *used* until they are claimed.

As I understand it, Artom is proposing to have a larger race window, essentially 
from when the scheduler selects a node until the resource audit runs on that node.
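
To make the difference between the two windows concrete, here is a toy sketch
in Python. It is not nova code: the Host class, its methods, and the pCPU
counts are all hypothetical, and the second scenario rests on the assumption
that a migrated instance consumes destination resources without a claim until
the audit accounts for them. That is only my reading of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy compute node with a fixed pool of dedicated pCPUs (hypothetical)."""
    total: int
    claimed: dict = field(default_factory=dict)  # instance -> pCPUs tracked
    in_use: dict = field(default_factory=dict)   # instance -> pCPUs consumed

    def free(self):
        # What the scheduler sees: only claimed resources are counted.
        return self.total - sum(self.claimed.values())

    def claim(self, instance, cpus):
        # Claim on the destination; refused if the resources are gone.
        if cpus > self.free():
            return False
        self.claimed[instance] = cpus
        return True

    def start(self, instance, cpus):
        # The instance actually begins consuming pCPUs.
        self.in_use[instance] = cpus

    def audit(self):
        # Periodic audit: sync tracked usage with actual usage and
        # report whether the host turned out to be overcommitted.
        self.claimed = dict(self.in_use)
        return sum(self.in_use.values()) > self.total


def claim_before_use():
    """Window runs from scheduler selection to the claim."""
    host = Host(total=4)
    # Scheduler has picked this host for migrating instance A (needs 4).
    # Inside the window, instance B claims and starts first.
    if host.claim("B", 4):
        host.start("B", 4)
    ok_a = host.claim("A", 4)  # A loses the race, but cleanly
    if ok_a:
        host.start("A", 4)
    return ok_a, sum(host.in_use.values())


def use_before_audit():
    """Window runs from scheduler selection to the next audit (assumed)."""
    host = Host(total=4)
    # A lands and consumes pCPUs, but nothing is claimed yet.
    host.start("A", 4)
    # The scheduler still sees 4 free pCPUs, so B is placed and claims.
    ok_b = host.claim("B", 4)
    if ok_b:
        host.start("B", 4)
    overcommitted = host.audit()
    return ok_b, sum(host.in_use.values()), overcommitted
```

In the first scenario the loser of the race fails its claim and nothing is
double-booked. In the second, both instances end up running on the same pCPUs,
and the problem only becomes visible once the audit runs.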

Chris


