[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard
Chris Friesen
chris.friesen at windriver.com
Thu Jun 21 02:26:13 UTC 2018
On 06/20/2018 10:00 AM, Sylvain Bauza wrote:
> When we reviewed the spec, we agreed as a community to say that we should still
> get race conditions once the series is implemented, but at least it helps operators.
> Quoting :
> "It would also be possible for another instance to steal NUMA resources from a
> live migrated instance before the latter’s destination compute host has a chance
> to claim them. Until NUMA resource providers are implemented [3]
> <https://review.openstack.org/#/c/552924/> and allow for an essentially atomic
> schedule+claim operation, scheduling and claiming will keep being done at
> different times on different nodes. Thus, the potential for races will continue
> to exist."
> https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/numa-aware-live-migration.html#proposed-change
My understanding of that quote was that we were acknowledging that, when using
the ResourceTracker, there is an unavoidable race window between the time the
scheduler selects a compute node and the time the resources are claimed on that
compute node in check_can_live_migrate_destination(). And in this model no
resources are actually *used* until they are claimed.
As I understand it, Artom is proposing to have a larger race window, essentially
from when the scheduler selects a node until the resource audit runs on that node.
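To make the difference between the two windows concrete, here is a toy sketch. It is not nova code: ToyHost and migration_survives are made-up names, the time values are arbitrary units, and the "rival" is just a second instance that lands on the same host. The only point is that the longer the gap between scheduling and accounting, the more likely a rival arrives inside it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHost:
    """Hypothetical stand-in for a compute node's NUMA accounting."""
    free_cpus: int
    claims: dict = field(default_factory=dict)

    def fits(self, cpus):
        # Only resources that have already been claimed count as used.
        return cpus <= self.free_cpus

    def claim(self, name, cpus):
        if not self.fits(cpus):
            return False
        self.free_cpus -= cpus
        self.claims[name] = cpus
        return True


def migration_survives(claim_delay, rival_arrival, cpus=4):
    """Return True if the migrated instance still gets its CPUs.

    claim_delay:   time from scheduling (t=0) until the destination
                   accounts for the migrated instance's resources.
    rival_arrival: time at which a rival instance claims on the same host.
    """
    host = ToyHost(free_cpus=cpus)
    # At t=0 the scheduler sees the host as a valid destination.
    assert host.fits(cpus)
    # Apply the two claims in time order; the first one wins.
    events = sorted([(claim_delay, "migrated"), (rival_arrival, "rival")])
    results = {name: host.claim(name, cpus) for _, name in events}
    return results["migrated"]


# Claim shortly after scheduling: the rival arrives too late to interfere.
print(migration_survives(claim_delay=1, rival_arrival=30))
# Accounting deferred to a later audit: the rival gets there first.
print(migration_survives(claim_delay=60, rival_arrival=30))
```

In both cases the race exists; what changes is how long the host looks free to everyone else after the scheduler has already committed to it.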
Chris