[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

Chris Friesen chris.friesen at windriver.com
Thu Jun 21 16:53:28 UTC 2018


On 06/21/2018 07:04 AM, Artom Lifshitz wrote:
>     As I understand it, Artom is proposing to have a larger race window,
>     essentially
>     from when the scheduler selects a node until the resource audit runs on that
>     node.
>
>
> Exactly. When writing the spec I thought we could just call the resource tracker
> to claim the resources when the migration was done. However, when I started
> looking at the code in reaction to Sahid's feedback, I noticed that there's no
> way to do it without the MoveClaim context (right?)

In the previous patch, the MoveClaim is the thing that calculates the dest NUMA 
topology given the flavor/image, then calls hardware.numa_fit_instance_to_host() 
to figure out what specific host resources to consume.  That claim is then 
associated with the migration object and the instance.migration_context, and 
then we call _update_usage_from_migration() to actually consume the resources on 
the destination.  This all happens within check_can_live_migrate_destination().
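
For anyone following along, that flow looks roughly like the sketch below.
Argument lists are trimmed and the MigrationContext field names are from
memory, so treat everything other than MoveClaim,
hardware.numa_fit_instance_to_host() and _update_usage_from_migration() as
approximate rather than the exact code:

    # Rough sketch only; argument lists are trimmed and field names are
    # approximate.
    claim = claims.MoveClaim(context, instance, nodename, flavor, image_meta,
                             tracker, compute_node, ...)
    # MoveClaim works out the desired guest NUMA topology from the
    # flavor/image and calls hardware.numa_fit_instance_to_host() to pick
    # the specific host CPUs/memory pages to consume on the destination.

    # The claim gets tied to the migration and recorded on the instance as
    # a migration_context, so later code can tell "old" from "new":
    instance.migration_context = objects.MigrationContext(
        instance_uuid=instance.uuid,
        migration_id=migration.id,
        old_numa_topology=instance.numa_topology,
        new_numa_topology=claim.claimed_numa_topology)

    # Finally, consume the claimed resources on the destination node:
    tracker._update_usage_from_migration(context, instance, migration,
                                         nodename)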

As an improvement over what you've got, I think you could just kick off an early 
call of update_available_resource() once the migration is done.  It'd be 
potentially computationally expensive, but it'd reduce the race window.
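
Something like the following, in other words.  Purely illustrative, and I'm
just assuming that the end of post_live_migration_at_destination() is a
sensible place to hook it in:

    # Illustrative only; assumes we hook in at the end of the migration on
    # the destination's compute manager.
    def post_live_migration_at_destination(self, context, instance, ...):
        ...  # existing cleanup/housekeeping
        # Re-run the resource audit immediately rather than waiting for the
        # next periodic task, so the destination's NUMA/PCI usage stops
        # being stale as soon as the migration completes.
        self.update_available_resource(context)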

> Keep in mind, we're not making any race windows worse - I'm proposing keeping
> the status quo and fixing it later with NUMA in placement (or the resource
> tracker if we can swing it).

Well, right now live migration of instances with a NUMA topology is totally 
broken, so nobody's doing it.  You're going to make it kind of work, but with 
racy resource tracking, which could lead to people doing it and then getting 
into trouble.  We'll want to make sure there's a suitable release note for this.

> The resource tracker stuff is just so... opaque. For instance, the original
> patch [1] uses a mutated_migration_context around the pre_live_migration call to
> the libvirt driver. Would I still need to do that? Why or why not?

The mutated context applies the "new" numa_topology and PCI device information 
to the instance.

The reason for the mutated context around pre_live_migration() is so that the 
plug_vifs(instance) call makes use of the new macvtap device information. 
See Moshe's comment from Dec 8 2016 at https://review.openstack.org/#/c/244489/46.

I think the mutated context around the call to self.driver.live_migration() is 
so that the new XML represents the newly-claimed pinned CPUs on the destination.
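
In other words, roughly this pattern (argument lists trimmed; the point is 
just the context manager):

    with instance.mutated_migration_context():
        # Inside this block instance.numa_topology / instance.pci_devices
        # are temporarily swapped to the "new" values from the
        # migration_context, so the generated XML reflects the destination's
        # newly-claimed pinned CPUs (and, in the pre_live_migration() case,
        # plug_vifs() sees the new macvtap device information).
        self.driver.live_migration(context, instance, dest, ...)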

> At this point we need to commit to something and roll with it, so I'm sticking
> to the "easy way". If it gets shut down in code review, at least we'll have
> certainty on how to approach this next cycle.

Yep, I'm cool with incremental improvement.

Chris
