[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

Chris Friesen chris.friesen at windriver.com
Tue Jun 19 21:21:04 UTC 2018


On 06/19/2018 01:59 PM, Artom Lifshitz wrote:
>> Adding
>> claims support later on wouldn't change any on-the-wire messaging, it would
>> just make things work more robustly.
>
> I'm not even sure about that. Assuming [1] has at least the right
> idea, it looks like it's an either-or kind of thing: either we use
> resource tracker claims and get the new instance NUMA topology that
> way, or do what was in the spec and have the dest send it to the
> source.

One way or another you need to calculate the new topology in 
ComputeManager.check_can_live_migrate_destination() and communicate that 
information back to the source so that it can be used in 
ComputeManager._do_live_migration().  The previous patches communicated the new 
topology as part of the instance object.
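The flow above can be sketched roughly as follows. This is a hypothetical, heavily simplified stand-in (the class names, fields, and fitting logic are illustrative, not nova's actual objects or placement algorithm) showing the destination check fitting the instance's NUMA cells onto its free CPUs and handing the result back via the migrate-data object:

```python
# Illustrative sketch only: InstanceNUMATopology, LiveMigrateData and the
# naive first-fit below are simplified stand-ins, not nova's real code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceNUMATopology:
    # One pinned-CPU list per guest NUMA cell (simplified).
    cell_pinning: List[List[int]]


@dataclass
class LiveMigrateData:
    # Hypothetical field carrying the destination-fitted topology
    # back to the source.
    dst_numa_topology: Optional[InstanceNUMATopology] = None


def check_can_live_migrate_destination(instance_topology, host_free_cpus,
                                       migrate_data):
    """Fit the instance's NUMA cells onto the destination's free CPUs
    and record the result so the source can rewrite the guest XML."""
    fitted = []
    free = list(host_free_cpus)
    for cell in instance_topology.cell_pinning:
        if len(free) < len(cell):
            raise RuntimeError("destination cannot fit NUMA topology")
        # Naive first-fit: take as many free dest CPUs as the cell needs.
        fitted.append(free[:len(cell)])
        free = free[len(cell):]
    migrate_data.dst_numa_topology = InstanceNUMATopology(fitted)
    return migrate_data
```

With claims support, the fitting step would instead go through the resource tracker; the point here is only that the recalculated topology has to travel back to the source one way or another.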

> That being said, I think I'm still in favor of choosing the
> "easy" way out. For instance, [2] should fail because we can't access
> the api db from the compute node.

I think you could use objects.ImageMeta.from_instance(instance) instead of 
request_spec.image.  The limits might be an issue.
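The substitution being suggested can be illustrated with a toy stand-in. In nova, the instance's system_metadata persists the image properties under "image_"-prefixed keys, which is what objects.ImageMeta.from_instance() reads; this hypothetical helper (not nova's actual implementation) mimics that lookup so no API-DB access is needed on the compute node:

```python
# Toy stand-in for objects.ImageMeta.from_instance(): pull image
# properties out of the instance's own system_metadata instead of
# relying on request_spec.image.
def image_meta_from_instance(instance):
    """Extract the 'image_*' entries persisted on the instance."""
    prefix = "image_"
    return {k[len(prefix):]: v
            for k, v in instance["system_metadata"].items()
            if k.startswith(prefix)}


instance = {"system_metadata": {"image_hw_numa_nodes": "2",
                                "image_min_ram": "512",
                                "owner_id": "abc"}}
meta = image_meta_from_instance(instance)
# meta == {"hw_numa_nodes": "2", "min_ram": "512"}
```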

> So unless there's a simpler way,
> using RT claims would involve changing the RPC to add parameters to
> check_can_live_migrate_destination, which, while not necessarily
> bad, seems like useless complexity for a thing we know will get ripped
> out.

I agree that it makes sense to get the "simple" option working first.  If we 
later choose to make it work "properly" I don't think it would require undoing 
too much.

Something to maybe factor in to what you're doing--I think there is currently a 
bug when migrating an instance with no numa_topology to a host with a different 
set of host CPUs usable for floating instances: the guest will assume it can 
still float over the same host CPUs as before.  Once we have the ability to 
rewrite the instance XML prior to the live migration it would be good to fix 
this.  I think this would require passing the set of available CPUs on the 
destination back to the source for use when recalculating the XML for the guest. 
(See the "if not guest_cpu_numa_config" case in 
LibvirtDriver._get_guest_numa_config() where "allowed_cpus" is specified, and 
LibvirtDriver._get_guest_config() where guest.cpuset is written.)
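The shape of that fix might look like the sketch below. The function name and signature are hypothetical (this is not nova's actual API); it only captures the rule that a floating instance's cpuset in the rewritten XML must come from the destination's usable CPUs rather than being copied from the source:

```python
# Hedged sketch of the floating-cpuset fix: names are illustrative,
# not nova's real code.
def recompute_floating_cpuset(instance_numa_topology, src_cpuset,
                              dest_allowed_cpus):
    """Return the cpuset the rewritten guest XML should use.

    If the instance has a NUMA topology, its pinning is handled per
    cell elsewhere; otherwise it floats, and must float over the
    destination's usable CPUs, not the set carried over from the
    source.
    """
    if instance_numa_topology is not None:
        return src_cpuset  # pinned case handled by the NUMA fitting path
    return set(dest_allowed_cpus)


# A guest that floated over CPUs 0-3 on the source must float over
# 8-15 on a destination whose usable-CPU set only exposes those CPUs.
new_set = recompute_floating_cpuset(None, {0, 1, 2, 3}, range(8, 16))
```

This is exactly why the destination's available-CPU set would need to ride back to the source in the migrate data.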

Chris
