[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard
Chris Friesen
chris.friesen at windriver.com
Tue Jun 19 21:21:04 UTC 2018
On 06/19/2018 01:59 PM, Artom Lifshitz wrote:
>> Adding
>> claims support later on wouldn't change any on-the-wire messaging, it would
>> just make things work more robustly.
>
> I'm not even sure about that. Assuming [1] has at least the right
> idea, it looks like it's an either-or kind of thing: either we use
> resource tracker claims and get the new instance NUMA topology that
> way, or do what was in the spec and have the dest send it to the
> source.
One way or another you need to calculate the new topology in
ComputeManager.check_can_live_migrate_destination() and communicate that
information back to the source so that it can be used in
ComputeManager._do_live_migration(). The previous patches communicated the new
topology as part of the instance.
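
To make that flow concrete, here is a toy sketch in plain Python, not nova
code. MigrateData, dest_check and source_migrate are made-up names standing in
for the migrate data and the two ComputeManager methods, and the greedy
fitting is only an assumption for illustration; the real fitting logic is
more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MigrateData:
    # Hypothetical container; not the real nova migrate data object.
    # Maps guest NUMA cell id -> host CPUs chosen on the destination.
    dst_cell_cpus: Dict[int, Set[int]] = field(default_factory=dict)


def dest_check(guest_cells: Dict[int, int],
               host_free: Dict[int, Set[int]]) -> MigrateData:
    """Destination side: fit each guest cell onto its own host cell.

    guest_cells maps guest cell id -> number of CPUs needed.
    host_free maps host cell id -> free host CPUs in that cell.
    """
    data = MigrateData()
    used_host_cells = set()
    for guest_id, needed in sorted(guest_cells.items()):
        for host_id, free in sorted(host_free.items()):
            if host_id not in used_host_cells and len(free) >= needed:
                # Take the lowest-numbered free CPUs in this host cell.
                data.dst_cell_cpus[guest_id] = set(sorted(free)[:needed])
                used_host_cells.add(host_id)
                break
        else:
            raise ValueError(
                "guest cell %d does not fit on destination" % guest_id)
    return data


def source_migrate(data: MigrateData) -> Dict[int, str]:
    """Source side: turn the destination's answer into cpuset strings."""
    return {cell: ",".join(str(c) for c in sorted(cpus))
            for cell, cpus in data.dst_cell_cpus.items()}
```

The point is only that the destination does the calculation and the source
consumes the result; whether the result travels in the instance or in the
migrate data is the open question.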
> That being said, I still think I'm still in favor of choosing the
> "easy" way out. For instance, [2] should fail because we can't access
> the api db from the compute node.
I think you could use objects.ImageMeta.from_instance(instance) instead of
request_spec.image. The limits might be an issue.
> So unless there's a simpler way,
> using RT claims would involve changing the RPC to add parameters to
> check_can_live_migration_destination, which, while not necessarily
> bad, seems like useless complexity for a thing we know will get ripped
> out.
I agree that it makes sense to get the "simple" option working first. If we
later choose to make it work "properly" I don't think it would require undoing
too much.
Something to maybe factor into what you're doing: I think there is currently a
bug when migrating an instance with no numa_topology to a host with a different
set of host CPUs usable for floating instances. I think it will assume it can
still float over the same host CPUs as before. Once we have the ability to
rewrite the instance XML prior to the live migration it would be good to fix
this. I think this would require passing the set of available CPUs on the
destination back to the source host for use when recalculating the XML for the guest.
(See the "if not guest_cpu_numa_config" case in
LibvirtDriver._get_guest_numa_config() where "allowed_cpus" is specified, and
LibvirtDriver._get_guest_config() where guest.cpuset is written.)
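
A rough illustration of the floating-instance case, again as standalone Python
rather than driver code. format_cpuset and guest_cpuset are hypothetical
helpers, and passing dst_cpus back from the destination is exactly the part
that does not exist yet.

```python
from typing import Optional, Set


def format_cpuset(cpus: Set[int]) -> str:
    """Render a CPU set using range syntax, e.g. '0-3,8'."""
    ordered = sorted(cpus)
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next CPU is contiguous.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else "%d-%d" % (start, end))
        i += 1
    return ",".join(parts)


def guest_cpuset(has_numa_topology: bool,
                 src_cpus: Set[int],
                 dst_cpus: Optional[Set[int]] = None) -> Optional[str]:
    """Pick the cpuset a floating guest should get on the destination.

    Without dst_cpus the guest keeps the source's CPU set, which is the
    suspected bug. With dst_cpus it floats over the destination's CPUs.
    """
    if has_numa_topology:
        # Guests with a NUMA topology are not floating; out of scope here.
        return None
    cpus = dst_cpus if dst_cpus is not None else src_cpus
    return format_cpuset(cpus)
```

With a source that allows CPUs 0-3 and a destination that allows 4-7,
guest_cpuset(False, {0, 1, 2, 3}) gives "0-3" (stale), while
guest_cpuset(False, {0, 1, 2, 3}, {4, 5, 6, 7}) gives "4-7".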
Chris