Hi:

 

The main code for NUMA-aware live migration has been merged, and I have been testing it recently.

 

If I only set the NUMA properties ('hw:numa_nodes', 'hw:numa_cpus', 'hw:numa_mem'), everything works well. But if I add the property "hw:cpu_policy='dedicated'", the result is incorrect after several live migrations.

That is, the live migrations themselves succeed, but the vCPU pinning ends up wrong: two instances on the same host are pinned to the same physical CPUs.

 

Below are my test steps.

env:

       code: master branch (built on 16 September 2019, including the NUMA-aware live migration patches)

three compute nodes:

- s1:      24C, 48G (2 NUMA nodes)
- stein-2: 12C, 24G (2 NUMA nodes)
- stein-3: 8C, 16G (2 NUMA nodes)

 

flavor1 (2c2g): hw:cpu_policy='dedicated', hw:numa_cpus.0='0', hw:numa_cpus.1='1', hw:numa_mem.0='1024', hw:numa_mem.1='1024', hw:numa_nodes='2'

flavor2 (4c4g): hw:cpu_policy='dedicated', hw:numa_cpus.0='0,1,2', hw:numa_cpus.1='3', hw:numa_mem.0='1024', hw:numa_mem.1='3072', hw:numa_nodes='2'

The image has no properties set.

I created four instances (2 × flavor1, 2 × flavor2), then live migrated them one by one (once one instance finished migrating, the next began) and checked after each migration whether the vCPU pinning was correct.

After several live migrations, the vCPU pinning becomes incorrect. (You can find the full migration list in the attached file.) The last live migrations were:

 

+-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| Id  | UUID                                 | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host      | Status    | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type           |
+-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| 470 | 2a9ba183-4f91-4fbf-93cf-6f0e55cc085a | s1          | stein-3   | s1             | stein-3      | 172.16.130.153 | completed | bf0466f6-4815-4824-8586-899817207564 | 1          | 1          | 2019-09-17T10:28:46.000000 | 2019-09-17T10:29:09.000000 | live-migration |
| 469 | c05ea0e8-f040-463e-8957-a59f70ed8bf6 | s1          | stein-3   | s1             | stein-3      | 172.16.130.153 | completed | a3ec7a29-80de-4541-989d-4b9f4377f0bd | 1          | 1          | 2019-09-17T10:28:21.000000 | 2019-09-17T10:28:45.000000 | live-migration |
| 468 | cef4c609-157e-4b39-b6cc-f5528d49c75a | s1          | stein-2   | s1             | stein-2      | 172.16.130.152 | completed | 83dab721-3343-436d-bee7-f5ffc0d0d38d | 4          | 4          | 2019-09-17T10:27:57.000000 | 2019-09-17T10:28:21.000000 | live-migration |
| 467 | 5471e441-2a50-465a-bb63-3fe1bb2e81b9 | s1          | stein-2   | s1             | stein-2      | 172.16.130.152 | completed | e3c19fbe-7b94-4a65-a803-51daa9934378 | 4          | 4          | 2019-09-17T10:27:32.000000 | 2019-09-17T10:27:57.000000 | live-migration |

Two instances landed on stein-3, and they have the same vCPU pinning:

 

(nova-libvirt)[root@stein-3 /]# virsh list --all
Id    Name                           State
----------------------------------------------------
32    instance-00000025              running
33    instance-00000024              running

(nova-libvirt)[root@stein-3 /]# virsh vcpupin 32
VCPU: CPU Affinity
----------------------------------
   0: 2
   1: 7

(nova-libvirt)[root@stein-3 /]# virsh vcpupin 33
VCPU: CPU Affinity
----------------------------------
   0: 2
   1: 7
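The duplicate pinning above can also be spotted automatically. Below is a minimal sketch (my own illustration, not Nova code) that parses `virsh vcpupin` output and reports physical CPUs claimed by more than one instance; it assumes the dedicated policy, i.e. each vCPU line maps to a single pCPU:

```python
from collections import defaultdict

def parse_vcpupin(output):
    """Parse `virsh vcpupin <domain>` text into {vcpu: pcpu-affinity-string}."""
    pins = {}
    for line in output.splitlines():
        left, sep, right = line.partition(":")
        # Keep only the "   0: 2" style rows; skip the header and dashes.
        if sep and left.strip().isdigit():
            pins[int(left.strip())] = right.strip()
    return pins

def overlapping_pins(domain_pins):
    """Given {domain: {vcpu: pcpu}}, return the pCPUs used by >1 domain."""
    users = defaultdict(set)
    for domain, pins in domain_pins.items():
        for pcpu in pins.values():
            users[pcpu].add(domain)
    return {pcpu: sorted(doms) for pcpu, doms in users.items() if len(doms) > 1}

# The two outputs above are identical, so both instances conflict on 2 and 7:
sample = "VCPU: CPU Affinity\n----------------------------------\n   0: 2\n   1: 7\n"
conflicts = overlapping_pins({
    "instance-00000025": parse_vcpupin(sample),
    "instance-00000024": parse_vcpupin(sample),
})
# conflicts -> {"2": [both instances], "7": [both instances]}
```

With dedicated pinning this dictionary should always be empty; here it is not, which is the bug.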

I checked the nova-compute log on stein-3 (see the attached log) and found that 'host_topology' is not updated when 'hardware.numa_fit_instance_to_host' is called during claims. 'host_topology' is a property of 'objects.ComputeNode' and is cached in the 'ResourceTracker'; the cached 'cn' is used to build the 'claim' when 'check_can_live_migrate_destination' is called. Therefore, I guess the cache is not updated, is updated too late, or something along those lines.
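To make that guess concrete, here is a toy model (purely illustrative, none of these names are Nova's actual classes) of how claiming against a stale cached host topology can hand the same "free" pCPUs to two successive migrations:

```python
class HostTopology:
    """Hypothetical stand-in for the cached host NUMA/pin state."""
    def __init__(self, free_pcpus):
        self.free_pcpus = set(free_pcpus)

def fit_instance(host_topo, needed):
    """Pick `needed` pCPUs from what the (possibly stale) view says is free."""
    picked = sorted(host_topo.free_pcpus)[:needed]
    return picked if len(picked) == needed else None

# Fresh view of the destination host: pCPUs 2 and 7 are free.
cached_view = HostTopology({2, 7})

# Migration A claims against the cached view ...
pins_a = fit_instance(cached_view, 2)   # [2, 7]

# ... but if the cache is not refreshed before migration B's destination
# check runs its claim, B is fitted against the same "free" CPUs:
pins_b = fit_instance(cached_view, 2)   # [2, 7] again -> the observed overlap
```

If the cached view were refreshed (or the claim recorded) between the two migrations, the second fit would fail for lack of free pCPUs instead of double-pinning.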

I also checked the database: the NUMA topology records of the two instances show the same vCPU pinning, "[0,2], [1,7]", while the compute node stein-3 only records the pinned pCPUs "[2], [7]".
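Just to spell out the inconsistency in those database numbers: each instance's pinning is {vCPU 0 → pCPU 2, vCPU 1 → pCPU 7}, so together the two instances claim pCPUs 2 and 7 twice, while stein-3's record tracks each only once:

```python
# Pinnings recorded for the two instances (from the DB check above):
inst_a = {0: 2, 1: 7}   # vCPU -> pCPU
inst_b = {0: 2, 1: 7}   # vCPU -> pCPU

claimed = list(inst_a.values()) + list(inst_b.values())   # [2, 7, 2, 7]
# With hw:cpu_policy='dedicated', every claimed vCPU should get a distinct
# pCPU, so a consistent host record would show 4 pinned CPUs here, not 2:
distinct = set(claimed)   # {2, 7} -- matches what stein-3 records
```

Four dedicated vCPUs backed by only two pinned pCPUs is exactly the double-pinning seen in the virsh output.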

 

Please correct me if I have got anything wrong :)

 

Best Regards