[nova] Testing NUMA-aware live migration

wang.ya wang.ya at 99cloud.net
Tue Sep 17 12:44:38 UTC 2019


Hi:

 

The main code for NUMA-aware live migration has been merged, and I have been testing it recently.

 

If I only set the NUMA properties ('hw:numa_nodes', 'hw:numa_cpus', 'hw:numa_mem'), it works well. But if I also add the property "hw:cpu_policy='dedicated'", the result is no longer correct after several live migrations.

That is, the live migration itself succeeds, but the vCPU pinning is not correct (two instances end up with the same vCPU pins on the same host).

 

Below are my test steps.

env:

       code: master branch (built on 16 September 2019, including the NUMA-aware live migration patches)

Three compute nodes:

- s1:      24C, 48G (2 NUMA nodes)
- stein-2: 12C, 24G (2 NUMA nodes)
- stein-3: 8C,  16G (2 NUMA nodes)

 

flavor1 (2c2g): hw:cpu_policy='dedicated', hw:numa_cpus.0='0', hw:numa_cpus.1='1', hw:numa_mem.0='1024', hw:numa_mem.1='1024', hw:numa_nodes='2'

flavor2 (4c4g): hw:cpu_policy='dedicated', hw:numa_cpus.0='0,1,2', hw:numa_cpus.1='3', hw:numa_mem.0='1024', hw:numa_mem.1='3072', hw:numa_nodes='2'

The image has no properties set.
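For reference, the two flavors can be created roughly like this (a sketch using the openstack CLI; the flavor names and the disk size are placeholders I made up, only the vCPU, RAM, and extra-spec values above matter):

openstack flavor create --vcpus 2 --ram 2048 --disk 10 flavor1
openstack flavor set flavor1 \
  --property hw:cpu_policy=dedicated \
  --property hw:numa_nodes=2 \
  --property hw:numa_cpus.0=0 --property hw:numa_cpus.1=1 \
  --property hw:numa_mem.0=1024 --property hw:numa_mem.1=1024

openstack flavor create --vcpus 4 --ram 4096 --disk 10 flavor2
openstack flavor set flavor2 \
  --property hw:cpu_policy=dedicated \
  --property hw:numa_nodes=2 \
  --property hw:numa_cpus.0=0,1,2 --property hw:numa_cpus.1=3 \
  --property hw:numa_mem.0=1024 --property hw:numa_mem.1=3072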

 

 

I create four instances (2 * flavor1, 2 * flavor2), then live migrate them one by one (once one instance finishes its live migration, the next one starts) and check whether the vCPU pinning is correct.
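Roughly, the loop looks like this (a sketch; the server names are made up, the scheduler picks the destination, and the polling is simplified):

# live migrate each instance in turn, waiting for one to finish before starting the next
for server in vm1 vm2 vm3 vm4; do
    nova live-migration "$server"
    while [ "$(openstack server show "$server" -f value -c status)" = "MIGRATING" ]; do
        sleep 5
    done
    # note which host the instance landed on
    openstack server show "$server" -f value -c OS-EXT-SRV-ATTR:host
done

Then, on the destination host, I check the pinning of every running domain with "virsh vcpupin" inside the nova-libvirt container.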

After several live migrations, the vCPU pinning is no longer correct. (You can find the full migration list in the attached file.) The last few live migrations were:

 

+-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| Id  | UUID                                 | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host      | Status    | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type           |
+-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| 470 | 2a9ba183-4f91-4fbf-93cf-6f0e55cc085a | s1          | stein-3   | s1             | stein-3      | 172.16.130.153 | completed | bf0466f6-4815-4824-8586-899817207564 | 1          | 1          | 2019-09-17T10:28:46.000000 | 2019-09-17T10:29:09.000000 | live-migration |
| 469 | c05ea0e8-f040-463e-8957-a59f70ed8bf6 | s1          | stein-3   | s1             | stein-3      | 172.16.130.153 | completed | a3ec7a29-80de-4541-989d-4b9f4377f0bd | 1          | 1          | 2019-09-17T10:28:21.000000 | 2019-09-17T10:28:45.000000 | live-migration |
| 468 | cef4c609-157e-4b39-b6cc-f5528d49c75a | s1          | stein-2   | s1             | stein-2      | 172.16.130.152 | completed | 83dab721-3343-436d-bee7-f5ffc0d0d38d | 4          | 4          | 2019-09-17T10:27:57.000000 | 2019-09-17T10:28:21.000000 | live-migration |
| 467 | 5471e441-2a50-465a-bb63-3fe1bb2e81b9 | s1          | stein-2   | s1             | stein-2      | 172.16.130.152 | completed | e3c19fbe-7b94-4a65-a803-51daa9934378 | 4          | 4          | 2019-09-17T10:27:32.000000 | 2019-09-17T10:27:57.000000 | live-migration |
+-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+

 

 

Two instances landed on stein-3, and they have the same vCPU pins:

 

(nova-libvirt)[root@stein-3 /]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 32    instance-00000025              running
 33    instance-00000024              running

(nova-libvirt)[root@stein-3 /]# virsh vcpupin 32
VCPU: CPU Affinity
----------------------------------
   0: 2
   1: 7

(nova-libvirt)[root@stein-3 /]# virsh vcpupin 33
VCPU: CPU Affinity
----------------------------------
   0: 2
   1: 7

 

 

I checked the nova-compute log on stein-3 (you can find it in the attached log) and found that 'host_topology' is not up to date when 'hardware.numa_fit_instance_to_host' is called from the claims code. 'host_topology' comes from the 'objects.ComputeNode' record, which is cached in the 'ResourceTracker'; that cached 'cn' is used to build the claim when 'check_can_live_migrate_destination' is called. So my guess is that the cache is not updated, or is updated too late, or something along those lines.

I also checked the database: the NUMA topologies of the two instances record the same vCPU pins, "[0,2], [1,7]", while the compute node stein-3 only records the pinned vCPUs "[2], [7]".
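A sketch of the queries I mean (assuming the usual Nova schema, where the instance topology is stored as serialized JSON in instance_extra.numa_topology and the host topology in compute_nodes.numa_topology; credentials are omitted):

# the two instances that landed on stein-3
mysql -e "SELECT instance_uuid, numa_topology FROM nova.instance_extra \
          WHERE instance_uuid IN ('bf0466f6-4815-4824-8586-899817207564', \
                                  'a3ec7a29-80de-4541-989d-4b9f4377f0bd');"
# the host's own view of its pinned CPUs
mysql -e "SELECT hypervisor_hostname, numa_topology FROM nova.compute_nodes \
          WHERE hypervisor_hostname = 'stein-3';"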

 

Please correct me if there is something wrong :)

 

Best Regards

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: migration-list.log
Type: application/octet-stream
Size: 15344 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20190917/7c0c2283/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nova-compute.stein-3.log
Type: application/octet-stream
Size: 139018 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20190917/7c0c2283/attachment-0003.obj>

