If you can post the full logs (in debug mode) somewhere, I can have a look. Based on what you're saying, it looks like there might be a race between updating the host topology and another instance claiming resources, although claims are supposed to be race-free because they are serialized by the COMPUTE_RESOURCE_SEMAPHORE [1].

[1] https://github.com/openstack/nova/blob/082c91a9286ae55fd5eb6adeed52500dc75be...

On Tue, Sep 17, 2019 at 8:44 AM wang.ya <wang.ya@99cloud.net> wrote:
Hi:
The main code of NUMA-aware live migration has been merged, and I've been testing it recently.
If I only set the NUMA properties ('hw:numa_nodes', 'hw:numa_cpus', 'hw:numa_mem'), it works well. But if I add the property "hw:cpu_policy='dedicated'", the result is no longer correct after several live migrations.
That is, the live migrations succeed, but the vCPU pinning is wrong: two instances end up with several of the same vCPU pins on the same host.
Below are my test steps.
env:
code: master branch (built on 16 September 2019, including the NUMA-aware live migration patches)
three compute nodes:
- s1: 24C, 48G (2 NUMA nodes)
- stein-2: 12C, 24G (2 NUMA nodes)
- stein-3: 8C, 16G (2 NUMA nodes)
flavor1 (2c2g): hw:cpu_policy='dedicated', hw:numa_cpus.0='0', hw:numa_cpus.1='1', hw:numa_mem.0='1024', hw:numa_mem.1='1024', hw:numa_nodes='2'
flavor2 (4c4g): hw:cpu_policy='dedicated', hw:numa_cpus.0='0,1,2', hw:numa_cpus.1='3', hw:numa_mem.0='1024', hw:numa_mem.1='3072', hw:numa_nodes='2'
The image has no properties set.
I created four instances (2 × flavor1, 2 × flavor2), then live migrated them one by one (as soon as one instance's migration completed, the next one started) and checked after each migration whether the vCPU pinning was correct.
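Roughly, this is the check I do on each host (a minimal sketch; the pin maps are copied by hand from 'virsh vcpupin' output, and with the dedicated policy any overlap between two instances is wrong):

    def overlapping_pcpus(pins_a, pins_b):
        """Each argument maps vCPU -> set of host pCPUs, as reported
        by 'virsh vcpupin'. Returns the pCPUs claimed by both."""
        used_a = set().union(*pins_a.values())
        used_b = set().union(*pins_b.values())
        return used_a & used_b

    # The two instances that ended up on stein-3 (see below):
    print(overlapping_pcpus({0: {2}, 1: {7}}, {0: {2}, 1: {7}}))  # -> {2, 7}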
After several live migrations, the vCPU pinning is no longer correct. (You can find the full migration list in the attached file.) The last live migrations were:
+-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| Id | UUID | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host | Status | Instance UUID | Old Flavor | New Flavor | Created At | Updated At | Type |
+-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| 470 | 2a9ba183-4f91-4fbf-93cf-6f0e55cc085a | s1 | stein-3 | s1 | stein-3 | 172.16.130.153 | completed | bf0466f6-4815-4824-8586-899817207564 | 1 | 1 | 2019-09-17T10:28:46.000000 | 2019-09-17T10:29:09.000000 | live-migration |
| 469 | c05ea0e8-f040-463e-8957-a59f70ed8bf6 | s1 | stein-3 | s1 | stein-3 | 172.16.130.153 | completed | a3ec7a29-80de-4541-989d-4b9f4377f0bd | 1 | 1 | 2019-09-17T10:28:21.000000 | 2019-09-17T10:28:45.000000 | live-migration |
| 468 | cef4c609-157e-4b39-b6cc-f5528d49c75a | s1 | stein-2 | s1 | stein-2 | 172.16.130.152 | completed | 83dab721-3343-436d-bee7-f5ffc0d0d38d | 4 | 4 | 2019-09-17T10:27:57.000000 | 2019-09-17T10:28:21.000000 | live-migration |
| 467 | 5471e441-2a50-465a-bb63-3fe1bb2e81b9 | s1 | stein-2 | s1 | stein-2 | 172.16.130.152 | completed | e3c19fbe-7b94-4a65-a803-51daa9934378 | 4 | 4 | 2019-09-17T10:27:32.000000 | 2019-09-17T10:27:57.000000 | live-migration |
+-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
Two instances landed on stein-3, and they have the same vCPU pins:
(nova-libvirt)[root@stein-3 /]# virsh list --all
 Id Name State
----------------------------------------------------
 32 instance-00000025 running
 33 instance-00000024 running
(nova-libvirt)[root@stein-3 /]# virsh vcpupin 32
VCPU: CPU Affinity
----------------------------------
 0: 2
 1: 7
(nova-libvirt)[root@stein-3 /]# virsh vcpupin 33
VCPU: CPU Affinity
----------------------------------
 0: 2
 1: 7
I checked nova-compute's log on stein-3 (you can find it in the attached file) and found that 'host_topology' isn't updated when 'hardware.numa_fit_instance_to_host' is called during claims. 'host_topology' is a property of 'objects.ComputeNode', which is cached in the 'ResourceTracker'; the cached 'cn' is used to build the 'claim' when 'check_can_live_migrate_destination' is called. So my guess is that the cache is not updated, is updated too late, or something similar.
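To illustrate my guess (a toy sketch only, not the real nova code; the lock name and the fitting logic are simplified stand-ins for 'nova/compute/resource_tracker.py' and 'hardware.numa_fit_instance_to_host'):

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    class ToyResourceTracker(object):
        def __init__(self):
            # Cached host view; in nova this is the cached ComputeNode
            # whose numa_topology ('host_topology') I see going stale.
            self.cached_free_pcpus = {2, 7}

        @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
        def claim(self, n_vcpus=2):
            # Stand-in for hardware.numa_fit_instance_to_host(): fit
            # the instance against the *cached* topology.
            return sorted(self.cached_free_pcpus)[:n_vcpus]

    rt = ToyResourceTracker()
    print(rt.claim())  # -> [2, 7]
    # If the cache is not refreshed before the next claim arrives,
    # the lock alone does not help:
    print(rt.claim())  # -> [2, 7] again: the collision I observe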
I also checked the database: the NUMA topologies of the two instances have the same vCPU pins, "[0,2], [1,7]", while the compute node stein-3 only records the pins "[2], [7]".
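In other words, the host's recorded usage only accounts for one instance's worth of pins. Checking the numbers (pins copied from the database):

    # vCPU -> pCPU pins recorded for the two instances:
    inst_a = {0: 2, 1: 7}
    inst_b = {0: 2, 1: 7}

    # pCPUs the compute node stein-3 records as pinned:
    host_pinned = {2, 7}

    # With dedicated pinning, 4 vCPUs should consume 4 distinct pCPUs,
    # but only 2 are accounted for -- consistent with the second claim
    # fitting against a host view that never saw the first one.
    print(len(inst_a) + len(inst_b))                    # 4
    print(set(inst_a.values()) | set(inst_b.values()))  # {2, 7}
    print(host_pinned)                                  # {2, 7}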
Please correct me if I've got something wrong :)
Best Regards