[nova] The test of NUMA aware live migration

Artom Lifshitz notartom at gmail.com
Tue Sep 17 15:11:55 UTC 2019


If you can post the full logs (in debug mode) somewhere, I can have a look.
Based on what you're saying, it looks like there might be a race between
updating the host topology and another instance claiming resources -
although claims are supposed to be race-free because they use the
COMPUTE_RESOURCES_SEMAPHORE [1].

[1]
https://github.com/openstack/nova/blob/082c91a9286ae55fd5eb6adeed52500dc75be5ce/nova/compute/resource_tracker.py#L257
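
The serialization in question is the decorator-on-a-shared-lock pattern from
oslo.concurrency; here is a minimal toy sketch of that pattern (illustrative
names only, not Nova's actual ResourceTracker code):

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCES_SEMAPHORE = 'compute_resources'


    class ToyResourceTracker(object):
        """Illustrative only: every claim runs under one in-process lock."""

        def __init__(self, free_pcpus):
            # Stand-in for the cached host NUMA view (host_topology).
            self.free_pcpus = set(free_pcpus)

        @lockutils.synchronized(COMPUTE_RESOURCES_SEMAPHORE)
        def claim(self, ncpus):
            # Two concurrent claims cannot interleave inside this method;
            # the open question is whether the cached host view is still
            # fresh by the time a claim runs.
            picked = sorted(self.free_pcpus)[:ncpus]
            self.free_pcpus -= set(picked)
            return picked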

On Tue, Sep 17, 2019 at 8:44 AM wang.ya <wang.ya at 99cloud.net> wrote:

> Hi:
>
>
>
> The main code of NUMA-aware live migration has been merged, and I've been
> testing it recently.
>
>
>
> If only the NUMA properties ('hw:numa_nodes', 'hw:numa_cpus', 'hw:numa_mem')
> are set, it works well. But if the property "hw:cpu_policy='dedicated'" is
> added, the result is no longer correct after several live migrations.
>
> That is, the live migration can succeed, but the vCPU pinning is not
> correct (two instances end up pinned to the same host CPUs on the same
> host).
>
>
>
> Below are my test steps.
>
> env:
>
>        code: master branch (built on 16 September 2019, including the
> patches for NUMA-aware live migration)
>
> three compute nodes:
>
> - s1:      24C, 48G (2 NUMA nodes)
> - stein-2: 12C, 24G (2 NUMA nodes)
> - stein-3: 8C, 16G (2 NUMA nodes)
>
>
>
> flavor1 (2c2g): hw:cpu_policy='dedicated', hw:numa_cpus.0='0',
> hw:numa_cpus.1='1', hw:numa_mem.0='1024', hw:numa_mem.1='1024',
> hw:numa_nodes='2'
>
> flavor2 (4c4g): hw:cpu_policy='dedicated', hw:numa_cpus.0='0,1,2',
> hw:numa_cpus.1='3', hw:numa_mem.0='1024', hw:numa_mem.1='3072',
> hw:numa_nodes='2'
>
> The image has no properties.
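>
> For reference, a rough sketch of how these flavors could be recreated with
> python-novaclient (the auth values and the 10 GB disk size below are
> assumptions, not taken from this environment):
>
>     from keystoneauth1 import loading, session
>     from novaclient import client
>
>     # Assumed endpoint and credentials; adjust for your own deployment.
>     loader = loading.get_plugin_loader('password')
>     auth = loader.load_from_options(auth_url='http://controller:5000/v3',
>                                     username='admin', password='secret',
>                                     project_name='admin',
>                                     user_domain_name='Default',
>                                     project_domain_name='Default')
>     nova = client.Client('2.1', session=session.Session(auth=auth))
>
>     # flavor1 (2c2g) with the extra specs listed above.
>     flavor1 = nova.flavors.create('flavor1', ram=2048, vcpus=2, disk=10)
>     flavor1.set_keys({'hw:cpu_policy': 'dedicated',
>                       'hw:numa_nodes': '2',
>                       'hw:numa_cpus.0': '0', 'hw:numa_cpus.1': '1',
>                       'hw:numa_mem.0': '1024', 'hw:numa_mem.1': '1024'})
>
>     # flavor2 (4c4g) with the extra specs listed above.
>     flavor2 = nova.flavors.create('flavor2', ram=4096, vcpus=4, disk=10)
>     flavor2.set_keys({'hw:cpu_policy': 'dedicated',
>                       'hw:numa_nodes': '2',
>                       'hw:numa_cpus.0': '0,1,2', 'hw:numa_cpus.1': '3',
>                       'hw:numa_mem.0': '1024', 'hw:numa_mem.1': '3072'})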
>
>
>
>
>
> I created four instances (2 x flavor1, 2 x flavor2), then live migrated them
> one by one (once one instance's live migration finished, the next one was
> started), checking after each migration whether the vCPU pinning was correct.
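>
> A rough sketch of this kind of serial-migration loop (shown here with
> openstacksdk; the cloud name 'mycloud' and the sleep intervals are
> placeholders, not taken from my actual script):
>
>     import time
>
>     import openstack
>
>     conn = openstack.connect(cloud='mycloud')
>
>     for server in conn.compute.servers():
>         # Let the scheduler pick the destination host.
>         conn.compute.live_migrate_server(server)
>         time.sleep(10)  # crude: give the migration a moment to start
>         while conn.compute.get_server(server.id).status == 'MIGRATING':
>             time.sleep(5)
>         # Before moving on, check `virsh vcpupin` for this instance on
>         # its destination host.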
>
> After several live migrations, the vCPU pinning is no longer correct. (You
> can find the full migration list in the attached file.) The last live
> migrations were:
>
>
>
>
> +-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
> | Id  | UUID                                 | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host      | Status    | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type           |
> +-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
> | 470 | 2a9ba183-4f91-4fbf-93cf-6f0e55cc085a | s1          | stein-3   | s1             | stein-3      | 172.16.130.153 | completed | bf0466f6-4815-4824-8586-899817207564 | 1          | 1          | 2019-09-17T10:28:46.000000 | 2019-09-17T10:29:09.000000 | live-migration |
> | 469 | c05ea0e8-f040-463e-8957-a59f70ed8bf6 | s1          | stein-3   | s1             | stein-3      | 172.16.130.153 | completed | a3ec7a29-80de-4541-989d-4b9f4377f0bd | 1          | 1          | 2019-09-17T10:28:21.000000 | 2019-09-17T10:28:45.000000 | live-migration |
> | 468 | cef4c609-157e-4b39-b6cc-f5528d49c75a | s1          | stein-2   | s1             | stein-2      | 172.16.130.152 | completed | 83dab721-3343-436d-bee7-f5ffc0d0d38d | 4          | 4          | 2019-09-17T10:27:57.000000 | 2019-09-17T10:28:21.000000 | live-migration |
> | 467 | 5471e441-2a50-465a-bb63-3fe1bb2e81b9 | s1          | stein-2   | s1             | stein-2      | 172.16.130.152 | completed | e3c19fbe-7b94-4a65-a803-51daa9934378 | 4          | 4          | 2019-09-17T10:27:32.000000 | 2019-09-17T10:27:57.000000 | live-migration |
> +-----+--------------------------------------+-------------+-----------+----------------+--------------+----------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
>
>
>
>
>
> Two instances landed on stein-3, and they have the same vCPU pinning:
>
>
>
> (nova-libvirt)[root@stein-3 /]# virsh list --all
>  Id    Name                           State
> ----------------------------------------------------
>  32    instance-00000025              running
>  33    instance-00000024              running
>
> (nova-libvirt)[root@stein-3 /]# virsh vcpupin 32
> VCPU: CPU Affinity
> ----------------------------------
>    0: 2
>    1: 7
>
> (nova-libvirt)[root@stein-3 /]# virsh vcpupin 33
> VCPU: CPU Affinity
> ----------------------------------
>    0: 2
>    1: 7
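>
> The overlap above can also be spotted automatically; a small helper sketch
> that parses `virsh` output on the compute host (plain libvirt, no Nova
> APIs) and reports host CPUs pinned by more than one running domain:
>
>     import collections
>     import subprocess
>
>     def running_domains():
>         # `virsh list --name` prints one running domain name per line.
>         out = subprocess.check_output(['virsh', 'list', '--name'], text=True)
>         return [name for name in out.splitlines() if name.strip()]
>
>     def vcpu_pins(domain):
>         # Parse lines like "   0: 2" from `virsh vcpupin <domain>`.
>         out = subprocess.check_output(['virsh', 'vcpupin', domain], text=True)
>         pins = []
>         for line in out.splitlines():
>             parts = line.split(':')
>             if len(parts) == 2 and parts[0].strip().isdigit():
>                 pins.append(parts[1].strip())   # e.g. '2' (or a range)
>         return pins
>
>     usage = collections.defaultdict(list)
>     for dom in running_domains():
>         for pcpu in vcpu_pins(dom):
>             usage[pcpu].append(dom)
>
>     for pcpu, doms in sorted(usage.items()):
>         if len(doms) > 1:
>             print('host CPU %s pinned by: %s' % (pcpu, ', '.join(doms)))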
>
>
>
>
>
> I checked the nova-compute log on stein-3 (you can find it in the attached
> file) and found that 'host_topology' isn't updated when
> 'hardware.numa_fit_instance_to_host' is called in the claims code.
> 'host_topology' is a property of 'objects.ComputeNode' and is cached in the
> 'ResourceTracker'; the cached 'cn' is used to build the 'claim' when
> 'check_can_live_migrate_destination' is called. Therefore, I guess the cache
> was not updated, was updated too late, or something similar.
>
> I also checked the database: the NUMA topologies of the two instances have
> the same vCPU pinning, "[0,2], [1,7]", while the compute node stein-3 only
> has the vCPU pins "[2], [7]".
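>
> To illustrate the failure mode I suspect (a toy model only, not the real
> 'hardware.numa_fit_instance_to_host' logic): if two incoming claims both
> read the same stale cached host view, both can "fit" onto the same host
> CPUs:
>
>     cached_free_pcpus = {2, 7}       # stale snapshot of the host topology
>
>     def toy_fit(ncpus, free_pcpus):
>         # Pick the first `ncpus` free host CPUs, like a dedicated-policy fit.
>         return sorted(free_pcpus)[:ncpus]
>
>     claim_a = toy_fit(2, cached_free_pcpus)   # -> [2, 7]
>     # ... if the cache isn't refreshed before the next claim arrives ...
>     claim_b = toy_fit(2, cached_free_pcpus)   # -> [2, 7] again: the overlap
>     print(claim_a, claim_b)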
>
>
>
> Please correct me if there is something wrong :)
>
>
>
> Best Regards
>
>
>