[nova] NUMA live migration - mostly how it's tested

Zhong, Luyao luyao.zhong at intel.com
Sat Mar 2 04:38:21 UTC 2019




> On Mar 1, 2019, at 9:30 PM, Sean Mooney <smooney at redhat.com> wrote:
> 
>> On Fri, 2019-03-01 at 17:30 +0800, Luyao Zhong wrote:
>> Hi all,
>> 
>> There was something wrong with the live migration when using the 'dedicated'
>> cpu_policy in my test. The attached file contains the details.
>> 
>> The message body would be so big that it would be held for moderation if I
>> attached the debug info, so I will send that in another email.
> 
> 
> Looking at your original email, you stated that:
> 
> # VM server_on_host1 cpu pinning info
>  <cputune>
>    <shares>4096</shares>
>    <vcpupin vcpu='0' cpuset='43'/>
>    <vcpupin vcpu='1' cpuset='7'/>
>    <vcpupin vcpu='2' cpuset='16'/>
>    <vcpupin vcpu='3' cpuset='52'/>
>    <emulatorpin cpuset='7,16,43,52'/>
>  </cputune>
>  <numatune>
>    <memory mode='strict' nodeset='0'/>
>    <memnode cellid='0' mode='strict' nodeset='0'/>
>  </numatune>
> 
> 
> # VM server_on_host2 cpu pinning info (before migration)
>  <cputune>
>    <shares>4096</shares>
>    <vcpupin vcpu='0' cpuset='43'/>
>    <vcpupin vcpu='1' cpuset='7'/>
>    <vcpupin vcpu='2' cpuset='16'/>
>    <vcpupin vcpu='3' cpuset='52'/>
>    <emulatorpin cpuset='7,16,43,52'/>
>  </cputune>
>  <numatune>
>    <memory mode='strict' nodeset='0'/>
>    <memnode cellid='0' mode='strict' nodeset='0'/>
>  </numatune>
> 
> However, looking at the full domain XML,
> server_on_host1 was:
> <cputune>
>    <shares>4096</shares>
>    <vcpupin vcpu='0' cpuset='43'/>
>    <vcpupin vcpu='1' cpuset='7'/>
>    <vcpupin vcpu='2' cpuset='16'/>
>    <vcpupin vcpu='3' cpuset='52'/>
>    <emulatorpin cpuset='7,16,43,52'/>
>  </cputune>
> 
> but server_on_host2 was 
> <cputune>
>    <shares>4096</shares>
>    <vcpupin vcpu='0' cpuset='2'/>
>    <vcpupin vcpu='1' cpuset='38'/>
>    <vcpupin vcpu='2' cpuset='8'/>
>    <vcpupin vcpu='3' cpuset='44'/>
>    <emulatorpin cpuset='2,8,38,44'/>
>  </cputune>
> 
> Assuming the full XML attached for server_on_host2 is correct,
> then this shows that the code is working correctly, as the pinning no longer overlaps.
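> 
> A quick way to double-check that on the destination host is to compare the pinning of the two guests directly, for
> example (the instance names below are just placeholders for whatever libvirt domain names nova generated):
> 
>   virsh vcpupin instance-00000001    # pinning of the guest already on host2
>   virsh vcpupin instance-00000002    # pinning of the migrated guest; the pcpu lists should not overlap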
> 
I used virsh dumpxml to get the full XML, but I usually use virsh edit to view the XML, so I didn't notice this difference before. When using virsh edit, I can still see the overlapping pinning. I'm not sure why I get different results between "virsh edit" and "virsh dumpxml".
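
My current guess is that "virsh dumpxml" on a running domain shows the live configuration, while "virsh edit" operates on the persistent definition, so the two can differ after a live migration. A way to compare both views side by side (the instance name is just a placeholder):

  virsh dumpxml instance-00000001 > live.xml                     # live config of the running domain
  virsh dumpxml --inactive instance-00000001 > persistent.xml    # persistent definition, i.e. what virsh edit shows
  diff live.xml persistent.xml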
> 
>> 
>> Regards,
>> Luyao
>> 
>> 
>>> On 2019/2/28 at 9:28 PM, Sean Mooney wrote:
>>>> On Wed, 2019-02-27 at 21:33 -0500, Artom Lifshitz wrote:
>>>> 
>>>> 
>>>>> On Wed, Feb 27, 2019, 21:27 Matt Riedemann, <mriedemos at gmail.com> wrote:
>>>>>> On 2/27/2019 7:25 PM, Artom Lifshitz wrote:
>>>>>> What I've been using for testing is this: [3]. It's a series of
>>>>>> patches to whitebox_tempest_plugin, a Tempest plugin used by a bunch
>>>>>> of us Nova Red Hatters to automate testing that's outside of Tempest's
>>>>>> scope.
>>>>> 
>>>>> And where is that pulling in your nova series of changes and posting
>>>>> test results (like a 3rd party CI) so anyone can see it? Or do you mean
>>>>> here are tests, but you need to provide your own environment if you want
>>>>> to verify the code prior to merging it?
>>>> 
>>>> Sorry, wasn't clear. It's the latter. The test code exists, and has run against my devstack environment with my
>>>> patches checked out, but there's no CI or public posting of test results. Getting CI coverage for these NUMA
>>>> things
>>>> (like the old Intel one) is a whole other topic.
>>> 
>>> On the CI front, I resolved the nested virt issue on the server I bought to set up a personal CI for NUMA testing.
>>> That set me back a few weeks in setting up that CI, but I hope to run Artom's whitebox tests, among others, in it at
>>> some point. Vexxhost also provides nested virt on the gate VMs. I'm going to see if we can actually create a
>>> non-voting job using the ubuntu-bionic-vexxhost nodeset. If OVH or one of the other providers of CI resources
>>> re-enables nested virt, then we can maybe make that job voting and not need third-party CI anymore.
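>>> If that pans out, the job definition itself would presumably be a short zuul snippet along these lines (the job and
>>> parent names here are only placeholders, not an actual proposed change):
>>> 
>>>   - job:
>>>       name: nova-live-migration-numa
>>>       parent: nova-live-migration
>>>       nodeset: ubuntu-bionic-vexxhost
>>>       voting: false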
>>>>> Can we really not even have functional tests with the fake libvirt
>>>>> driver and fake numa resources to ensure the flow doesn't blow up?
>>>> 
>>>> That's something I have to look into. We have live migration functional tests, and we have NUMA functional tests,
>>>> but
>>>> I'm not sure how we can combine the two.
>>> 
>>> Just as an additional proof point, I am planning to do a bunch of migration and live migration testing in the next
>>> 2-4 weeks.
>>> 
>>> My current backlog, in no particular order, is:
>>> SR-IOV migration
>>> NUMA migration
>>> vTPM migration
>>> cross-cell migration
>>> cross-Neutron-backend migration (OVS <-> Linux bridge)
>>> cross-firewall migration (iptables <-> conntrack) (previously tested and worked at the end of Queens)
>>> 
>>> Narrowing in on the NUMA migration, the current set of test cases I plan to manually verify is as follows:
>>> 
>>> Note: assume all flavors have 256 MB of RAM and 4 cores unless otherwise stated (an example of how one of these
>>> flavors could be created follows the basic tests list below).
>>> 
>>> basic tests
>>> pinned guest (hw:cpu_policy=dedicated)
>>> pinned-isolated guest (hw:cpu_policy=dedicated hw:cpu_thread_policy=isolate)
>>> pinned-prefer guest (hw:cpu_policy=dedicated hw:cpu_thread_policy=prefer)
>>> unpinned-single-numa guest (hw:numa_nodes=1)
>>> unpinned-dual-numa guest (hw:numa_nodes=2)
>>> unpinned-dual-numa-unbalanced guest (hw:numa_nodes=2 hw:numa_cpu.0=0 hw:numa_cpu.1=1-3
>>> hw:numa_mem.0=64 hw:numa_mem.1=192)
>>> unpinned-hugepage-implicit-numa guest (hw:mem_page_size=large)
>>> unpinned-hugepage-multi-numa guest (hw:mem_page_size=large hw:numa_nodes=2)
>>> pinned-hugepage-multi-numa guest (hw:mem_page_size=large hw:numa_nodes=2 hw:cpu_policy=dedicated)
>>> realtime guest (hw:cpu_policy=dedicated hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1)
>>> emulator-thread-isolated guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=isolate)
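>>> 
>>> As a concrete sketch, the pinned guest flavor above could be created roughly like this (the flavor name is just an
>>> example); the other flavors only differ in the extra specs applied:
>>> 
>>>   openstack flavor create --ram 256 --vcpus 4 --disk 1 pinned-guest
>>>   openstack flavor set pinned-guest --property hw:cpu_policy=dedicated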
>>> 
>>> advanced tests (require extra nova.conf changes; a rough sketch of those settings follows this list)
>>> emulator-thread-shared guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=share) note: cpu_shared_set
>>> configured
>>> unpinned-single-numa-heterogeneous-host guest (hw:numa_nodes=1) note: vcpu_pin_set adjusted so that host 1 only
>>> has cpus on numa node 1 and host 2 only has cpus on numa node 2.
>>> super-optimised-guest (hw:numa_nodes=2 hw:numa_cpu.0=0 hw:numa_cpu.1=1-3
>>> hw:numa_mem.0=64 hw:numa_mem.1=192 hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=isolate)
>>> super-optimised-guest-2 (hw:numa_nodes=2 hw:numa_cpu.0=0 hw:numa_cpu.1=1-3 hw:numa_mem.0=64 hw:numa_mem.1=192
>>> hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=share)
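>>> 
>>> For reference, the kind of nova.conf tweaks those advanced tests need would look roughly like this (the exact cpu
>>> ranges are only illustrative and depend on the host topology):
>>> 
>>>   # emulator-thread-shared test: give nova a shared cpu set for emulator threads
>>>   [compute]
>>>   cpu_shared_set = 0-3
>>> 
>>>   # heterogeneous-host test: restrict each host to cpus from a single numa node
>>>   [DEFAULT]
>>>   vcpu_pin_set = 4-23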
>>> 
>>> 
>>> For each of these tests I'll provide a test-command file with the commands I used to run the test, and a results
>>> file with a summary at the top plus the XMLs before and after the migration, showing that initially the resources
>>> would conflict on migration and then the updated XMLs after the migration.
>>> I will also provide the local.conf for the devstack deployment and some details about the environment, like
>>> distro/QEMU/libvirt versions.
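>>> 
>>> To give an idea of what those test-command files will contain, the per-test flow is presumably something like the
>>> following (instance and host names are placeholders; the before and after dumps are taken on the source and
>>> destination hosts respectively):
>>> 
>>>   virsh dumpxml instance-00000001 > pinned-before.xml    # pinning before the migration
>>>   nova live-migration test-server target-host            # trigger the live migration
>>>   virsh dumpxml instance-00000001 > pinned-after.xml     # pinning after the migration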
>>> 
>>> Eventually I hope all those test cases can be added to the whitebox plugin and verified in a CI.
>>> We could also try to validate them in functional tests.
>>> 
>>> I have attached the XML for the pinned guest as an example of what to expect, but I will be compiling this slowly as
>>> I go and will zip everything up in an email to the list.
>>> This will take some time to complete, and honestly I had planned to do most of this testing after feature freeze,
>>> when we can focus on testing more.
>>> 
>>> regards
>>> sean
>>> 
>>> 
> 


