[nova] NUMA live migration - mostly how it's tested

Luyao Zhong luyao.zhong at intel.com
Tue Mar 5 07:40:46 UTC 2019



On 2019/3/3 9:18 PM, Sean Mooney wrote:
> On Sat, 2019-03-02 at 04:38 +0000, Zhong, Luyao wrote:
>>
>>
>>> On March 1, 2019, at 9:30 PM, Sean Mooney <smooney at redhat.com> wrote:
>>>
>>>> On Fri, 2019-03-01 at 17:30 +0800, Luyao Zhong wrote:
>>>> Hi all,
>>>>
>>>> There was something wrong with live migration when using the 'dedicated'
>>>> cpu_policy in my test. The attached file contains the details.
>>>>
>>>> The message body would be too big and would get held if I attached the
>>>> debug info, so I will send it in another email.
>>>
>>>
>>> Looking at your original email, you stated:
>>>
>>> # VM server_on_host1 cpu pinning info
>>>   <cputune>
>>>     <shares>4096</shares>
>>>     <vcpupin vcpu='0' cpuset='43'/>
>>>     <vcpupin vcpu='1' cpuset='7'/>
>>>     <vcpupin vcpu='2' cpuset='16'/>
>>>     <vcpupin vcpu='3' cpuset='52'/>
>>>     <emulatorpin cpuset='7,16,43,52'/>
>>>   </cputune>
>>>   <numatune>
>>>     <memory mode='strict' nodeset='0'/>
>>>     <memnode cellid='0' mode='strict' nodeset='0'/>
>>>   </numatune>
>>>
>>>
>>> # VM server_on_host2 cpu pinning info (before migration)
>>>   <cputune>
>>>     <shares>4096</shares>
>>>     <vcpupin vcpu='0' cpuset='43'/>
>>>     <vcpupin vcpu='1' cpuset='7'/>
>>>     <vcpupin vcpu='2' cpuset='16'/>
>>>     <vcpupin vcpu='3' cpuset='52'/>
>>>     <emulatorpin cpuset='7,16,43,52'/>
>>>   </cputune>
>>>   <numatune>
>>>     <memory mode='strict' nodeset='0'/>
>>>     <memnode cellid='0' mode='strict' nodeset='0'/>
>>>   </numatune>
>>>
>>> However, looking at the full domain XML,
>>> server_on_host1 was:
>>> <cputune>
>>>     <shares>4096</shares>
>>>     <vcpupin vcpu='0' cpuset='43'/>
>>>     <vcpupin vcpu='1' cpuset='7'/>
>>>     <vcpupin vcpu='2' cpuset='16'/>
>>>     <vcpupin vcpu='3' cpuset='52'/>
>>>     <emulatorpin cpuset='7,16,43,52'/>
>>>   </cputune>
>>>
>>> but server_on_host2 was
>>> <cputune>
>>>     <shares>4096</shares>
>>>     <vcpupin vcpu='0' cpuset='2'/>
>>>     <vcpupin vcpu='1' cpuset='38'/>
>>>     <vcpupin vcpu='2' cpuset='8'/>
>>>     <vcpupin vcpu='3' cpuset='44'/>
>>>     <emulatorpin cpuset='2,8,38,44'/>
>>>
>>> Assuming the full XML attached for server_on_host2 is correct,
>>> this shows that the code is working correctly, as the pinning no longer overlaps.
>>>
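(For reference, a minimal sketch of how the pinning of both running domains can be pulled out for a
side-by-side check; it assumes the domain names used above and a host where virsh is available:)

    # print only the pinning-related elements of each running domain
    for dom in server_on_host1 server_on_host2; do
        echo "== ${dom} =="
        virsh dumpxml "${dom}" | grep -E '<vcpupin|<emulatorpin|<memnode'
    done
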
>>
>> I used virsh dumpxml to get the full XML, but I usually use virsh edit to view the XML, so I didn’t notice this
>> difference before. When using virsh edit, I can see the overlaps. I’m not sure why I get different results
>> between “virsh edit” and “virsh dumpxml”?
> At first I thought that sounded like a libvirt bug, however the descriptions of the two commands differ slightly:
> 
>   dumpxml domain [--inactive] [--security-info] [--update-cpu] [--migratable]
>             Output the domain information as an XML dump to stdout, this format can be used by the create command.
>             Additional options affecting the XML dump may be used. --inactive tells
>             virsh to dump domain configuration that will be used on next start of the domain as opposed to the current
>             domain configuration.  Using --security-info will also include
>             security sensitive information in the XML dump. --update-cpu updates domain CPU requirements according to
>             host CPU. With --migratable one can request an XML that is suitable
>             for migrations, i.e., compatible with older libvirt releases and possibly amended with internal run-time
>             options. This option may automatically enable other options
>             (--update-cpu, --security-info, ...) as necessary.
> 
>    edit domain
>             Edit the XML configuration file for a domain, which will affect the next boot of the guest.
> 
>             This is equivalent to:
> 
>              virsh dumpxml --inactive --security-info domain > domain.xml
>              vi domain.xml (or make changes with your other text editor)
>              virsh define domain.xml
> 
>             except that it does some error checking.
> 
>             The editor used can be supplied by the $VISUAL or $EDITOR environment variables, and defaults to "vi".
> 
> I think virsh edit is showing you the state the VM will have on next reboot,
> which appears to be the original XML, not the migration XML that was used to move the instance.
> 
> Since OpenStack destroys the XML and recreates it from scratch every time, the output of virsh edit can be ignored.
> virsh dumpxml will show the current state of the domain. If you did virsh dumpxml --inactive it would likely match
> virsh edit. It's possible that the modified XML we use when migrating a domain is considered transient, but that is
> my best guess as to why they are different.
> 
> For evaluating this we should use the values from virsh dumpxml.
> 

I rebooted the migrated VM, and after that virsh dumpxml returned the same XML 
that virsh edit had shown before the reboot. Artom Lifshitz mentioned that, 
according to the Wind River folks, the instance_numa_topology wasn't updated in 
the database. So I guess virsh edit shows the XML produced by OpenStack 
according to the db, and maybe that is why we get different XML from 'edit' and 
'dumpxml'. This is a little complex to me and still needs more testing and 
debugging. Thank you so much for giving these details.
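
For anyone reproducing this, a minimal sketch of how the two views can be compared side by side
(the domain name instance-00000002 is illustrative; substitute the real name from "virsh list"):

    # live definition of the running domain (what the current qemu process uses)
    virsh dumpxml instance-00000002 > active.xml

    # persistent definition used on the next start; per the man page quoted above,
    # this is roughly what virsh edit shows
    virsh dumpxml --inactive instance-00000002 > inactive.xml

    # show only the pinning/numatune differences between the two views
    diff <(grep -E '<vcpupin|<emulatorpin|<memnode' active.xml) \
         <(grep -E '<vcpupin|<emulatorpin|<memnode' inactive.xml)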

Regards,
Luyao

>>>
>>>>
>>>> Regards,
>>>> Luyao
>>>>
>>>>
>>>>>> On 2019/2/28 9:28 PM, Sean Mooney wrote:
>>>>>> On Wed, 2019-02-27 at 21:33 -0500, Artom Lifshitz wrote:
>>>>>>
>>>>>>
>>>>>>> On Wed, Feb 27, 2019, 21:27 Matt Riedemann, <mriedemos at gmail.com> wrote:
>>>>>>>> On 2/27/2019 7:25 PM, Artom Lifshitz wrote:
>>>>>>>> What I've been using for testing is this: [3]. It's a series of
>>>>>>>> patches to whitebox_tempest_plugin, a Tempest plugin used by a bunch
>>>>>>>> of us Nova Red Hatters to automate testing that's outside of Tempest's
>>>>>>>> scope.
>>>>>>>
>>>>>>> And where is that pulling in your nova series of changes and posting
>>>>>>> test results (like a 3rd party CI) so anyone can see it? Or do you mean
>>>>>>> the tests are here, but you need to provide your own environment if you
>>>>>>> want to verify the code prior to merging it?
>>>>>>
>>>>>> Sorry, wasn't clear. It's the latter. The test code exists, and has run against my devstack environment
>>>>>> with my patches checked out, but there's no CI or public posting of test results. Getting CI coverage for
>>>>>> these NUMA things (like the old Intel one) is a whole other topic.
>>>>>
>>>>> On the CI front, I resolved the nested virt issue on the server I bought to set up a personal CI for NUMA
>>>>> testing. That set me back a few weeks in setting up that CI, but I hope to run Artom's whitebox tests, among
>>>>> others, in it at some point. Vexxhost also provided nested virt on the gate VMs. I'm going to see if we can
>>>>> actually create a non-voting job using the ubuntu-bionic-vexxhost nodeset. If OVH or one of the other providers
>>>>> of CI resources re-enables nested virt, then we can maybe make that job voting and not need third-party CI
>>>>> anymore.
>>>>>>> Can we really not even have functional tests with the fake libvirt
>>>>>>> driver and fake numa resources to ensure the flow doesn't blow up?
>>>>>>
>>>>>> That's something I have to look into. We have live migration functional tests, and we have NUMA functional
>>>>>> tests, but I'm not sure how we can combine the two.
>>>>>
>>>>> Just as an additional proof point, I am planning to do a bunch of migration and live migration testing in the
>>>>> next 2-4 weeks.
>>>>>
>>>>> My current backlog, in no particular order, is:
>>>>> SR-IOV migration
>>>>> NUMA migration
>>>>> vTPM migration
>>>>> cross-cell migration
>>>>> cross-neutron-backend migration (ovs <-> linuxbridge)
>>>>> cross-firewall migration (iptables <-> conntrack) (previously tested and working at the end of Queens)
>>>>>
>>>>> Narrowing in on NUMA migration, the current set of test cases I plan to manually verify is as follows.
>>>>>
>>>>> Note: assume all flavors have 256 MB of RAM and 4 cores unless otherwise stated (a sketch of how one such
>>>>> flavor could be created follows the basic tests list below).
>>>>>
>>>>> basic tests
>>>>> pinned guests (hw:cpu_policy=dedicated)
>>>>> pinned-isolated guests (hw:cpu_policy=dedicated hw:cpu_thread_policy=isolate)
>>>>> pinned-prefer guests (hw:cpu_policy=dedicated hw:cpu_thread_policy=prefer)
>>>>> unpinned-single-numa guest (hw:numa_nodes=1)
>>>>> unpinned-dual-numa guest (hw:numa_nodes=2)
>>>>> unpinned-dual-numa-unbalanced guest (hw:numa_nodes=2 hw:numa_cpus.0=1 hw:numa_cpus.1=1-3
>>>>> hw:numa_mem.0=64 hw:numa_mem.1=192)
>>>>> unpinned-hugepage-implicit-numa guest (hw:mem_page_size=large)
>>>>> unpinned-hugepage-multi-numa guest (hw:mem_page_size=large hw:numa_nodes=2)
>>>>> pinned-hugepage-multi-numa guest (hw:mem_page_size=large hw:numa_nodes=2 hw:cpu_policy=dedicated)
>>>>> realtime guest (hw:cpu_policy=dedicated hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1)
>>>>> emulator-thread-isolated guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=isolate)
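
A minimal sketch of how the first of these flavors could be created with the openstack CLI (the flavor
name "numa-pinned" is illustrative; the sizes follow the note above):

    # 4 vCPUs, 256 MB RAM, small disk, matching the note above
    openstack flavor create --vcpus 4 --ram 256 --disk 1 numa-pinned

    # pin guest vCPUs to dedicated host cores
    openstack flavor set numa-pinned --property hw:cpu_policy=dedicated

    # the other cases just add or change extra specs, e.g. the pinned-isolated guest:
    # openstack flavor set numa-pinned --property hw:cpu_thread_policy=isolate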
>>>>>
>>>>> advanced tests (require extra nova.conf changes; see the nova.conf sketch after this list)
>>>>> emulator-thread-shared guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=share)
>>>>> note: cpu_shared_set configured
>>>>> unpinned-single-numa-heterogeneous-host guest (hw:numa_nodes=1) note: vcpu_pin_set adjusted so that
>>>>> host 1 only has cpus on numa node 1 and host 2 only has cpus on numa node 2.
>>>>> super-optimised-guest (hw:numa_nodes=2 hw:numa_cpus.0=1 hw:numa_cpus.1=1-3
>>>>> hw:numa_mem.0=64 hw:numa_mem.1=192 hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1
>>>>> hw:emulator_threads_policy=isolate)
>>>>> super-optimised-guest-2 (hw:numa_nodes=2 hw:numa_cpus.0=1 hw:numa_cpus.1=1-3 hw:numa_mem.0=64 hw:numa_mem.1=192
>>>>> hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=share)
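
A sketch of the kind of nova.conf settings these advanced cases assume (option names as of Rocky/Stein;
the CPU ranges shown are purely illustrative and must match the real host topology):

    [DEFAULT]
    # heterogeneous-host test: restrict guest vCPUs to one host NUMA node on each host
    vcpu_pin_set = 0-17,36-53

    [compute]
    # host CPUs used for emulator threads when hw:emulator_threads_policy=share is set
    cpu_shared_set = 18-19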
>>>>>
>>>>>
>>>>> For each of these tests I'll provide a test-commands file with the commands I used to run the tests, and a
>>>>> results file with a summary at the top plus the XMLs before and after the migration, showing that initially
>>>>> the resources would conflict on migration and then the updated XMLs after the migration.
>>>>> I will also provide the local.conf for the devstack deployment and some details about the environment, like
>>>>> distro/qemu/libvirt versions.
>>>>>
>>>>> Eventually I hope all those test cases can be added to the whitebox plugin and verified in a CI.
>>>>> We could also try to validate them in functional tests.
>>>>>
>>>>> I have attached the XML for the pinned guest as an example of what to expect, but I will be compiling this
>>>>> slowly as I go and will zip everything up in an email to the list.
>>>>> This will take some time to complete, and honestly I had planned to do most of this testing after feature
>>>>> freeze, when we can focus on testing more.
>>>>>
>>>>> Regards,
>>>>> Sean
>>>>>
>>>>>
> 


