[nova] NUMA live migration - mostly how it's tested

Sean Mooney smooney at redhat.com
Fri Mar 1 13:29:40 UTC 2019


On Fri, 2019-03-01 at 17:30 +0800, Luyao Zhong wrote:
> Hi all,
> 
> There was something wrong with the live migration when using the 'dedicated' 
> cpu_policy in my test. The attached file contains the details.
> 
> The message body would be so big that it would be held for moderation if I 
> attached the debug info, so I will send the debug info in another email.


looking at your original email, you stated that:

# VM server_on_host1 cpu pinning info
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='43'/>
    <vcpupin vcpu='1' cpuset='7'/>
    <vcpupin vcpu='2' cpuset='16'/>
    <vcpupin vcpu='3' cpuset='52'/>
    <emulatorpin cpuset='7,16,43,52'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>


# VM server_on_host2 cpu pinning info (before migration)
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='43'/>
    <vcpupin vcpu='1' cpuset='7'/>
    <vcpupin vcpu='2' cpuset='16'/>
    <vcpupin vcpu='3' cpuset='52'/>
    <emulatorpin cpuset='7,16,43,52'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>

however, looking at the full domain xml,
server_on_host1 was
<cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='43'/>
    <vcpupin vcpu='1' cpuset='7'/>
    <vcpupin vcpu='2' cpuset='16'/>
    <vcpupin vcpu='3' cpuset='52'/>
    <emulatorpin cpuset='7,16,43,52'/>
  </cputune>

but server_on_host2 was 
 <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='38'/>
    <vcpupin vcpu='2' cpuset='8'/>
    <vcpupin vcpu='3' cpuset='44'/>
    <emulatorpin cpuset='2,8,38,44'/>
  </cputune>

assuming the full xml attached for server_on_host2 is correct,
this shows that the code is working correctly: before the migration both guests claimed host cpus 7,16,43,52, and
after the migration server_on_host2 was repinned to cpus 2,8,38,44, so the pinning no longer overlaps.



> 
> Regards,
> Luyao
> 
> 
> > On 2019/2/28 9:28 PM, Sean Mooney wrote:
> > On Wed, 2019-02-27 at 21:33 -0500, Artom Lifshitz wrote:
> > > 
> > > 
> > > On Wed, Feb 27, 2019, 21:27 Matt Riedemann, <mriedemos at gmail.com> wrote:
> > > > On 2/27/2019 7:25 PM, Artom Lifshitz wrote:
> > > > > What I've been using for testing is this: [3]. It's a series of
> > > > > patches to whitebox_tempest_plugin, a Tempest plugin used by a bunch
> > > > > of us Nova Red Hatters to automate testing that's outside of Tempest's
> > > > > scope.
> > > > 
> > > > And where is that pulling in your nova series of changes and posting
> > > > test results (like a 3rd party CI) so anyone can see it? Or do you mean
> > > > here are tests, but you need to provide your own environment if you want
> > > > to verify the code prior to merging it.
> > > 
> > > Sorry, wasn't clear. It's the latter. The test code exists, and has run against my devstack environment with my
> > > patches checked out, but there's no CI or public posting of test results. Getting CI coverage for these NUMA
> > > things (like the old Intel one) is a whole other topic.
> > 
> > on the ci front, i resolved the nested virt issue on the server i bought to set up a personal ci for numa testing.
> > that set me back a few weeks in setting up that ci, but i hope to run artom's whitebox tests, among others, in it
> > at some point. vexxhost also provides nested virt on the gate vms. im going to see if we can actually create a
> > non-voting job using the ubuntu-bionic-vexxhost nodeset. if ovh or one of the other providers of ci resources
> > re-enables nested virt then we can maybe make that job voting and not need third-party ci anymore.
> > > > Can we really not even have functional tests with the fake libvirt
> > > > driver and fake numa resources to ensure the flow doesn't blow up?
> > > 
> > > That's something I have to look into. We have live migration functional tests, and we have NUMA functional tests,
> > > but I'm not sure how we can combine the two.
> > 
> > just as an additional proof point, im planning to do a bunch of migration and live migration testing in the next
> > 2-4 weeks.
> > 
> > my current backlog, in no particular order, is:
> > sriov migration
> > numa migration
> > vtpm migration
> > cross-cell migration
> > cross-neutron backend migration (ovs<->linuxbridge)
> > cross-firewall migration (iptables<->conntrack) (previously tested and worked at the end of queens)
> > 
> > narrowing in on numa migration, the current set of test cases i plan to manually verify is as follows:
> > 
> > note: assume all flavors have 256mb of ram and 4 cores unless otherwise stated
> > 
> > basic tests (sketches of the expected xml follow this list)
> > pinned guests (hw:cpu_policy=dedicated)
> > pinned-isolated guests (hw:cpu_policy=dedicated hw:cpu_thread_policy=isolate)
> > pinned-prefer guests (hw:cpu_policy=dedicated hw:cpu_thread_policy=prefer)
> > unpinned-single-numa guest (hw:numa_nodes=1)
> > unpinned-dual-numa guest (hw:numa_nodes=2)
> > unpinned-dual-numa-unbalanced guest (hw:numa_nodes=2 hw:numa_cpu.0=0 hw:numa_cpu.1=1-3
> > hw:numa_mem.0=64 hw:numa_mem.1=192)
> > unpinned-hugepage-implicit-numa guest (hw:mem_page_size=large)
> > unpinned-hugepage-multi-numa guest (hw:mem_page_size=large hw:numa_nodes=2)
> > pinned-hugepage-multi-numa guest (hw:mem_page_size=large hw:numa_nodes=2 hw:cpu_policy=dedicated)
> > realtime guest (hw:cpu_policy=dedicated hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1)
> > emulator-thread-isolated guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=isolate)
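> > 
> > as a rough sketch of what ill be checking for in some of the basic tests (the host cpu ids below are illustrative,
> > not from a real machine): a 4-core pinned guest with hw:emulator_threads_policy=isolate should get a cputune where
> > each vcpu is pinned 1:1 to a dedicated host cpu and the emulator threads are pinned to an extra dedicated host cpu
> > of their own, e.g.
> > 
> >   <cputune>
> >     <vcpupin vcpu='0' cpuset='4'/>
> >     <vcpupin vcpu='1' cpuset='5'/>
> >     <vcpupin vcpu='2' cpuset='6'/>
> >     <vcpupin vcpu='3' cpuset='7'/>
> >     <emulatorpin cpuset='8'/>
> >   </cputune>
> > 
> > and the hugepage guests (hw:mem_page_size=large) should grow a memoryBacking element along these lines (the page
> > size depends on what the host has configured; 2M pages shown here):
> > 
> >   <memoryBacking>
> >     <hugepages>
> >       <page size='2048' unit='KiB' nodeset='0'/>
> >     </hugepages>
> >   </memoryBacking>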
> > 
> > advanced tests (require extra nova.conf changes; a realtime xml sketch follows this list)
> > emulator-thread-shared guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=share) note: cpu_shared_set
> > configured in nova.conf
> > unpinned-single-numa-heterogeneous-host guest (hw:numa_nodes=1) note: vcpu_pin_set adjusted so that host 1 only
> > has cpus on numa node 1 and host 2 only has cpus on numa node 2.
> > super-optimised-guest (hw:cpu_policy=dedicated hw:numa_nodes=2 hw:numa_cpu.0=0 hw:numa_cpu.1=1-3
> > hw:numa_mem.0=64 hw:numa_mem.1=192 hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=isolate)
> > super-optimised-guest-2 (hw:cpu_policy=dedicated hw:numa_nodes=2 hw:numa_cpu.0=0 hw:numa_cpu.1=1-3 hw:numa_mem.0=64
> > hw:numa_mem.1=192 hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=share)
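> > 
> > for the realtime cases im expecting something along the lines of a vcpusched element inside the cputune marking
> > the realtime vcpus as fifo scheduled (with hw:cpu_realtime_mask=^0-1 that should be vcpus 2-3; the exact element
> > layout may differ):
> > 
> >   <cputune>
> >     ...
> >     <vcpusched vcpus='2-3' scheduler='fifo' priority='1'/>
> >   </cputune>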
> > 
> > 
> > for each of these tests ill provide a test-commands file with the commands i used to run the tests and a results
> > file with a summary at the top, plus the xmls before and after the migration, showing that initially the resources
> > would conflict on migration and then the updated xmls after the migration.
> > i will also provide the local.conf for the devstack deployment and some details about the env like
> > distro/qemu/libvirt versions.
> > 
> > eventually i hope all those test cases can be added to the whitebox plugin and verified in a ci.
> > we could also try to validate them in functional tests.
> > 
> > i have attached the xml for the pinned guest as an example of what to expect, but i will be compiling this slowly
> > as i go and will zip everything up in an email to the list.
> > this will take some time to complete, and honestly i had planned to do most of this testing after feature freeze,
> > when we can focus on testing more.
> > 
> > regards
> > sean
> > 
> > 



