On 2019/3/3 9:18 PM, Sean Mooney wrote:
On Sat, 2019-03-02 at 04:38 +0000, Zhong, Luyao wrote:
On 2019/3/1, 9:30 PM, Sean Mooney <smooney@redhat.com> wrote:
On Fri, 2019-03-01 at 17:30 +0800, Luyao Zhong wrote:
Hi all,
Something went wrong with live migration when using the 'dedicated' cpu_policy in my test. The attached file contains the details.
The message body would be so big that it would be held for moderation if I attached the debug info, so I will send it in another email.
Looking at your original email, you stated that:
# VM server_on_host1 cpu pinning info
<cputune>
  <shares>4096</shares>
  <vcpupin vcpu='0' cpuset='43'/>
  <vcpupin vcpu='1' cpuset='7'/>
  <vcpupin vcpu='2' cpuset='16'/>
  <vcpupin vcpu='3' cpuset='52'/>
  <emulatorpin cpuset='7,16,43,52'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='0'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>

# VM server_on_host2 cpu pinning info (before migration)
<cputune>
  <shares>4096</shares>
  <vcpupin vcpu='0' cpuset='43'/>
  <vcpupin vcpu='1' cpuset='7'/>
  <vcpupin vcpu='2' cpuset='16'/>
  <vcpupin vcpu='3' cpuset='52'/>
  <emulatorpin cpuset='7,16,43,52'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='0'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
However, looking at the full domain XML attached, server_on_host1 was:
<cputune>
  <shares>4096</shares>
  <vcpupin vcpu='0' cpuset='43'/>
  <vcpupin vcpu='1' cpuset='7'/>
  <vcpupin vcpu='2' cpuset='16'/>
  <vcpupin vcpu='3' cpuset='52'/>
  <emulatorpin cpuset='7,16,43,52'/>
</cputune>
but server_on_host2 was:
<cputune>
  <shares>4096</shares>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='38'/>
  <vcpupin vcpu='2' cpuset='8'/>
  <vcpupin vcpu='3' cpuset='44'/>
  <emulatorpin cpuset='2,8,38,44'/>
</cputune>
Assuming the full XML attached for server_on_host2 is correct, this shows that the code is working correctly, as the pinning no longer overlaps.
I used virsh dumpxml to get the full XML, but I usually use virsh edit to view the XML, so I didn't notice this difference before. When using virsh edit, I can still see the overlaps. I'm not sure why I get different results between "virsh edit" and "virsh dumpxml"?
At first I thought that sounded like a libvirt bug, however the descriptions of the two commands differ slightly:
dumpxml domain [--inactive] [--security-info] [--update-cpu] [--migratable]
    Output the domain information as an XML dump to stdout; this format can be used by the create command. Additional options affecting the XML dump may be used. --inactive tells virsh to dump the domain configuration that will be used on the next start of the domain, as opposed to the current domain configuration. Using --security-info will also include security sensitive information in the XML dump. --update-cpu updates domain CPU requirements according to host CPU. With --migratable one can request an XML that is suitable for migrations, i.e., compatible with older libvirt releases and possibly amended with internal run-time options. This option may automatically enable other options (--update-cpu, --security-info, ...) as necessary.
edit domain
    Edit the XML configuration file for a domain, which will affect the next boot of the guest.
This is equivalent to:
    virsh dumpxml --inactive --security-info domain > domain.xml
    vi domain.xml (or make changes with your other text editor)
    virsh define domain.xml
except that it does some error checking.
The editor used can be supplied by the $VISUAL or $EDITOR environment variables, and defaults to "vi".
I think virsh edit is showing you the state the VM will have on its next reboot, which appears to be the original XML, not the migration XML that was used to move the instance.
Since OpenStack destroys the XML and recreates it from scratch every time, the output of virsh edit can be ignored. virsh dumpxml will show the current state of the domain. If you did virsh dumpxml --inactive, it would likely match virsh edit. It's possible the modified XML we use when migrating a domain is considered transient, but that is my best guess as to why they are different.
For evaluating this we should use the values from virsh dumpxml.
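To confirm that on the destination host, comparing the live and persistent definitions side by side should show the difference; roughly something like this (server_on_host2 is just the label from the earlier mails, substitute the actual libvirt domain name, and grep is only used to trim the output):

    # live (running) definition, i.e. what the migration actually applied
    virsh dumpxml server_on_host2 | grep -E 'vcpupin|emulatorpin'
    # persistent definition, i.e. what virsh edit shows and what the next boot would use
    virsh dumpxml --inactive server_on_host2 | grep -E 'vcpupin|emulatorpin'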
I rebooted the migrated VM, and then virsh dumpxml gave the same XML as virsh edit did before the reboot. Artom Lifshitz mentioned that the instance_numa_topology wasn't updated in the database in the changes from the Wind River folks. So I guess virsh edit shows the XML produced by OpenStack according to the DB; maybe this is why we get different XMLs with 'edit' and 'dumpxml'. This is a little complex to me and still needs more test and debug work. Thank you so much for giving these details. Regards, Luyao
Regards, Luyao
On 2019/2/28 9:28 PM, Sean Mooney wrote:
On Wed, 2019-02-27 at 21:33 -0500, Artom Lifshitz wrote:
> On Wed, Feb 27, 2019, 21:27 Matt Riedemann, <mriedemos@gmail.com> wrote:
>> On 2/27/2019 7:25 PM, Artom Lifshitz wrote:
>> What I've been using for testing is this: [3]. It's a series of patches to whitebox_tempest_plugin, a Tempest plugin used by a bunch of us Nova Red Hatters to automate testing that's outside of Tempest's scope.
>
> And where is that pulling in your nova series of changes and posting test results (like a 3rd party CI) so anyone can see it? Or do you mean here are tests, but you need to provide your own environment if you want to verify the code prior to merging it.
Sorry, wasn't clear. It's the latter. The test code exists, and has run against my devstack environment with my patches checked out, but there's no CI or public posting of test results. Getting CI coverage for these NUMA things (like the old Intel one) is a whole other topic.
On the CI front, I resolved the nested virt issue on the server I bought to set up a personal CI for NUMA testing. That set me back a few weeks in setting up that CI, but I hope to run Artom's whitebox tests, among others, in it at some point. Vexxhost has also provided nested virt on the gate VMs. I'm going to see if we can actually create a non-voting job using the ubuntu-bionic-vexxhost nodeset. If OVH or one of the other providers of CI resources re-enable nested virt, then we can maybe make that job voting and not need third-party CI anymore.
> Can we really not even have functional tests with the fake libvirt driver and fake numa resources to ensure the flow doesn't blow up?
That's something I have to look into. We have live migration functional tests, and we have NUMA functional tests, but I'm not sure how we can combine the two.
Just as an additional proof point, I am planning to do a bunch of migration and live migration testing in the next 2-4 weeks.
My current backlog, in no particular order, is:
- SR-IOV migration
- NUMA migration
- vTPM migration
- cross-cell migration
- cross-neutron-backend migration (ovs<->linuxbridge)
- cross-firewall migration (iptables<->conntrack) (previously tested and worked at the end of Queens)
Narrowing in on the NUMA migration, the current set of test cases I plan to manually verify is as follows:
Note: assume all flavors will have 256MB of RAM and 4 cores unless otherwise stated.
Basic tests:
- pinned guests (hw:cpu_policy=dedicated)
- pinned-isolated guests (hw:cpu_policy=dedicated hw:cpu_thread_policy=isolate)
- pinned-prefer guests (hw:cpu_policy=dedicated hw:cpu_thread_policy=prefer)
- unpinned-single-numa guest (hw:numa_nodes=1)
- unpinned-dual-numa guest (hw:numa_nodes=2)
- unpinned-dual-numa-unbalanced guest (hw:numa_nodes=2 hw:numa_cpus.0=1 hw:numa_cpus.1=1-3 hw:numa_mem.0=64 hw:numa_mem.1=192)
- unpinned-hugepage-implicit-numa guest (hw:mem_page_size=large)
- unpinned-hugepage-multi-numa guest (hw:mem_page_size=large hw:numa_nodes=2)
- pinned-hugepage-multi-numa guest (hw:mem_page_size=large hw:numa_nodes=2 hw:cpu_policy=dedicated)
- realtime guest (hw:cpu_policy=dedicated hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1)
- emulator-thread-isolated guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=isolate)
A sketch of how one of these flavors could be created follows this list.
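For illustration, one of the flavors above could be created along these lines (the flavor name and disk size are made up; the extra specs are the ones from the list):

    # hypothetical flavor for the basic "pinned guests" case: 256MB RAM, 4 vCPUs
    openstack flavor create pinned.small --ram 256 --vcpus 4 --disk 1
    openstack flavor set pinned.small --property hw:cpu_policy=dedicated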
Advanced tests (require extra nova.conf changes, sketched after this list):
- emulator-thread-shared guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=share); note: cpu_shared_set configured
- unpinned-single-numa-heterogeneous-host guest (hw:numa_nodes=1); note: vcpu_pin_set adjusted so that host 1 only has CPUs on NUMA node 1 and host 2 only has CPUs on NUMA node 2
- super-optimised guest (hw:numa_nodes=2 hw:numa_cpus.0=1 hw:numa_cpus.1=1-3 hw:numa_mem.0=64 hw:numa_mem.1=192 hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=isolate)
- super-optimised guest 2 (hw:numa_nodes=2 hw:numa_cpus.0=1 hw:numa_cpus.1=1-3 hw:numa_mem.0=64 hw:numa_mem.1=192 hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=share)
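The nova.conf changes those advanced cases need could look roughly like this on the relevant compute nodes (a sketch only: crudini is just one convenient way to edit the file, the CPU ranges are made up, and the service name assumes a devstack deployment):

    # shared CPU set for the emulator-thread-shared case ([compute] section)
    crudini --set /etc/nova/nova.conf compute cpu_shared_set 0-1
    # heterogeneous-host case: restrict guest CPUs per host, e.g. on host 1 use only NUMA node 1's CPUs
    crudini --set /etc/nova/nova.conf DEFAULT vcpu_pin_set 2-19
    # restart nova-compute so the new values are picked up
    sudo systemctl restart devstack@n-cpu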
For each of these tests I'll provide a test-command file with the commands I used to run the test, and a results file with a summary at the top plus the XMLs before and after the migration, showing that initially the resources would conflict on migration and then the updated XMLs after the migration. I will also provide the local.conf for the devstack deployment and some details about the environment like distro/qemu/libvirt versions.
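To give a sense of what one of those test-command files might contain for the pinned case, the workflow is roughly the following (server, image, and host names are placeholders, and the libvirt domain name would need to be looked up on each host):

    # boot a pinned guest, record its pinning, live migrate it, record it again
    openstack server create --flavor pinned.small --image cirros-0.4.0-x86_64-disk --network private vm-pinned
    virsh dumpxml instance-00000001 | grep -E 'vcpupin|emulatorpin' > pinned-before.txt   # on the source host
    openstack server migrate --live host2 vm-pinned
    virsh dumpxml instance-00000001 | grep -E 'vcpupin|emulatorpin' > pinned-after.txt    # on the destination host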
Eventually I hope all those test cases can be added to the whitebox plugin and verified in a CI. We could also try and validate them in functional tests.
I have attached the XML for the pinned guest as an example of what to expect, but I will be compiling this slowly as I go and will zip everything up in an email to the list. This will take some time to complete, and honestly I had planned to do most of this testing after feature freeze when we can focus more on testing.
Regards, Sean