Hi all, There was something wrong with the live migration when using 'dedicated' cpu_policy in my test. Attached file contains the details. On 2019/2/28 下午9:28, Sean Mooney wrote:
On Wed, 2019-02-27 at 21:33 -0500, Artom Lifshitz wrote:
On Wed, Feb 27, 2019, 21:27 Matt Riedemann, <mriedemos@gmail.com> wrote:
On 2/27/2019 7:25 PM, Artom Lifshitz wrote:
What I've been using for testing is this: [3]. It's a series of patches to whitebox_tempest_plugin, a Tempest plugin used by a bunch of us Nova Red Hatters to automate testing that's outside of Tempest's scope.
And where is that pulling in your nova series of changes and posting test results (like a 3rd party CI) so anyone can see it? Or do you mean here are tests, but you need to provide your own environment if you want to verify the code prior to merging it.
Sorry, wasn't clear. It's the latter. The test code exists, and has run against my devstack environment with my patches checked out, but there's no CI or public posting of test results. Getting CI coverage for these NUMA things (like the old Intel one) is a whole other topic.
on the ci front i resolved the nested vert on the server i bought to set up a personal ci for numa testing. that set me back a few weeks in setting up that ci but i hope to run artom whitebox test amoung other in that at some point. vexhost also provided nested virt to the gate vms. im going to see if we can actully create a non voting job using the ubuntu-bionic-vexxhost nodeset. if ovh or one of the other providers of ci resource renable nested virt then we can maybe make that job voting and not need thridparty ci anymor.
Can we really not even have functional tests with the fake libvirt driver and fake numa resources to ensure the flow doesn't blow up?
That's something I have to look into. We have live migration functional tests, and we have NUMA functional tests, but I'm not sure how we can combine the two.
jus as an addtional proof point im am planning to do a bunch of migration and live migration testing in the next 2-4 weeks.
my current backlog on no particalar order is sriov migration numa migration vtpm migration cross-cell migration cross-neutron backend migration (ovs<->linuxbridge) cross-firwall migraton (iptables<->contrack) (previously tested and worked at end of queens)
narrowong in on the numa migration the current set of testcases i plan to manually verify are as follows:
note assume all flavor will have 256mb of ram and 4 cores unless otherwise stated
basic tests pinned guests (hw:cpu_policy=dedicated) pinned-isolated guests (hw:cpu_policy=dedicated hw:thread_policy=isolate) pinned-prefer guests (hw:cpu_policy=dedicated hw:thread_policy=prefer) unpinned-singel-numa guest (hw:numa_nodes=1) unpinned-dual-numa guest (hw:numa_nodes=2) unpinned-dual-numa-unblanced guest (hw:numa_nodes=2 hw:numa_cpu.0=1 hw:numa_cpu.1=1-3 hw:numa_mem.0=64 hw:numa_mem.0=192) unpinned-hugepage-implcit numa guest (hw:mem_page_size=large) unpinned-hugepage-multi numa guest (hw:mem_page_size=large hw:numa_nodes=2) pinned-hugepage-multi numa guest (hw:mem_page_size=large hw:numa_nodes=2 hw:cpu_policy=dedicated) realtime guest (hw:cpu_policy=dedicated hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1) emulator-thread-iosolated guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=isolate)
advanced tests (require extra nova.conf changes) emulator-thread-shared guest (hw:cpu_policy=dedicated hw:emulator_threads_policy=shared) note cpu_share_set configrued unpinned-singel-numa-hetorgious-host guest (hw:numa_nodes=1) note vcpu_pin_set adjusted so that host 1 only has cpus on numa 1 and host 2 only has cpus on numa node 2. supper-optimiesd-guest (hw:numa_nodes=2 hw:numa_cpu.0=1 hw:numa_cpu.1=1-3 hw:numa_mem.0=64 hw:numa_mem.0=192 hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=isolate) supper-optimiesd-guest-2 (hw:numa_nodes=2 hw:numa_cpu.0=1 hw:numa_cpu.1=1-3 hw:numa_mem.0=64 hw:numa_mem.0=192 hw:cpu_realtime=yes hw:cpu_realtime_mask=^0-1 hw:emulator_threads_policy=share)
for each of these test ill provide a test-command file with the command i used to run the tests and reustlts file with a summary at the top plus the xmls before and after the migration showing that intially the resouces would conflict on migration and then the updated xmls after the migration. i will also provide the local.conf for the devstack deployment and some details about the env like distor/qemu/libvirt versions.
eventurally i hope all those test cases can be added to the whitebox plugin and verifed in a ci. we could also try and valideate them in functional tests.
i have attached the xml for the pinned guest as an example of what to expect but i will be compileing this slowly as i go and zip everying up in an email to the list. this will take some time to complete and hosestly i had planned to do most of this testing after feature freeze when we can focus on testing more.
regards sean