On Wed, Feb 27, 2019 at 8:25 PM Artom Lifshitz <alifshit@redhat.com> wrote:
Hey all,
There won't be much new here for those who've reviewed the patches [1] already, but I wanted to address the testing situation.
Until recently, the last patch was WIP because I had functional tests but no unit tests. With the original implementation, the claims part of the new code could be tested in functional tests even without NUMA anywhere. With the new and improved implementation proposed by Dan Smith [2], this is no longer the case: any test more involved than unit testing will need "real" NUMA instances on "real" NUMA hosts to trigger the new code. Because of that, I've dropped functional testing altogether, added unit tests, and taken the WIP tag off.
Replying to myself here to address the functional tests situation. I've explored this a bit, and while it's probably doable (it's code, everything is doable), I'm wondering whether functional tests would be worth it. The problem arises from artificially forcing an overlap of the pin mappings.

In my integration tests, using CPU pinning as an example, I set vcpu_pin_set to 0,1 on both compute hosts and boot two instances (making sure they're on different hosts by using the DifferentHostFilter and the appropriate scheduler hint); then I change vcpu_pin_set to 0-3 on host A, live migrate the instance from host B onto host A, and assert that the two instances don't end up with overlapping pins.

Applying the same strategy to functional tests isn't straightforward because the CONF object is very, very global, and we can't have different config values for different services in the same test. One basic functional test we could have is just asserting that the live migration is refused if both hosts are "full" with instances - something that currently works just fine, except for resulting in overlapping pin mappings.

For more advanced testing, I'm proposing that we shelve functional tests for now and push on setting up some sort of CI job using OpenLab hardware. I've already opened a request [1]. If this doesn't pan out, we can revisit what it would take to have functional tests. Thoughts?

[1] https://github.com/theopenlab/openlab/issues/200
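The final assertion in that flow, that the two instances' pins remain disjoint after the migration, can be sketched roughly like this (the helper names are hypothetical, not the actual plugin code; the real whitebox tests read the pins out of each instance's libvirt XML):

```python
# Hypothetical sketch: given each instance's vCPU -> host pCPU pin
# mapping (as read from its libvirt XML), assert the two mappings
# don't claim any of the same host CPUs.

def pinned_pcpus(pin_mapping):
    """Return the set of host pCPUs an instance's vCPUs are pinned to."""
    return set(pin_mapping.values())

def assert_no_overlapping_pins(pins_a, pins_b):
    overlap = pinned_pcpus(pins_a) & pinned_pcpus(pins_b)
    assert not overlap, 'instances share host CPUs: %s' % sorted(overlap)

# Example: with vcpu_pin_set=0-3 on host A after the migration, a
# correct outcome is e.g. instance 1 on pCPUs {0, 1} and instance 2
# on pCPUs {2, 3}:
assert_no_overlapping_pins({0: 0, 1: 1}, {0: 2, 1: 3})
```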
What I've been using for testing is this: [3]. It's a series of patches to whitebox_tempest_plugin, a Tempest plugin used by a bunch of us Nova Red Hatters to automate testing that's outside of Tempest's scope. Same idea as the intel-nfv-ci plugin [4]. The tests I currently have check that:
* CPU pin mapping is updated if the destination has an instance pinned to the same CPUs as the incoming instance
* emulator thread pins are updated if the destination has a different cpu_shared_set value and the instance has hw:emulator_threads_policy set to `share`
* NUMA node pins are updated for a hugepages instance if the destination has a hugepages instance consuming the same NUMA node as the incoming instance
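As a rough illustration of the second check (names hypothetical, not the plugin's actual code): with hw:emulator_threads_policy=share, the instance's emulator threads are pinned to the host's cpu_shared_set, so after the migration they should fall within the *destination's* cpu_shared_set:

```python
# Hypothetical sketch of the emulator-thread check. Nova expresses
# cpu_shared_set as a range string like "4-5,12"; after a live
# migration, the instance's emulator thread pins should be a subset
# of the destination host's cpu_shared_set.

def parse_cpu_set(spec):
    """Parse a nova-style CPU set string like '4-7,12' into a set of ints."""
    cpus = set()
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def assert_emulator_pins_updated(emulator_pins, dest_cpu_shared_set):
    dest = parse_cpu_set(dest_cpu_shared_set)
    assert emulator_pins <= dest, (
        'emulator threads pinned outside destination cpu_shared_set: %s'
        % sorted(emulator_pins - dest))

# After migrating to a destination with cpu_shared_set = "4-5",
# emulator pins of {4, 5} pass; stale source-host pins like {0, 1}
# would fail the assertion.
assert_emulator_pins_updated({4, 5}, '4-5')
```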
It's not exhaustive by any means, but I've made sure that all iterations pass those 3 tests. It should be fairly easy to add new tests, as most of the necessary scaffolding is already in place.
[1] https://review.openstack.org/#/c/634606/
[2] https://review.openstack.org/#/c/634828/28/nova/virt/driver.py@1147
[3] https://review.rdoproject.org/r/#/c/18832/
[4] https://github.com/openstack/intel-nfv-ci-tests/
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG