[openstack-dev] [all] [nova] [neutron] live migration with multiple port bindings.

Sean Mooney work at seanmooney.info
Mon Aug 20 20:18:59 UTC 2018


Hi everyone,

Last week I spent some time testing the live migration capabilities
now that nova has started to use neutron multiple port bindings.
All testing, unless otherwise specified, was done with Rocky RC1 or
later commits on CentOS 7.5 using devstack.


test summary
~~~~~~~~~~

I have tested the following scenarios with different levels of success:

linux bridge to linux bridge: Worked
ovs iptables to ovs iptables: Worked
ovs conntrack to ovs conntrack: Worked
ovs iptables to ovs conntrack: Worked
ovs conntrack to ovs iptables: Worked
linux bridge to ovs: migration succeeded but network connectivity was
broken, see bug 1788009
ovs to linux bridge: failed, libvirt error due to a missing destination
bridge name, see bug 1788012
ovs to ovs-dpdk: failed, qemu bug encountered on migrate; nova xml
generation appears correct.
ovs-dpdk to ovs: failed, another qemu bug encountered on migrate; nova
xml generation appears correct.
centos -> ubuntu: failed, emulator not found, see bug 1788028

Note that since iptables to conntrack migration now works, operators
will be able to change the firewall driver once they have upgraded to
Rocky, via a rolling update using live migration, as sketched below.
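
The rolling change for a single compute node could look roughly like
this. The node name, config path and service name are assumptions on
my part, so adjust them for your deployment.

# 1) drain the node: live-migrate every instance off compute-1
nova host-evacuate-live compute-1

# 2) on compute-1, switch the ovs agent to the native conntrack
#    firewall driver, i.e. in the agent config (typically
#    openvswitch_agent.ini) set:
#
#      [securitygroup]
#      firewall_driver = openvswitch   # was: iptables_hybrid
#
#    then restart the agent (service name varies by distro/devstack)
sudo systemctl restart neutron-openvswitch-agent

# 3) live-migrate instances back onto the converted node
nova live-migration <server-uuid> compute-1

Draining the node first avoids having to convert running
hybrid-plugged instances in place.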

host config
~~~~~~~~


Note that not all nodes were running the exact same commits, as I
added additional nodes later in my testing.
All nodes were at least at this level:
nova sha: afe4512bf66c89a061b1a7ccd3e7ac8e3b1b284d
neutron sha: 1dda2bca862b1268c0f5ae39b7508f1b1cab6f15

Nova was configured with:
[compute]
live_migration_wait_for_vif_plug = True

and the nova commit above contains the revert of the slow migration change.


test details
~~~~~~~~

In both of the ovs-dpdk tests the migration failed and the VM
continued to run on the source node, however it had no network
connectivity. On hard reboot of the VM it went to error state because
the vif binding was broken: vif:binding-details:host_id was set to
none, so the vif_type was also set to none.
I have opened a nova bug to track the fact that the VM is left in an
invalid state even though its status is active.
See bug 1788014
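
For anyone who hits this, the broken binding is easy to see from the
neutron API; a quick check (the port UUID is a placeholder):

# on the affected ports binding_host_id and binding_vif_type both
# come back unset after the failed migration
openstack port show <port-uuid> \
    -c binding_host_id -c binding_vif_type -c binding_vif_details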

When I was testing live migration between ovs with the iptables and
the conntrack firewall drivers I also did some minimal testing to
ensure the firewall worked. I did this by booting three VMs:

two VMs, A and B, in the same security group, and one VM, C, in a
separate security group.

VM A and B were initially on different ovs compute nodes, with VM A
using the iptables and VM B using the conntrack security group driver.

VM C was on the conntrack node.

Before the migration, VM C was set up to ping VM B, which is blocked
by security groups, and VM A was also configured to ping VM B, which
is allowed by security groups.

VM B was then live migrated from the conntrack node to the iptables
node and back while observing the ping output of VM A and C.

During this process it was observed that VM A continued to ping VM B
successfully and at no point was VM C able to ping VM B.

While this is by no means a complete test, it indicates that security
groups appear to be configured before network connectivity is restored
on live migration, as expected. A rough sketch of the setup is below.
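
The sketch is only an approximation of the setup described above; the
group, server and host names are placeholders, as are the image,
flavor and network arguments.

# sg-ab allows icmp between its own members; sg-c has no icmp rule,
# so pings from VM C to VM B should be dropped
openstack security group create sg-ab
openstack security group rule create --protocol icmp --remote-group sg-ab sg-ab
openstack security group create sg-c

# boot the three test VMs (targeting a host via az:host needs admin)
openstack server create --image <image> --flavor <flavor> --network <net> \
    --security-group sg-ab --availability-zone nova:<iptables-node> vm-a
openstack server create --image <image> --flavor <flavor> --network <net> \
    --security-group sg-ab --availability-zone nova:<conntrack-node> vm-b
openstack server create --image <image> --flavor <flavor> --network <net> \
    --security-group sg-c --availability-zone nova:<conntrack-node> vm-c

# with continuous pings running from vm-a and vm-c to vm-b, bounce
# vm-b between the two nodes while watching the ping output
nova live-migration vm-b <iptables-node>
nova live-migration vm-b <conntrack-node>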

I also noticed that the interval where network connectivity was lost
during live migration was longer when going from the iptables node to
the conntrack node than in the reverse direction. I did not
investigate why, but I suspect this is related to some flow timeouts
in the conntrack module.
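
I did not measure the gap precisely; a timestamped ping would give
rough numbers if anyone wants to quantify it (the VM ip is a
placeholder):

# 5 packets per second with a unix timestamp on each reply; the jump
# in the timestamps across the migration is the connectivity gap
ping -D -i 0.2 <vm-b-ip> | tee ping-vm-b.log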


other testing
~~~~~~~~~

About two weeks ago I also tested the numa aware vswitch spec.
During that testing I confirmed that new instances were numa affined
correctly. I also confirmed that while live migration succeeded the
numa pinning was not updated.
As this was expected I have not opened a bug for it, since it will be
addressed in Stein by the numa aware live migration spec.
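
For reference, the pinning can be checked on the compute node directly
with libvirt; the libvirt domain name (e.g. instance-0000xxxx) is a
placeholder.

# vcpu -> host cpu pinning as seen by libvirt; after the live
# migration this still reflected the source node's topology
sudo virsh vcpupin <instance-name>

# numa memory binding from the guest xml
sudo virsh dumpxml <instance-name> | grep -A 3 "<numatune>"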


future testing
~~~~~~~~~

OVS-DPDK to OVS-DPDK
===================
If I have time I will try to test live migration between two ovs-dpdk
hosts. This has worked since before nova supported vhost-user. I have
not tested this case yet, but it's possible the qemu bug I hit in my
ovs to ovs-dpdk testing could also break ovs-dpdk to ovs-dpdk
migration.

ovs to ovn
========
If I have time I may also test ovs to ovn migration.
This should just work, but I suspect that the same bug I hit with
mixed ovs and linux bridge clouds may exist and the vxlan tunnel mesh
may not be created.


BUGS
_____

nova
~~~~

when live migration fails due to an internal error, rollback is not
handled correctly: https://bugs.launchpad.net/nova/+bug/1788014
libvirt: nova assumes the destination emulator path is the same as the
source and fails to migrate if this is not true:
https://bugs.launchpad.net/nova/+bug/1788028


neutron
~~~~~

neutron bridge name is not always set for ml2/ovs:
https://bugs.launchpad.net/neutron/+bug/1788009
bridge name not set in vif:binding-details by ml2/linux-bridge:
https://bugs.launchpad.net/neutron/+bug/1788012
neutron does not form a mesh tunnel overlay between different ml2
drivers: https://bugs.launchpad.net/neutron/+bug/1788023


regards
sean


