[stein][neutron] gratuitous arp
Ignazio Cassano
ignaziocassano at gmail.com
Wed Apr 29 14:37:55 UTC 2020
PS
I have a testing environment on queens, rocky and stein and I can run any
test you need.
Ignazio
On Wed, Apr 29, 2020 at 4:19 PM Ignazio Cassano <ignaziocassano at gmail.com> wrote:
> Hello Sean,
> the following is the configuration on my compute nodes:
> [root at podiscsivc-kvm01 network-scripts]# rpm -qa|grep libvirt
> libvirt-daemon-driver-storage-iscsi-4.5.0-33.el7.x86_64
> libvirt-daemon-kvm-4.5.0-33.el7.x86_64
> libvirt-libs-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-network-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-nodedev-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-storage-gluster-4.5.0-33.el7.x86_64
> libvirt-client-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-storage-core-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-storage-logical-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-secret-4.5.0-33.el7.x86_64
> libvirt-daemon-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-nwfilter-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-storage-scsi-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-storage-rbd-4.5.0-33.el7.x86_64
> libvirt-daemon-config-nwfilter-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-storage-disk-4.5.0-33.el7.x86_64
> libvirt-bash-completion-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-qemu-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-storage-4.5.0-33.el7.x86_64
> libvirt-python-4.5.0-1.el7.x86_64
> libvirt-daemon-driver-interface-4.5.0-33.el7.x86_64
> libvirt-daemon-driver-storage-mpath-4.5.0-33.el7.x86_64
> [root at podiscsivc-kvm01 network-scripts]# rpm -qa|grep qemu
> qemu-kvm-common-ev-2.12.0-44.1.el7_8.1.x86_64
> qemu-kvm-ev-2.12.0-44.1.el7_8.1.x86_64
> libvirt-daemon-driver-qemu-4.5.0-33.el7.x86_64
> centos-release-qemu-ev-1.0-4.el7.centos.noarch
> ipxe-roms-qemu-20180825-2.git133f4c.el7.noarch
> qemu-img-ev-2.12.0-44.1.el7_8.1.x86_64
>
>
> As for the firewall driver, in /etc/neutron/plugins/ml2/openvswitch_agent.ini:
>
> firewall_driver = iptables_hybrid
>
> I have the same libvirt/qemu versions on the queens, rocky and stein testing
> environments, and the same firewall driver.
> Live migration on provider networks works fine on queens.
> It does not work on rocky and stein: the vm loses connectivity after it is
> migrated and starts to respond again only when the vm itself sends a network
> packet, for example when chrony polls the time server.
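>
> (As a stopgap only, not a fix: since connectivity comes back as soon as the
> vm sends traffic, forcing the guest to announce itself right after migration,
> e.g. something like "arping -U -I eth0 <vm-ip>" from inside the vm (iputils
> arping; eth0 and the address are placeholders), only masks the lost
> announcement.)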
>
> Ignazio
>
>
>
> On Wed, Apr 29, 2020 at 2:36 PM Sean Mooney <smooney at redhat.com> wrote:
>
>> On Wed, 2020-04-29 at 10:39 +0200, Ignazio Cassano wrote:
>> > Hello, some updates about this issue.
>> > I read that someone has hit the same issue, as reported here:
>> >
>> > https://bugs.launchpad.net/neutron/+bug/1866139
>> >
>> > If you read the discussion, someone says that the garp must be sent by
>> > qemu during live migration.
>> > If this is true, it means qemu/libvirt on rocky/stein are buggy.
>> that is not correct.
>> qemu/libvirt has always used RARP, which predates GARP, as its mac
>> learning frames instead:
>> https://en.wikipedia.org/wiki/Reverse_Address_Resolution_Protocol
>> https://lists.gnu.org/archive/html/qemu-devel/2009-10/msg01457.html
>> however it looks like this was broken in 2016 in qemu 2.6.0
>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04645.html
>> but was fixed by
>> https://github.com/qemu/qemu/commit/ca1ee3d6b546e841a1b9db413eb8fa09f13a061b
>> can you confirm you are not using the broken 2.6.0 release and are using
>> 2.7 or newer, or 2.4 or older?
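>>
>> for example (a quick check; this assumes the CentOS qemu-kvm-ev packaging,
>> the binary path may differ in your deployment):
>>
>>   rpm -q qemu-kvm-ev
>>   /usr/libexec/qemu-kvm --version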
>>
>>
>> > So I tried to use stein and rocky with the same version of libvirt/qemu
>> > packages I installed on queens (I updated the compute and controller nodes
>> > on queens to obtain the same libvirt/qemu version deployed on rocky and
>> > stein).
>> >
>> > On queens, live migration on provider networks continues to work fine.
>> > On rocky and stein it does not, so I think the issue is related to
>> > openstack components.
>> on queens we have only a single port binding and nova blindly assumes
>> that the port binding details won't change when it does a live migration,
>> so it does not update the xml for the network interfaces.
>>
>> the port binding is updated after the migration is complete in
>> post_livemigration.
>> in rocky+ nova can optionally use neutron's multiple port bindings flow to
>> prebind the port to the destination so it can update the xml if needed,
>> and if post copy live migration is enabled it will asynchronously activate
>> the dest port binding before post_livemigration, shortening the downtime.
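>>
>> (if you want to try post copy, a minimal sketch of the nova side, assuming a
>> libvirt/kvm deployment whose kernel and qemu support post copy; check the
>> docs for your exact release:
>>
>>   # /etc/nova/nova.conf on the compute nodes
>>   [libvirt]
>>   live_migration_permit_post_copy = True
>>
>> then restart nova-compute.)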
>>
>> if you are using the iptables firewall, os-vif will have precreated the
>> ovs port and intermediate linux bridge before the migration started, which
>> allows neutron to wire it up (put it on the correct vlan and install
>> security groups) before the vm completes the migration.
>>
>> if you are using the ovs firewall, os-vif still precreates the ovs port
>> but libvirt deletes it and recreates it.
>> as a result there is a race when using the openvswitch firewall that can
>> result in the RARP packets being lost.
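>>
>> for reference, the firewall driver is selected in the [securitygroup]
>> section of the ovs agent config (typically
>> /etc/neutron/plugins/ml2/openvswitch_agent.ini) on each compute node:
>>
>>   [securitygroup]
>>   # hybrid iptables driver: ovs port plus an intermediate linux bridge
>>   firewall_driver = iptables_hybrid
>>   # native ovs firewall; use this instead to switch drivers
>>   # firewall_driver = openvswitch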
>>
>> >
>> > Best Regards
>> > Ignazio Cassano
>> >
>> >
>> >
>> >
>> > On Mon, Apr 27, 2020 at 7:50 PM Sean Mooney <smooney at redhat.com> wrote:
>> >
>> > > On Mon, 2020-04-27 at 18:19 +0200, Ignazio Cassano wrote:
>> > > > Hello, I have this problem on rocky or newer with the iptables_hybrid
>> > > > firewall.
>> > > > So, can I solve it by using post copy live migration?
>> > >
>> > > so this behavior has always been how nova worked, but in rocky the
>> > > https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/neutron-new-port-binding-api.html
>> > > spec introduced the ability to shorten the outage by pre-binding the
>> > > port and activating it when the vm is resumed on the destination host,
>> > > before we get to post live migration.
>> > >
>> > > this reduces the outage time, although it can't be fully eliminated as
>> > > some level of packet loss is always expected when you live migrate.
>> > >
>> > > so yes, enabling post copy live migration should help, but be aware
>> > > that if a network partition happens during a post copy live migration
>> > > the vm will crash and need to be restarted.
>> > > it is generally safe to use and will improve the migration performance,
>> > > but unlike pre copy migration, if the guest resumes on the dest and a
>> > > memory page has not been copied yet, it must wait for that page to be
>> > > retrieved from the source host. if the connection to the source host is
>> > > interrupted the vm can't do that, so the migration will fail and the
>> > > instance will crash. with pre copy migration, if there is a network
>> > > partition during the migration, the migration will fail but the instance
>> > > will continue to run on the source host.
>> > >
>> > > so while i would still recommend using it, it is just good to be aware
>> > > of that behavior change.
>> > >
>> > > > Thanks
>> > > > Ignazio
>> > > >
>> > > > On Mon, Apr 27, 2020 at 5:57 PM Sean Mooney <smooney at redhat.com> wrote:
>> > > >
>> > > > > On Mon, 2020-04-27 at 17:06 +0200, Ignazio Cassano wrote:
>> > > > > > Hello, I have a problem on stein neutron. When a vm migrates from
>> > > > > > one node to another I cannot ping it for several minutes. If in the
>> > > > > > vm I put a script that pings the gateway continuously, the live
>> > > > > > migration works fine and I can ping it. Why does this happen? I read
>> > > > > > something about gratuitous arp.
>> > > > >
>> > > > > qemu does not use gratuitous arp but instead uses an older protocol
>> > > > > called RARP to do mac address learning.
>> > > > >
>> > > > > what release of openstack are you using, and are you using the
>> > > > > iptables firewall or the openvswitch firewall?
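>> > > > >
>> > > > > (on an rpm based deployment you can usually answer both with
>> > > > > something like:
>> > > > >
>> > > > >   rpm -q openstack-nova-compute
>> > > > >   grep firewall_driver /etc/neutron/plugins/ml2/openvswitch_agent.ini
>> > > > >
>> > > > > on a compute node; package names and paths may differ.)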
>> > > > >
>> > > > > if you are using openvswitch there is nothing we can do until we
>> > > > > finally delegate vif plugging to os-vif.
>> > > > > currently libvirt handles interface plugging for kernel ovs when
>> > > > > using the openvswitch firewall driver.
>> > > > > https://review.opendev.org/#/c/602432/ would address that, but it and
>> > > > > the neutron patch https://review.opendev.org/#/c/640258 are rather
>> > > > > outdated. while libvirt is plugging the vif there will always be a
>> > > > > race condition where the RARP packets sent by qemu for mac learning
>> > > > > will be lost.
>> > > > >
>> > > > > if you are using the iptables firewall and you have openstack rocky
>> > > > > or later, then enabling post copy live migration should reduce the
>> > > > > downtime. in this configuration we do not have the race between
>> > > > > neutron and libvirt, so the rarp packets should not be lost.
>> > > > >
>> > > > >
>> > > > > > Please, can you help me?
>> > > > > > Any workaround, please?
>> > > > > >
>> > > > > > Best Regards
>> > > > > > Ignazio
>> > > > >
>> > > > >
>> > >
>> > >
>>
>>