<div dir="ltr"><div>Hello, some updates about this issue.</div><div>I read that someone has hit the same issue, as reported here:</div><div><br></div><div><a href="https://bugs.launchpad.net/neutron/+bug/1866139">https://bugs.launchpad.net/neutron/+bug/1866139</a></div><div><br></div><div>If you read the discussion, someone says that the GARP must be sent by qemu during live migration.</div><div>If this is true, it means that on rocky/stein qemu/libvirt are buggy.</div><div>So I tried stein and rocky with the same libvirt/qemu package versions I installed on queens (I updated the compute and controller nodes on queens to obtain the same libvirt/qemu versions deployed on rocky and stein).</div><div><br></div><div>On queens, live migration on a provider network continues to work fine.</div><div>On rocky and stein it does not, so I think the issue is related to the openstack components.<br></div><div><br></div><div>Best Regards<br></div><div>Ignazio Cassano<br></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 27 Apr 2020 at 19:50, Sean Mooney <<a href="mailto:smooney@redhat.com">smooney@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 2020-04-27 at 18:19 +0200, Ignazio Cassano wrote:<br>
> Hello, I have this problem with rocky or newer with the iptables_hybrid<br>
> firewall.<br>
> So, can I solve it by using post copy live migration?<br>
so this behavior has always been how nova worked, but in rocky the<br>
<a href="https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/neutron-new-port-binding-api.html" rel="noreferrer" target="_blank">https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/neutron-new-port-binding-api.html</a><br>
spec introduced the ability to shorten the outage by pre-binding the port and activating it when<br>
the vm is resumed on the destination host, before we get to post live migration.<br>
<br>
this reduces the outage time, although it can't be fully eliminated as some level of packet loss is<br>
always expected when you live migrate.<br>
<br>
so yes, enabling post copy live migration should help, but be aware that if a network partition happens<br>
during a post copy live migration the vm will crash and need to be restarted.<br>
it is generally safe to use and will improve the migration performance, but unlike pre copy migration, if<br>
the guest resumes on the dest and a memory page has not been copied yet then it must wait for it to be copied<br>
and retrieve it from the source host. if the connection to the source host is interrupted then the vm can't<br>
do that, the migration will fail and the instance will crash. if you are using pre copy migration<br>
and there is a network partition during the migration, the migration will fail but the instance will continue<br>
to run on the source host.<br>
<br>
so while i would still recommend using it, it is just good to be aware of that behavior change.<br>
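for reference, post copy is turned on via a nova option on the compute nodes. a minimal sketch of the nova.conf fragment (assuming a rocky-or-later nova; double check the option name against your deployed release):<br>

```ini
[libvirt]
# let qemu/libvirt switch to post-copy instead of aborting (or pausing the vm)
# when the pre-copy memory transfer does not converge
live_migration_permit_post_copy = true
```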
<br>
> Thanks<br>
> Ignazio<br>
> <br>
> On Mon, 27 Apr 2020, 17:57 Sean Mooney <<a href="mailto:smooney@redhat.com" target="_blank">smooney@redhat.com</a>> wrote:<br>
> <br>
> > On Mon, 2020-04-27 at 17:06 +0200, Ignazio Cassano wrote:<br>
> > > Hello, I have a problem on stein neutron. When a vm migrates from one node<br>
> > > to another I cannot ping it for several minutes. If in the vm I put a<br>
> > > script that pings the gateway continuously, the live migration works fine<br>
> > > and I can ping it. Why does this happen? I read something about gratuitous arp.<br>
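the workaround script mentioned above can be as simple as the sketch below; it assumes iproute2 is available in the guest image and parses the default gateway out of <code>ip route</code>:<br>

```shell
# minimal keepalive sketch (run inside the vm): ping the default gateway once a
# second so the switches keep learning the vm's mac across the migration.
# assumes the guest has iproute2; gw_from_routes parses "ip route" output.
gw_from_routes() { awk '/^default/ {print $3; exit}'; }

keepalive() {
    gw=$(ip route | gw_from_routes)
    while sleep 1; do ping -c 1 -W 1 "$gw" >/dev/null 2>&1; done
}
```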
> > <br>
> > qemu does not use gratuitous arp but instead uses an older protocol called<br>
> > RARP to do mac address learning.<br>
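for illustration, the self-announce frame qemu emits after a migration is an RARP packet roughly like the following; this is a sketch of the frame layout, not qemu's actual code:<br>

```python
import struct

def build_rarp_announce(vm_mac: bytes) -> bytes:
    """Sketch of the RARP self-announce frame qemu sends after live migration."""
    assert len(vm_mac) == 6
    # ethernet header: broadcast dst, vm mac as src, ethertype 0x8035 (RARP)
    eth = b"\xff" * 6 + vm_mac + struct.pack("!H", 0x8035)
    # rarp payload: htype=1 (ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4,
    # op=3 (reverse request); sender and target hw addrs are both the vm's mac,
    # protocol addrs are zero -- the point is mac learning, not ip resolution
    rarp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 3)
    rarp += vm_mac + b"\x00" * 4 + vm_mac + b"\x00" * 4
    return eth + rarp

frame = build_rarp_announce(b"\x52\x54\x00\x12\x34\x56")
```

the switches along the path learn the vm's new location from the broadcast source mac, which is why losing these frames (e.g. in the vif plugging race below) leaves the vm unreachable until the old forwarding entries age out.<br>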
> > <br>
> > what release of openstack are you using? and are you using the iptables<br>
> > firewall or the openvswitch firewall?<br>
> > <br>
> > if you are using openvswitch there is nothing we can do until we<br>
> > finally delegate vif plugging to os-vif.<br>
> > currently libvirt handles interface plugging for kernel ovs when using the<br>
> > openvswitch firewall driver.<br>
> > <a href="https://review.opendev.org/#/c/602432/" rel="noreferrer" target="_blank">https://review.opendev.org/#/c/602432/</a> would address that, but it and the<br>
> > neutron patch<br>
> > <a href="https://review.opendev.org/#/c/640258" rel="noreferrer" target="_blank">https://review.opendev.org/#/c/640258</a> are rather outdated. while libvirt is<br>
> > plugging the vif there will always be<br>
> > a race condition where the RARP packets sent by qemu (and with them the mac<br>
> > learning) will be lost.<br>
> > <br>
> > if you are using the iptables firewall and you have openstack rocky or<br>
> > later, then enabling post copy live migration<br>
> > should reduce the downtime. in this configuration we do not have the race<br>
> > between neutron and libvirt, so the rarp<br>
> > packets should not be lost.<br>
> > <br>
> > <br>
> > > Please, can you help me?<br>
> > > Any workaround, please?<br>
> > > <br>
> > > Best Regards<br>
> > > Ignazio<br>
> > <br>
> > <br>
<br>
</blockquote></div>