> Hi Nell & Sean,
>
> This is Zhan, and I'm taking a look at the issue with Chang. I'm providing an update with our findings here, but first:
> For live migration from Jammy to Noble: Sean has covered that this isn't officially supported at the libvirt level. I am a bit late to the party, so correct me if I'm wrong: I think Sean mentioned that only Jammy-to-Noble is supported, not the other way around, since otherwise upgrades would be impossible?
>> For live migration from {Jammy without patch} to {Jammy with patch}, does the added latency correlate reliably with the patch being present on the destination node?
>
> Let's take a step back and not worry about the patches. The additional latency always exists when the domain's XML is updated during migration from lacking `managed=no` to having `managed=no`. If we don't update the XML during migration (i.e., the XML both before and after migration lacks `managed=no`, or both have it), there is no additional latency.
>> the latency is a result of calico. calico does not support wiring up the destination for networking until after the port binding is activated, which always happens after the vm is running
>
> I want to clarify what we mean by the increase in latency: the normal latency is something like ~4s, and the increased latency is something like ~30s. We are fully aware that Calico needs improvement here, hence I submitted the spec for refining the live-migration network update process in 2026.2, and as Nell mentioned, they are working on improving this too. IMO we are approaching this problem from two angles :D. We do believe that the increase in latency (from ~4s to ~30s) is related to the `managed=no` patch. Please find our findings below.
>> but I don't think any of that relates to the Nova patch for libvirt.
>
> After our investigation, it is actually related. Please take a look at libvirt's code when `managed=yes` [0] (which is the default). Without the patch introduced in libvirt v9.5.0 [1], `virNetDevTapCreate` does not error out when the tap interface already exists and `managed=yes`, and the code after it still runs (i.e., setting the mac address for the interfaces on the hypervisor and VM sides, and bringing the device online). With `managed=no`, libvirt no longer does this.
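The behaviour difference described above can be sketched as a toy model. This is purely illustrative (the function and variable names are mine, not libvirt's); it only captures the overwrite-vs-leave-alone distinction from the thread.

```python
def plugged_tap_mac(existing_mac: str, libvirt_mac: str, managed: bool) -> str:
    """Toy model of libvirt plugging a pre-existing tap device.

    managed=yes: virNetDevTapCreate tolerates the existing tap, and the
    code after it still resets the MAC and brings the link up, so
    whatever MAC the tap already had is overwritten.
    managed=no: libvirt performs no host-side device setup at all.
    """
    if managed:
        return libvirt_mac   # libvirt overwrites the tap's MAC
    return existing_mac      # tap keeps whatever MAC it already had

# With managed=yes, a tap Nova pre-created with some other MAC ends up
# with libvirt's host-side MAC anyway:
assert plugged_tap_mac("00:61:fe:ed:ca:fe", "fe:16:3e:aa:bb:cc", True) == "fe:16:3e:aa:bb:cc"
# With managed=no, Nova's MAC is left in place:
assert plugged_tap_mac("00:61:fe:ed:ca:fe", "fe:16:3e:aa:bb:cc", False) == "00:61:fe:ed:ca:fe"
```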
On 05/02/2026 19:59, Zhan Zhang wrote:

> On the Nova side, it actually does what libvirt does when the tap interface is created [2] (i.e., setting the mac address and bringing the interface up). However, the mac address that the function gets comes from the vif's port binding's "mac_address" field [3], which is hard-coded when using networking-calico (the Neutron Calico driver), and this is wrong.

So normally the mac is taken directly from the neutron port, rather than the detail sub-field in the neutron port. So yes, this does look incorrect in that regard; however, this deviation was called out in the original commit https://opendev.org/openstack/nova/commit/e0bca279d53f866d17834cdee025cda819...

"""
VIF_TYPE_TAP supports a 'mac_address' key, in the VIF details dict, which allows Neutron to specify the MAC address for the host end of the TAP interface.
"""

So it looks like the calico driver was written to intentionally allow the host side of the tap device to have a different mac than the guest side. However, https://github.com/projectcalico/calico/blob/v3.31.3/networking-calico/netwo... does indeed show that the value was hard-coded on the networking-calico side, so this seems to have only been working because libvirt overrode the mac.
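To make the two MAC sources concrete, here is a hedged sketch of the lookup in question. The `'mac_address'` details key follows the commit quoted above; the dict shape mimics Nova's vif model, and the concrete MAC values are illustrative.

```python
# Key name taken from the commit message quoted above; values illustrative.
VIF_DETAILS_TAP_MAC_ADDRESS = "mac_address"

vif = {
    # guest-visible MAC, taken from the neutron port itself
    "address": "fa:16:3e:aa:bb:cc",
    # host-side tap MAC, hard-coded by networking-calico
    "details": {VIF_DETAILS_TAP_MAC_ADDRESS: "00:61:fe:ed:ca:fe"},
}

host_mac = vif["details"].get(VIF_DETAILS_TAP_MAC_ADDRESS)  # what the tap-plug path uses
guest_mac = vif["address"]                                  # what the guest sees

# The two are intentionally allowed to differ; calico hard-codes the former.
assert host_mac != guest_mac
```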
> So what happened before (with `managed=yes`) is that even though the mac address was wrongly set, libvirt would rewrite it later when Nova called the migration API, so we didn't hit this issue. What happens now, when we migrate the VM with the updated `managed=no`, is that:
> 1. When the VM is first created without `managed=no`, libvirt sets the mac address of the tap interface to `fe:xx:xx:xx:xx:xx` on the hypervisor side and `fa:xx:xx:xx:xx:xx` on the VM side. The VM learns this, and it's pingable.
> 2. When the VM is migrated with the updated XML (including `managed=no`), libvirt will NOT overwrite the mac address of the tap interface. Thus, when Nova creates the tap interface and sets its hypervisor-side mac address to the hard-coded value (i.e., `00:61:fe:ed:ca:fe`), this change persists after the live migration.
> 3. When the VM is resumed, it doesn't know that the hypervisor-side interface mac address has changed, and keeps sending packets to the old mac address (i.e., `fe:xx:xx:xx:xx:xx`). The hypervisor sees no matching mac address and drops the packets. Running `tcpdump`, we could see that the VM was answering ping packets, but the replies never left the hypervisor.
> 4. At some point later, the VM sends an ARP request for the new mac address and becomes pingable again once it gets the answer.
> So the key point is to make sure that the mac addresses on both the VM and the hypervisor side are the same before and after live migration. This is also why we don't see a latency increase when migrating with the `managed=no` flag present in both the before and after XML: the mac addresses are the same.
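The failure in steps 1-4 boils down to a MAC mismatch. A minimal illustration (the helper and MAC values are hypothetical, chosen to match the convention described above):

```python
def frame_delivered(frame_dst_mac: str, tap_mac: str) -> bool:
    """A reply frame only leaves the hypervisor if its destination MAC
    matches the tap interface's current MAC."""
    return frame_dst_mac == tap_mac

tap_mac_before = "fe:16:3e:aa:bb:cc"  # set by libvirt under managed=yes
tap_mac_after = "00:61:fe:ed:ca:fe"   # set by Nova, kept under managed=no

# The guest's stale ARP entry still points at the pre-migration tap MAC,
# so its ping replies are dropped until ARP refreshes (step 4):
stale_dst = tap_mac_before
assert frame_delivered(stale_dst, tap_mac_before)
assert not frame_delivered(stale_dst, tap_mac_after)
```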
> I came up with a small patch in Nova to "fix" this by reading the port's mac address and doing the reverse of what libvirt does (i.e., knowing the port's mac address is `fa:xx:xx:xx:xx:xx`, I set the tap interface's mac address to `fe:xx:xx:xx:xx:xx` when creating the tap interface [3]).

Well, the actual bug seems to be in networking-calico: it's pretty clear from the Nova code that the intent was for the mech driver to calculate and provide a different mac for the host side, but that was never implemented on the Neutron side.
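A minimal sketch of the derivation such a patch would perform, assuming the fa:-to-fe: convention described above (this is my reconstruction for illustration, not the actual patch):

```python
def host_tap_mac_from_port_mac(port_mac: str) -> str:
    """Mirror libvirt's convention: the host-side tap MAC is the guest
    (port) MAC with its first octet replaced by 0xfe."""
    octets = port_mac.split(":")
    octets[0] = "fe"  # assumption: only the first octet differs, per the thread
    return ":".join(octets)

assert host_tap_mac_from_port_mac("fa:16:3e:aa:bb:cc") == "fe:16:3e:aa:bb:cc"
```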
> But before I file a bug report, Sean, I would like to understand:
> 1. Given that we now assume `managed=no`, should Nova take responsibility for setting the mac address correctly?

Short term, yes; but the code you provided shows that it already does.
> 2. If so, we will need to re-evaluate everything that libvirt used to do with `managed=yes` and make sure Nova does it, since we changed the default to `managed=no` now.

To be clear, libvirt's default behaviour in the past was also effectively `managed=no`.
> 3. Should Nova read the mac address from `vif['address']` instead of `vif['details'].get(network_model.VIF_DETAILS_TAP_MAC_ADDRESS)`?

Not from looking at the original commit.
https://opendev.org/openstack/nova/src/branch/stable/2025.2/nova/virt/libvir...

plug_tap is using the value from the vif details field when creating the tap.

Medium term: plug_tap is a legacy function and should move to os-vif. That was intended to be done many, many years ago, but no one got around to it, so now would be a good time. Previously libvirt did not delete the tap (it may have modified it, i.e. its mac), but they changed the behaviour, changing the meaning of the XML we generate. We now use `managed=no`, so all tap creation and configuration needs to happen in Nova, or ideally in os-vif. I recently added tap creation logic to os-vif for OVS: https://github.com/openstack/os-vif/commit/eba8007607381736b23e0a0ac672981e7...

Ideally, next cycle we would add a vif_plug_tap plugin for all backends that use vif_type=tap, which today is Calico and, in the past, MidoNet, though that is no longer maintained.

The Nova code was intentionally written this way so that networking-calico could specify a different mac for the host side of the tap. I don't know why Calico requires that, but the original code believed it was required. networking-calico is not allowed to assume which Nova hypervisor is in use, so it cannot depend on the libvirt semantics ("libvirt will set the mac address of the tap interface with `fe:xx:xx:xx:xx:xx` on the hypervisor side and `fa:xx:xx:xx:xx:xx` on the VM side"). If that is the desired behaviour, then it needs to specify it by setting `fe:xx:xx:xx:xx:xx` in the vif details. `vif['address']` is the mac that should be visible to the guest, not the mac of the port on the host.
> And for Nell:
>
> 1. Should networking-calico be modified so that the `vif['details'].get(network_model.VIF_DETAILS_TAP_MAC_ADDRESS)` field reflects the actual mac address of the port?
It should be the mac of the host tap. `vif['address']` is the mac of the guest interface and the neutron port.
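If networking-calico were to follow this suggestion, the driver-side change might look roughly like the sketch below: publish the intended host-side tap MAC in the binding's vif details instead of a hard-coded constant. The helper name and details shape are hypothetical; the real driver's structure will differ.

```python
def build_tap_vif_details(port_mac: str) -> dict:
    """Hypothetical sketch: derive the host-side tap MAC from the port
    (guest) MAC and publish it in the vif details, rather than
    hard-coding a constant as networking-calico does today."""
    octets = port_mac.split(":")
    octets[0] = "fe"  # fe: prefix for the host end, matching libvirt's old behaviour
    return {"mac_address": ":".join(octets)}

assert build_tap_vif_details("fa:16:3e:aa:bb:cc") == {"mac_address": "fe:16:3e:aa:bb:cc"}
```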
> Thank you for helping out!
>
> Best regards,
> Zhan Zhang
> [0]: https://gitlab.com/libvirt/libvirt/-/blob/v10.0.0/src/qemu/qemu_interface.c#...
> [1]: https://github.com/libvirt/libvirt/commit/a2ae3d299cf
> [2]: https://opendev.org/openstack/nova/src/branch/stable/2025.2/nova/privsep/lin...
> [3]: https://opendev.org/openstack/nova/src/branch/stable/2025.2/nova/virt/libvir...
> [4]: https://github.com/projectcalico/calico/blob/v3.31.3/networking-calico/netwo...