On Mon, Feb 9, 2026 at 3:19 PM Sean Mooney <smooney@redhat.com> wrote:


On 09/02/2026 14:53, Nell Jerram wrote:
> On Mon, Feb 9, 2026 at 11:25 AM Sean Mooney <smooney@redhat.com> wrote:
>
>
>
>     On 05/02/2026 19:59, Zhan Zhang wrote:
>     > On the Nova side, it actually does what libvirt does when the
>     tap interface is created [2] (i.e., setting the mac address +
>     bringing the interface up). However, the mac address that the
>     function gets is from the vif's port binding's "mac_address" field
>     [3], which is hard-coded when using networking-calico - the
>     Neutron Calico driver, and this is wrong.
>     So normally the MAC is taken directly from the Neutron port rather
>     than from the details sub-field of the Neutron port.
>     So yes, this does look incorrect in that regard; however, this
>     deviation was called out in the original commit
>     https://opendev.org/openstack/nova/commit/e0bca279d53f866d17834cdee025cda8190bdc14
>
>     """
>     VIF_TYPE_TAP supports a 'mac_address' key, in the VIF details dict,
>     which allows Neutron to specify the MAC address for the host end of
>     the TAP interface.
>
>     """
>
>     So it looks like the Calico driver was intentionally written to have
>     the ability to use a different MAC for the host side of the tap
>     device than for the guest side.
>
>     https://github.com/projectcalico/calico/blob/v3.31.3/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L219
>     does, however, show that it is hard-coded on the
>     networking-calico side.
>
>
> Indeed.  I can't remember why we made those choices 12 years ago, but
> yes, the effect is that we have a hardcoded MAC on the host side of
> every guest interface, and that it's the same in every case.
>
> What exactly do you see as being wrong about that, Sean?
Well, for one, you should generally not have two netdevs with the same MAC
in a given network namespace.
I know that you can do that, but it breaks L2 forwarding.

Calico is an L3 network stack, so it may not care, but it's generally bad
practice.
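
To make the duplicate-MAC point concrete, here is a minimal sketch of a check for netdevs that share a MAC. The helper name and the sample interface names are made up for illustration; on a real host the name-to-MAC mapping could be read from /sys/class/net/<name>/address.

```python
from collections import defaultdict


def find_duplicate_macs(netdevs):
    """Return {mac: [names]} for MACs used by more than one netdev.

    `netdevs` maps interface name -> MAC string. MACs are compared
    case-insensitively.
    """
    by_mac = defaultdict(list)
    for name, mac in netdevs.items():
        by_mac[mac.lower()].append(name)
    return {mac: sorted(names) for mac, names in by_mac.items()
            if len(names) > 1}


# Hypothetical host with two Calico taps sharing the hard-coded MAC.
sample = {
    "eth0": "52:54:00:12:34:56",
    "tap1111": "00:61:fe:ed:ca:fe",
    "tap2222": "00:61:FE:ED:CA:FE",
}
duplicates = find_duplicate_macs(sample)
```

With the sample above, both taps are reported under the same MAC, which is the situation every compute host running Calico guests is in today.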

In any case, from a Nova perspective, the MAC address that we use for the
VM should be specified by Neutron.
For OVS we use a single MAC (the Neutron port MAC) for both the
interface added to the VM and the tap on the host that is added to OVS.

For Calico we use the port MAC for the virtio-net interface
presented to the guest and the hard-coded MAC for the tap interface on
the host,
and Nova is just using the value generated by the Calico mech driver.

So if this is wrong, we should fix the mech driver.

If there is no reason for them to be different, we could consider
removing the support for that in Nova, but that may have upgrade impacts.
It would be better to update the mech driver to set the same
value in both places first, and then update Nova in a later release.
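
As a rough sketch of that first step, the VIF details could carry the port's own MAC instead of the fixed one. This is not the actual networking-calico code: `build_vif_details` is a hypothetical helper, and how the real driver would supply a per-port value at bind time is not shown here. The `port_filter` and `mac_address` keys and the fixed value are the ones from the driver commit quoted in this thread.

```python
HARDCODED_TAP_MAC = "00:61:fe:ed:ca:fe"  # value the driver sets today


def build_vif_details(port_mac=None):
    """Hypothetical helper: build the VIF details dict for a tap port.

    With no port MAC this reproduces today's behaviour (one fixed MAC
    for every host-side tap). With a port MAC, the host-side value is
    the same as the Neutron port's own MAC.
    """
    details = {"port_filter": True}
    details["mac_address"] = port_mac if port_mac else HARDCODED_TAP_MAC
    return details


current = build_vif_details()
proposed = build_vif_details("fa:16:3e:aa:bb:cc")
```

Once every deployed driver reports the same value in both places, Nova reading one field or the other would make no difference, which is what would make a later Nova cleanup safe.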


>
>     So this seems to have only been working because libvirt was
>     overriding the MAC.
>
>
> Can you expand on that?  What does "only been working" mean, and what
> do you mean by libvirt overriding the MAC?
Zhan asserted that "libvirt will set the mac address of the tap
interface with `fe:xx:xx:xx:xx:xx` on the hypervisor side and
`fa:xx:xx:xx:xx:xx` on the VM side".
If that is the desired behavior, then we need to update the mech driver
for Calico to generate those two MACs.
Nova should not have logic to do that transformation between fa and fe.
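
If the mech driver were to generate the host-side value, the derivation could be as small as the sketch below. It only mirrors the fa/fe pattern quoted above; it is not libvirt's or networking-calico's actual code, and `host_side_tap_mac` is a hypothetical name.

```python
import re

_MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def host_side_tap_mac(guest_mac):
    """Return guest_mac with its first octet forced to 'fe'.

    Raises ValueError if the input is not a colon-separated MAC.
    """
    mac = guest_mac.lower()
    if not _MAC_RE.fullmatch(mac):
        raise ValueError("not a MAC address: %r" % (guest_mac,))
    return "fe" + mac[2:]


example = host_side_tap_mac("fa:16:3e:aa:bb:cc")
```

Keeping this in the mech driver means the result reaches every hypervisor driver through the VIF details, rather than being special-cased in Nova's libvirt code.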


>     >   So what happens before (with `managed=yes`) is that even
>     though the mac address is wrongly updated, libvirt will rewrite
>     the mac address later when Nova calls the migration API, so we
>     won’t hit this issue. What happens now when we migrate the VM with
>     the updated `managed=no` is that:
>
>
> Can you expand on "before (with `managed=yes`)" ?  My understanding is
> that VIF_TYPE_TAP devices have never behaved successfully with
> `managed=yes`.
This was imprecise: historically, Nova did not set managed to any value.
Prior to https://github.com/libvirt/libvirt/commit/a2ae3d299cf
the tap was allowed to already exist, and the default in libvirt was
managed=yes, I believe.
Nova was not setting it to either yes or no.

https://review.opendev.org/c/openstack/nova/+/960284 updated Nova to set
it to no, to allow Calico to work again.
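
For readers not familiar with the attribute, here is a minimal sketch of what an interface element with managed='no' looks like in libvirt domain XML, as I understand the format. It is built with the Python standard library for illustration only; it is not Nova's XML generation code, and the device name and MAC are made up.

```python
import xml.etree.ElementTree as ET


def build_tap_interface_xml(dev, guest_mac):
    """Build an <interface type='ethernet'> element for a pre-created tap."""
    iface = ET.Element("interface", {"type": "ethernet"})
    # The <mac> element is the guest-visible MAC, not the host tap's MAC.
    ET.SubElement(iface, "mac", {"address": guest_mac})
    # managed='no' tells libvirt that the tap already exists and that
    # libvirt should neither create nor delete it.
    ET.SubElement(iface, "target", {"dev": dev, "managed": "no"})
    ET.SubElement(iface, "model", {"type": "virtio"})
    return ET.tostring(iface, encoding="unicode")


xml_text = build_tap_interface_xml("tap1111", "fa:16:3e:aa:bb:cc")
```

Note that nothing in this element describes the host-side MAC, so with managed='no' that value has to be set by whoever creates the tap.
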
My understanding from this thread is that, with older libvirt and the config
we specified, the MAC of the tap was modified, as described in the quoted
text below.

>
>     > 1. When the VM is created first without `managed=no`, libvirt
>     will set the mac address of the tap interface with
>     `fe:xx:xx:xx:xx:xx` on the hypervisor side and `fa:xx:xx:xx:xx:xx`
>     on the VM side. VM learns this, and it's pingable.
>
So if the above is true, we should restore that behavior, but the correct
way to do that is to modify the MAC defined in the vif['details'] field to
have the correct MAC for the tap.

I'm not expressing any opinion on what that MAC value should be, just that
it is the responsibility of the mech driver to set it to a value that will
work for both new boots and live migration.
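
To illustrate the split between the two fields, here is a small sketch of how a consumer can read them from a VIF dict. The key name 'mac_address' comes from the original Nova commit message; the helper and the sample VIF are hypothetical, and plain dicts stand in for Nova's network model objects.

```python
VIF_DETAILS_TAP_MAC_ADDRESS = "mac_address"  # key name per the Nova commit


def tap_and_guest_macs(vif):
    """Return (host_tap_mac, guest_mac) for a VIF_TYPE_TAP vif dict.

    host_tap_mac is None when the mech driver sets no 'mac_address'
    entry in the VIF details.
    """
    host_mac = vif.get("details", {}).get(VIF_DETAILS_TAP_MAC_ADDRESS)
    return host_mac, vif["address"]


# Sample VIF as networking-calico reports it today (values from this thread).
sample_vif = {
    "address": "fa:16:3e:aa:bb:cc",
    "details": {"port_filter": True, "mac_address": "00:61:fe:ed:ca:fe"},
}
host_mac, guest_mac = tap_and_guest_macs(sample_vif)
```

Whatever value the mech driver puts in the details is the one that ends up on the host tap, on both the source and the destination of a live migration.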

>     > 2. When the VM is migrated with the updated XML (including
>     `managed=no`), libvirt will NOT overwrite the mac address for the
>     tap interface. Thus, when Nova creates the TAP interface and sets
>     the mac address of the tap interface on the hypervisor side with
>     the hard-coded mac address (i.e., `00:61:fe:ed:ca:fe`), this
>     change will be persisted after the live migration.
>     > 3. When the VM is resumed, it doesn't know that the hypervisor
>     side interface mac address has changed, and is still sending
>     packets with the old mac address (i.e., `fe:xx:xx:xx:xx:xx`).
>     Hypervisor sees that there is no matching mac address, and will
>     drop the packet. Running `tcpdump`, we were able to see that the
>     VM is answering ping packets, but the replies never go out of the
>     hypervisor.
>     > 4. At some point later, VM will send ARP to ask for the new mac
>     address, and it will become pingable when it gets the answer.
>     >
>     > So the key point here is to make sure that the mac addresses on
>     both the VM and the hypervisor side are still the same before &
>     after live migration. This is also why we don't see a latency
>     increase when migrating with the `managed=no` flag present in both
>     the XML before & after - the mac addresses are the same.
>     >
>     > I came up with a small patch in Nova to “fix” this by reading
>     the port's mac address and do the reverse of what libvirt is doing
>     (i.e., I know the port’s mac address is `fa:xx:xx:xx:xx:xx`, I set
>     the tap interface's mac address to be `fe:xx:xx:xx:xx:xx` when
>     creating the tap interface [3]).
>     Well, the actual bug seems to be in networking-calico.
>     It's pretty clear from the Nova code that the intent was for the mech
>     driver to calculate and provide a different MAC for the host, but it
>     was never implemented on the Neutron side.
>
>     >   But before I file a bug report, Sean, I would like to understand:
>     >
>     > 1. Given now we assume `managed=no`, should Nova take the
>     responsibility of setting the mac address correctly?
>     Short term, yes, but the code you provided shows that it already does.
>
>     https://opendev.org/openstack/nova/src/branch/stable/2025.2/nova/virt/libvirt/vif.py#L690-L705
>
>
>     plug_tap uses the value from the VIF details field when
>     creating the tap.
>
>     Medium term, plug_tap is a legacy function and this should be moved
>     to os-vif.
>     That was intended to be done many, many years ago, but no one got
>     around to it.
>     so now would be a
>     > 2. If so, we will need to re-evaluate all things that were done
>     by libvirt with `managed=yes` before and make sure Nova will do
>     them, since we changed the default to `managed=no` now.
>     To be clear, libvirt's default in the past was also effectively
>     managed=no, in that it previously did not delete the tap. It may have
>     modified it (its MAC), but they changed the behavior,
>     changing the meaning of the XML we generated.
>
>     We now use managed=no, so all tap creation and configuration needs to
>     happen in Nova or, ideally, in os-vif.
>     I recently added tap creation logic to os-vif for OVS:
>     https://github.com/openstack/os-vif/commit/eba8007607381736b23e0a0ac672981e726fd8ee
>     Ideally, next cycle we would add a vif_plug_tap plugin for all
>     backends that use vif_type=tap, which is Calico and, in the past,
>     MidoNet, but that is no longer maintained.
>
>
>     > 3. Should Nova read from `vif['address']` for mac address,
>     instead of
>     `vif['details'].get(network_model.VIF_DETAILS_TAP_MAC_ADDRESS)`?
>     No, not from looking at the original commit.
>     The Nova code was intentionally written this way so that
>     networking-calico could specify a different MAC for the host side
>     of the tap.
>     I don't know why Calico requires that, but the original code
>     believed it was required.
>
>     networking-calico is not allowed to assume which Nova hypervisor is in
>     use, so it cannot depend on the libvirt semantics:
>
>     "libvirt will set the mac address of the tap interface with
>     `fe:xx:xx:xx:xx:xx` on the hypervisor side and `fa:xx:xx:xx:xx:xx` on
>     the VM side."
>
>     If that is the desired behavior, then it needs to specify that by
>     setting `fe:xx:xx:xx:xx:xx` in the VIF details.
>
>     vif['address'] is the MAC that should be visible to the guest, not
>     the MAC of the port on the host.
>
>     >
>     > And for Nell:
>     >
>     > 1. Should networking-calico be modified to have
>     `vif['details'].get(network_model.VIF_DETAILS_TAP_MAC_ADDRESS)`
>     field to reflect the actual mac address of the port?
>     It should be the MAC of the host tap; vif['address'] is the MAC of the
>     guest interface and the Neutron port.
>
>
> I'm happy to look at changes here, but at the moment I'm afraid I'm
> still at the stage of trying to understand the current analysis.
Yep, I think that's where most of us are:
trying to understand what the intended behavior is, what it was in the
past, and how to align the two going forward.
As it stands, I do not see a bug on the Nova side, but the fact that the MAC
appears to be hard-coded on the networking-calico side for the host
tap does look like a bug.

Unfortunately, I don't see anything that indicates why it was hard-coded.
>
> Best wishes - Nell
>
>     >
>     > Thank you for helping out!
>     >
>     > Best regards,
>     > Zhan Zhang
>     >
>     > [0]:
>     https://gitlab.com/libvirt/libvirt/-/blob/v10.0.0/src/qemu/qemu_interface.c#L482-503
>     > [1]: https://github.com/libvirt/libvirt/commit/a2ae3d299cf
>     > [2]:
>     https://opendev.org/openstack/nova/src/branch/stable/2025.2/nova/privsep/linux_net.py#L128-L130
>     > [3]:
>     https://opendev.org/openstack/nova/src/branch/stable/2025.2/nova/virt/libvirt/vif.py#L690-L705
>     > [4]:
>     https://github.com/projectcalico/calico/blob/v3.31.3/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L219
>     >
>


I found the commit that introduced it, in https://github.com/projectcalico/calico.  But sadly no useful explanation - bad me.

```
ed8ad15d8f69650d38c759f054328ccd263f62a4
Author:     Neil Jerram <Neil.Jerram@metaswitch.com>
AuthorDate: Fri May 22 23:01:15 2015 +0100
Commit:     Cory Benfield <cory.benfield@metaswitch.com>
CommitDate: Wed May 27 10:06:09 2015 +0100

Parent:     62536ca420 Version 0.21
Contained:  auto-pick-of-#11767-origin-release-v3.30
            auto-pick-of-#11767-origin-release-v3.31
            bgpfilter-enhancements calicoctl-st-update e2e-test-moves
            get-all-hang hep-profile-label live-migration-monitor
            live-migration-poc modern
            openstack-v3.30-pre-release-packaging
            routing-priority-config test-arp-ignore
Follows:    0.21-felix (1)
Precedes:   0.22-felix (65)

Neutron driver: specify fixed MAC address for Calico TAP interfaces

1 file changed, 2 insertions(+), 1 deletion(-)
calico/openstack/mech_calico.py | 3 ++-

modified   calico/openstack/mech_calico.py
@@ -123,7 +123,8 @@ class CalicoMechanismDriver(mech_agent.SimpleAgentMechanismDriverBase):
         super(CalicoMechanismDriver, self).__init__(
             constants.AGENT_TYPE_DHCP,
             'tap',
-            {'port_filter': True})
+            {'port_filter': True,
+             'mac_address': '00:61:fe:ed:ca:fe'})
 
         # Initialize fields for the database object and transport.  We will
         # initialize these properly when we first need them.
```

So, I could run experiments to revert that, and patch Nova to use `vif['address']`, and see what breaks, if anything.  But - stepping back - what is the problem we are trying to address here?  It looks like there were some emails from Zhan that didn't go to the list, so I'm not sure I have the complete picture.

Best wishes - Nell