Hey,

What you are describing sounds like what we found and reported in [1] a couple of weeks ago.

So if my assumption is right, you may want to join the investigation on that bug report :)

[1] https://bugs.launchpad.net/neutron/+bug/2077879


On Thu, Sep 5, 2024, 18:32 Sesterhenn, Maximilian <Maximilian.Sesterhenn@epg.com> wrote:
Hey all,

I am currently facing an issue with OpenStack Neutron and OVN and could use some help here.
Maybe someone else is doing something similar and can report if it works for them.

TLDR: 
FIPs are broken for us when used with routers that use a tunneled network as their gateway network.
ovn_router_indirect_snat is not used and both OVN 23.09 and 24.03 have been tested.
FIPs work fine when used with an external provider network.


We have deployed three OpenStack projects; let's just call them left, middle and right for simplicity.

The middle project has a network with a large subnet.
No special configuration here, just a big subnet which is transported by OVN through Geneve.

The idea is to use this subnet as an internal transfer network between different projects / tenants, which means we connect routers with SNAT and FIP configuration.
Left and right potentially use the same subnet, which is why we cannot simply route the traffic.
The network is shared with "access_as_external/shared" RBAC rules with both the left and right tenant, so it can be selected as the external gateway network for routers.
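
As a minimal sketch of why plain routing is not an option here, the following Python snippet checks two tenant subnets for overlap. All CIDRs are made-up placeholders, not the real addressing, and routable_without_nat is a hypothetical helper name used only for illustration.

```python
import ipaddress

# Hypothetical tenant subnets: both tenants picked the same private range.
left = ipaddress.ip_network("192.168.0.0/24")
right = ipaddress.ip_network("192.168.0.0/24")
# Hypothetical large transfer subnet owned by the middle project.
transfer = ipaddress.ip_network("10.100.0.0/16")


def routable_without_nat(a, b):
    """Plain routing between two networks needs non-overlapping prefixes."""
    return not a.overlaps(b)


print(routable_without_nat(left, right))     # False: same range on both sides
print(routable_without_nat(left, transfer))  # True: transfer range is distinct
```

Because the tenant ranges can collide, each side has to be translated into the transfer subnet, which is what SNAT and the FIPs are for.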

I'm aware that until very recently, neutron was unable to assign a chassis to routers whose gateway network is tunneled.
That's why we are currently running the latest neutron code from the master branch, where this feature was added.

Let's say we want to connect from left to right and we have a router in each project that has this large network from the middle project as the external gateway network.
An instance in the right tenant has a FIP to be reachable.
Another instance in the left tenant is masqueraded by the SNAT functionality of the router.
The router in the left project performs SNAT and the router in the right project performs the FIP DNAT.
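
The intended translation path can be sketched as a toy model in Python. This is only an illustration of the expected address rewriting, not of how OVN implements it; every address and function name below is a hypothetical placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# All addresses are hypothetical placeholders.
LEFT_VM = "192.168.0.10"   # instance in the left tenant
RIGHT_VM = "192.168.0.20"  # instance in the right tenant
LEFT_GW = "10.100.0.11"    # left router's address on the transfer network
RIGHT_FIP = "10.100.0.22"  # FIP on the transfer network for the right instance


def left_router_snat(pkt):
    # Masquerade the left instance behind the router's gateway address.
    return replace(pkt, src=LEFT_GW) if pkt.src == LEFT_VM else pkt


def right_router_dnat(pkt):
    # Translate the FIP to the right instance's fixed address.
    return replace(pkt, dst=RIGHT_VM) if pkt.dst == RIGHT_FIP else pkt


pkt = Packet(src=LEFT_VM, dst=RIGHT_FIP)
on_transfer = left_router_snat(pkt)         # what the transfer network should carry
delivered = right_router_dnat(on_transfer)  # what the right instance should receive
print(on_transfer)
print(delivered)
```

In this model, only the transfer-network addresses (the left router's gateway address and the FIP) should ever appear on the shared network.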

While the instance from the left project can ping the internal and external interfaces of its local router, the FIP is not reachable.
The external interface of the remote router (from the right project) can be pinged as well.

When capturing packets on the underlying hosts we can see the packet going from the compute host of the left instance to one of the network nodes.
SNAT happens there. DNAT is not visible, but this could be due to the packet capture setup.
However, we see ARP requests coming from the external interface of the router on the right project for the FIP address.
We see the same ARP requests on a second network node.
To me it would make more sense if the router from the left project sent the ARP request for the FIP address.
This ARP request is not being answered.

We're using enable_distributed_floating_ip / DVR, so the FIP is actually configured on the compute node.
We're not using ovn_router_indirect_snat, as there are issues with that.
We even tried enabling it; it makes no difference.



OVN 23.09 (tested 24.03 as well)
OpenStack 2024.1 (neutron master) deployed using kolla-ansible.


BR
Maximilian