Hi Gary,

On top of what Rodolfo said:
On Tue, Jun 27, 2023 at 5:15 PM Gary Molenkamp <molenkam@uwo.ca> wrote:
Good morning,

I'm having a problem with SNAT routing under OVN, but I'm not sure
whether something is misconfigured or my understanding of how OVN is
architected is wrong.

I've built a Zed cloud, since upgraded to Antelope, using the Neutron
Manual install method here:
https://docs.openstack.org/neutron/latest/install/ovn/manual_install.html
I'm using a multi-tenant configuration using geneve and the flat
provider network is present on each hypervisor. Each hypervisor is
connected to the physical provider network, along with the tenant
network and is tagged as an external chassis under OVN.
         br-int exists, as does br-provider
         ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw

Is there a specific reason to enable the gateway on the compute nodes? Generally it's recommended to use the controller/network nodes as gateways. What does your environment look like (number of controller, network, and compute nodes)?


For most cases, distributed FIP based connectivity is working without
issue, but I'm having an issue where VMs without a FIP are not always
able to use the SNAT services of the tenant network router.
Scenario:
     Internal network named cs3319:  with subnet 172.31.100.0/23
     Has a router named cs3319_router with external gateway set (snat
enabled)

     This network has 3 vms:
         - #1 has a FIP and can be accessed externally
         - #2 has no FIP, can be accessed via VM1 and can access
external resources via SNAT  (ie OS repos, DNS, etc)
         - #3 has no FIP, can be accessed via VM1 but has no external
SNAT connectivity

Since it works for some VMs but not for others, the enable-chassis-as-gw point above could be related.
Is the working VM hosted on compute05 or on some other compute node? Where is the gateway router port scheduled (you can check ovn-sbctl show for cr-lrp-<router gateway port id>)?
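
If it helps, here is a rough sketch for pulling that out of the ovn-sbctl show output. gateway_hosts is just a hypothetical helper name, and it assumes the output layout in your mail (a "hostname:" line followed by "Port_Binding" lines under each Chassis), so adjust as needed:

```shell
# Reads `ovn-sbctl show` output on stdin and prints one line per
# chassisredirect port: "<cr-lrp port name> <hostname of hosting chassis>".
gateway_hosts() {
    awk '
        # Remember the hostname of the chassis block we are currently in.
        $1 == "hostname:" { host = $2 }
        # Port names may or may not be quoted; strip quotes before matching.
        $1 == "Port_Binding" {
            port = $2
            gsub(/"/, "", port)
            if (port ~ /^cr-lrp-/) print port, host
        }
    '
}

# Usage on a node that can reach the southbound DB:
#   ovn-sbctl show | gateway_hosts
```

If the cr-lrp port for your router shows up on a different chassis than the one hosting the failing VM, that chassis and the tunnel towards it are where I would look first.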
 
 From what I can tell,  the chassis config is correct, compute05 is the
hypervisor and the faulty VM has a port binding on this hypervisor:

ovn-sbctl show
...
Chassis "8e0fa17c-e480-4b60-9015-bd8833412561"
     hostname: compute05.cloud.sci.uwo.ca
     Encap geneve
         ip: "192.168.0.105"
         options: {csum="true"}
     Port_Binding "7a5257eb-caea-45bf-b48c-620c5dff4b39"
     Port_Binding "50e16602-78e6-429b-8c2f-e7e838ece1b4"
     Port_Binding "f121c9f4-c3fe-4ea9-b754-a809be95a3fd"

The router has the candidate gateways, and the snat set:

ovn-nbctl show  92df19a7-4ebe-43ea-b233-f4e9f5a46e7c
router 92df19a7-4ebe-43ea-b233-f4e9f5a46e7c
(neutron-389439b5-07f8-44b6-a35b-c76651b48be5) (aka cs3319_public_router)
     port lrp-44ae1753-845e-4822-9e3d-a41e0469e257
         mac: "fa:16:3e:9a:db:d8"
         networks: ["129.100.21.94/22"]
         gateway chassis: [5c039d38-70b2-4ee6-9df1-596f82c68106
99facd23-ad17-4b68-a8c2-1ff6da15ac5f
1694116c-6d30-4c31-b5ea-0f411878316e
2a4bbaf9-228a-462e-8970-0cdbf59086e6 9332c61b-93e1-4a70-9547-701a014bfd98]
     port lrp-509bba37-fa06-42d6-9210-2342045490db
         mac: "fa:16:3e:ff:0f:3b"
         networks: ["172.31.100.1/23"]
     nat 11e0565a-4695-4f67-b4ee-101f1b1b9a4f
         external ip: "129.100.21.94"
         logical ip: "172.31.100.0/23"
         type: "snat"
     nat 21e4be02-d81c-46e8-8fa8-3f94edb4aed1
         external ip: "129.100.21.87"
         logical ip: "172.31.100.49"
         type: "dnat_and_snat"

Each network agent on the hypervisors shows the ovn controller up :
      OVN Controller Gateway agent | compute05.cloud.sci.uwo.ca |                   | :-)   | UP    | ovn-controller

The ovs vswitch on the hypervisor looks correct afaict and ovn ports bfd
status are all forwarding to other hypervisors. ie:
    Port ovn-2a4bba-0
             Interface ovn-2a4bba-0
                 type: geneve
                 options: {csum="true", key=flow, remote_ip="192.168.0.106"}
                 bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}


Any advice on where to look would be appreciated.

I have seen MTU-specific issues in the past, so it would be good to rule out any MTU issue by comparing the working and non-working cases.
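
One quick way to test this is a ping with the don't-fragment bit set, run from both VM #2 and VM #3. A minimal sketch, assuming Linux iputils ping inside the guests. MTU and TARGET are placeholders you would fill in (for example the MTU reported by "openstack network show cs3319 -c mtu" and some external IPv4 address), and icmp_payload_for_mtu is a hypothetical helper name:

```shell
# ICMP payload size that exactly fills an IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Run inside each VM; -M do sets the DF bit so oversized packets fail
# instead of being fragmented.
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu "$MTU")" "$TARGET"
```

If the full-size ping only works from one of the two VMs, or only smaller payloads get through, that points at an MTU problem on the path via the gateway chassis.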

PS.  Version info:
     Neutron 22.0.0-1
     OVN 22.12

    neutron options:
       enable_distributed_floating_ip = true
       ovn_l3_scheduler = leastloaded



Thanks
Gary



--
Gary Molenkamp                  Science Technology Services
Systems/Cloud Administrator     University of Western Ontario
molenkam@uwo.ca                  http://sts.sci.uwo.ca
(519) 661-2111 x86882           (519) 661-3566


Thanks and Regards
Yatin Karel