SNAT failure with OVN under Antelope

Roberto Bartzen Acosta roberto.acosta at luizalabs.com
Tue Jun 27 15:18:22 UTC 2023


Hi Gary,

On Tue, Jun 27, 2023 at 11:47, Yatin Karel <ykarel at redhat.com> wrote:

> Hi Gary,
>
> On top of what Rodolfo said:
> On Tue, Jun 27, 2023 at 5:15 PM Gary Molenkamp <molenkam at uwo.ca> wrote:
>
>> Good morning,   I'm having a problem with snat routing under OVN but I'm
>> not sure if something is mis-configured or just my understanding of how
>> OVN is architected is wrong.
>>
>> I've built a Zed cloud, since upgraded to Antelope, using the Neutron
>> Manual install method here:
>> https://docs.openstack.org/neutron/latest/install/ovn/manual_install.html
>> I'm using a multi-tenant configuration using geneve and the flat
>> provider network is present on each hypervisor. Each hypervisor is
>> connected to the physical provider network, along with the tenant
>> network and is tagged as an external chassis under OVN.
>>          br-int exists, as does br-provider
>>          ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw
>>
>
> Any specific reason to enable gateway on compute nodes? Generally it's
> recommended to use controller/network nodes as gateway. What's your
> env(number of controllers, network, compute nodes)?
>

It could still make sense to set enable-chassis-as-gw on the compute nodes,
for example if you want to use DVR. In that case you also need to map the
external bridge on each node (ovs-vsctl set open .
external-ids:ovn-bridge-mappings=...). When deploying via Ansible this
mapping is created automatically, but I didn't see any mention of it in the
manual installation guide.
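
As a rough sketch of that mapping step (my own illustration, not taken from
the install guide): the physnet name "provider" and the bridge name
"br-provider" in the usage line are placeholders, and OVS_VSCTL and
set_bridge_mapping are hypothetical helper names, not part of OVS.

```shell
# OVS_VSCTL can be overridden (for example with "echo") to do a dry run
# and see the arguments without touching the local Open vSwitch database.
OVS_VSCTL="${OVS_VSCTL:-ovs-vsctl}"

# Map a Neutron physical network name ($1) to the local provider bridge ($2).
# Both values are assumptions here: use the physnet name from your ML2
# configuration and the bridge that actually exists on the node.
set_bridge_mapping() {
    "$OVS_VSCTL" set open . "external-ids:ovn-bridge-mappings=$1:$2"
}

# Usage (placeholder names):
#   set_bridge_mapping provider br-provider
```

Running it with OVS_VSCTL=echo first shows the exact command line that would
be applied.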

The problem is basically that the router's gateway port (the cr-lrp) may not
be bound on the same chassis as the VM that fails, since the cr-lrp ends up on
the chassis where the first VM of that network was created. My suggestion is
to remove enable-chassis-as-gw from the compute nodes, so that the VM forwards
its traffic over the Geneve tunnel to the chassis where the LRP resides:

ovs-vsctl remove open . external-ids ovn-cms-options="enable-chassis-as-gw"
ovs-vsctl remove open . external-ids ovn-bridge-mappings
ip link set br-provider-name down
ovs-vsctl del-br br-provider-name
systemctl restart ovn-controller
systemctl restart openvswitch-switch
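
To check afterwards where the gateway port is actually bound, something like
the following could help. It is only a sketch: chassis_for_port is a
hypothetical helper name, and it assumes the "ovn-sbctl show" layout quoted
further down in this thread (a "hostname:" line followed by "Port_Binding"
lines under each chassis).

```shell
# Print the hostname of the chassis that hosts the port named in $1,
# reading `ovn-sbctl show` output on stdin.
chassis_for_port() {
    awk -v port="$1" '
        # A new chassis block starts: forget the previous hostname.
        $1 == "Chassis" { host = "" }
        $1 == "hostname:" {
            host = $2
            gsub(/"/, "", host)
        }
        # Port names may or may not be quoted, so strip quotes first.
        $1 == "Port_Binding" {
            name = $2
            gsub(/"/, "", name)
            if (name == port) print host
        }
    '
}

# Usage:
#   ovn-sbctl show | chassis_for_port cr-lrp-<router gateway port id>
```

Comparing that hostname with the chassis where the failing VM's port is bound
shows whether its SNAT traffic has to cross the tunnel.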



>
>> For most cases, distributed FIP based connectivity is working without
>> issue, but I'm having an issue where VMs without a FIP are not always
>> able to use the SNAT services of the tenant network router.
>> Scenario:
>>      Internal network named cs3319:  with subnet 172.31.100.0/23
>>      Has a router named cs3319_router with external gateway set (snat
>> enabled)
>>
>>      This network has 3 vms:
>>          - #1 has a FIP and can be accessed externally
>>          - #2 has no FIP, can be accessed via VM1 and can access
>> external resources via SNAT  (ie OS repos, DNS, etc)
>>          - #3 has no FIP, can be accessed via VM1 but has no external
>> SNAT connectivity
>>
> Considering it works for some VMs but not for others, the above point about
> enable-chassis-as-gw could be related.
> Is the working VM hosted on compute05 or on some other compute node? Where is
> the gateway router port scheduled (you can check ovn-sbctl show for
> cr-lrp-<router gateway port id>)?
>
>
>>  From what I can tell,  the chassis config is correct, compute05 is the
>> hypervisor and the faulty VM has a port binding on this hypervisor:
>>
>> ovn-sbctl show
>> ...
>> Chassis "8e0fa17c-e480-4b60-9015-bd8833412561"
>>      hostname: compute05.cloud.sci.uwo.ca
>>      Encap geneve
>>          ip: "192.168.0.105"
>>          options: {csum="true"}
>>      Port_Binding "7a5257eb-caea-45bf-b48c-620c5dff4b39"
>>      Port_Binding "50e16602-78e6-429b-8c2f-e7e838ece1b4"
>>      Port_Binding "f121c9f4-c3fe-4ea9-b754-a809be95a3fd"
>>
>> The router has the candidate gateways, and the snat set:
>>
>> ovn-nbctl show  92df19a7-4ebe-43ea-b233-f4e9f5a46e7c
>> router 92df19a7-4ebe-43ea-b233-f4e9f5a46e7c
>> (neutron-389439b5-07f8-44b6-a35b-c76651b48be5) (aka cs3319_public_router)
>>      port lrp-44ae1753-845e-4822-9e3d-a41e0469e257
>>          mac: "fa:16:3e:9a:db:d8"
>>          networks: ["129.100.21.94/22"]
>>          gateway chassis: [5c039d38-70b2-4ee6-9df1-596f82c68106
>> 99facd23-ad17-4b68-a8c2-1ff6da15ac5f
>> 1694116c-6d30-4c31-b5ea-0f411878316e
>> 2a4bbaf9-228a-462e-8970-0cdbf59086e6 9332c61b-93e1-4a70-9547-701a014bfd98]
>>      port lrp-509bba37-fa06-42d6-9210-2342045490db
>>          mac: "fa:16:3e:ff:0f:3b"
>>          networks: ["172.31.100.1/23"]
>>      nat 11e0565a-4695-4f67-b4ee-101f1b1b9a4f
>>          external ip: "129.100.21.94"
>>          logical ip: "172.31.100.0/23"
>>          type: "snat"
>>      nat 21e4be02-d81c-46e8-8fa8-3f94edb4aed1
>>          external ip: "129.100.21.87"
>>          logical ip: "172.31.100.49"
>>          type: "dnat_and_snat"
>>
>> Each network agent on the hypervisors shows the ovn controller up :
>>       OVN Controller Gateway agent | compute05.cloud.sci.uwo.ca
>> |                   | :-)   | UP    | ovn-controller
>>
>> The ovs vswitch on the hypervisor looks correct afaict and ovn ports bfd
>> status are all forwarding to other hypervisors. ie:
>>     Port ovn-2a4bba-0
>>              Interface ovn-2a4bba-0
>>                  type: geneve
>>                  options: {csum="true", key=flow,
>> remote_ip="192.168.0.106"}
>>                  bfd_status: {diagnostic="No Diagnostic",
>> flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic",
>> remote_state=up, state=up}
>>
>>
>> Any advice on where to look would be appreciated.
>>
> I have seen MTU-specific issues in the past; it would be good to rule out
> any MTU issue by comparing the working and non-working cases.
>
>> PS.  Version info:
>>      Neutron 22.0.0-1
>>      OVN 22.12
>>
>>     neutron options:
>>        enable_distributed_floating_ip = true
>>        ovn_l3_scheduler = leastloaded
>>
>>
>>
>> Thanks
>> Gary
>>
>>
>>
>> --
>> Gary Molenkamp                  Science Technology Services
>> Systems/Cloud Administrator     University of Western Ontario
>> molenkam at uwo.ca                  http://sts.sci.uwo.ca
>> (519) 661-2111 x86882           (519) 661-3566
>>
>>
> Thanks and Regards
> Yatin Karel
>

More information about the openstack-discuss mailing list