SNAT failure with OVN under Antelope

Gary Molenkamp molenkam at uwo.ca
Thu Jul 13 17:36:21 UTC 2023


Thanks Yatin, I will put together a bug report.

I have found that if I disable enable_distributed_floating_ip but leave 
the entire OVN/OVS setup as below for redundancy, then traffic flows as 
expected.
As soon as I set enable_distributed_floating_ip to true, E/W traffic 
still works, but N/S traffic stops for the VMs not on the host with the 
CR-LRP.

I can't say for sure why, as ovn-trace/flow debugging is still new to 
me, but the northbound and southbound DBs look correct.
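
For reference, the way I have been checking which chassis the CR-LRP is 
bound to is sketched below; the router-port UUID is only a placeholder, 
so substitute the one from "openstack port list --router <router id>":

```shell
# NB side: which gateway chassis the router port is scheduled on
# (the LRP name below is a placeholder, not a real port).
ovn-nbctl lrp-get-gateway-chassis lrp-<router-port-uuid>

# SB side: which chassis the cr-lrp port is actually bound to.
ovn-sbctl --columns=logical_port,chassis find port_binding \
    'logical_port="cr-lrp-<router-port-uuid>"'
```

If the two disagree, or the SB binding is missing, that would point at 
the scheduling/binding side rather than the flows themselves.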

Gary




On 2023-07-13 11:43, Yatin Karel wrote:
> Hi Gary,
>
> On Wed, Jul 12, 2023 at 9:22 PM Gary Molenkamp <molenkam at uwo.ca> wrote:
>
>     A little progress, but I may be tripping over bug
>     https://bugs.launchpad.net/neutron/+bug/2003455
>
> That bug was mostly targeting VLAN provider networks, but you mentioned 
> you are using Geneve and flat networks, so this might not be related.
>
> Multiple components are involved, so it would be difficult to narrow it 
> down here without more details; functionality-wise it should have just 
> worked (I checked in my Train OVN environment and it worked fine). So 
> I think it would be best to start with a bug report at 
> https://bugs.launchpad.net/neutron/ with details (after reverting the env 
> to its previous state: bridges, ovn-cms-options configured, and DVR 
> enabled). Good to include details like:-
>
> - Environment details:-
>   - Number of controller, computes nodes
>   - Nodes are virtual or physical
>   - Deployment tool used, Operating System
>   - Neutron version
>   - OVN/OVS versions
> - Share ovn-controller logs from the compute and controller nodes
> - Share the OVN Northbound and Southbound DB files from the controller 
> node and the ovs conf.db from the compute nodes
> - Output of resources involved:-
>   - openstack network agent list
>   - openstack server list --long
>   - openstack port list --router <router id>
> - Reproduction steps along with output from the operations (both for 
> good and bad VMs)
> - Output of below commands from controller and compute nodes:-
>   - iptables -L
>   - netstat -i
>   - ip addr show
>   - ovs-vsctl show
>   - ovs-vsctl list open .
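
The node-level outputs in the last list can be gathered with a small 
script like the sketch below (the output directory is an assumption; 
run as root on each controller/compute node):

```shell
#!/bin/sh
# Rough collection script for the per-node diagnostics listed above.
OUT=/tmp/ovn-debug-$(hostname)
mkdir -p "$OUT"
iptables -L            > "$OUT/iptables.txt"  2>&1
netstat -i             > "$OUT/netstat-i.txt" 2>&1
ip addr show           > "$OUT/ip-addr.txt"   2>&1
ovs-vsctl show         > "$OUT/ovs-show.txt"  2>&1
ovs-vsctl list open .  > "$OUT/ovs-open.txt"  2>&1
```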
>
>     If I remove the provider bridge from the second hypervisor:
>         ovs-vsctl remove open . external-ids ovn-cms-options="enable-chassis-as-gw"
>         ovs-vsctl remove open . external-ids ovn-bridge-mappings
>         ip link set br-provider down
>         ovs-vsctl del-br br-provider
>     and disable
>         enable_distributed_floating_ip
>
>     Then both VMs using SNAT on each compute server work.
>
> This looks interesting. It would also be good to check the behavior 
> when no VM has a FIP attached.
>
>     Turning the second chassis back on as a gateway immediately breaks
>     the VM on the second compute server:
>
>         ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw
>         ovs-vsctl add-br br-provider
>         ovs-vsctl set open . external-ids:ovn-bridge-mappings=provider:br-provider
>         ovs-vsctl add-port br-provider ens256
>         systemctl restart ovn-controller openvswitch.service
>
> Here it would be interesting to check with tcpdump where exactly the 
> traffic drops.
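
As a sketch of where to tap (the provider NIC name ens256 is taken from 
the commands above; genev_sys_6081 is the kernel's Geneve tunnel 
interface):

```shell
# On the compute node hosting the broken VM: watch traffic heading
# toward the gateway chassis over the Geneve tunnel.
tcpdump -nei genev_sys_6081 icmp

# On the chassis hosting the CR-LRP: watch the provider NIC for the
# SNATed packets leaving (and any replies coming back).
tcpdump -nei ens256 'icmp or arp'
```

Seeing the packet on the tunnel but never on the provider NIC (or vice 
versa) narrows down which hop is eating it.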
>
>     I am running neutron 22.0.1; could it be something related?
>
>     python3-neutron-22.0.1-1.el9s.noarch
>     openstack-neutron-common-22.0.1-1.el9s.noarch
>     openstack-neutron-22.0.1-1.el9s.noarch
>     openstack-neutron-ml2-22.0.1-1.el9s.noarch
>     openstack-neutron-openvswitch-22.0.1-1.el9s.noarch
>     openstack-neutron-ovn-metadata-agent-22.0.1-1.el9s.noarch
>
>
>
>
>
>
>     On 2023-07-12 10:21, Gary Molenkamp wrote:
>>     For comparison, I looked at how openstack-ansible sets up
>>     OVN, and I don't see any major differences other than O-A
>>     configuring a manager for ovs:
>>           ovs-vsctl --id @manager create Manager "target=\ ....
>>     I don't believe this is the point of failure (but feel free to
>>     correct me if I'm wrong ;) ).
>>
>>     ovn-trace on both VMs' inports shows the same trace for the
>>     working VM and the non-working VM, i.e.:
>>
>>     ovn-trace --db=$SB --ovs default_net 'inport ==
>>     "f4cbc8c7-e7bf-47f3-9fea-a1663f6eb34d" &&
>>     eth.src == fa:16:3e:a6:62:8e && ip4.src == 172.31.101.168 &&
>>     ip4.dst == <provider's gateway IP>'
>>
>>
>>
>>     On 2023-07-07 14:08, Gary Molenkamp wrote:
>>>     Happy Friday afternoon.
>>>
>>>     I'm still pondering a lack of connectivity in an HA OVN with
>>>     each compute node acting as a potential gateway chassis.
>>>
>>>>>         The problem is basically that the port of the OVN LRP may
>>>>>         not be in the same chassis as the VM that failed (since
>>>>>         the CR-LRP will be where the first VM of that network will
>>>>>         be created). The suggestion is to remove the
>>>>>         enable-chassis-as-gw from the compute nodes to allow the
>>>>>         VM to forward traffic via tunneling/Geneve to the chassis
>>>>>         where the LRP resides.
>>>>>
>>>>
>>>>         I forced a similar VM onto the same chassis as the working
>>>>         VM, and it was able to communicate out.  If we do want to
>>>>         keep multiple chassis as gateways, would that be addressed
>>>>         with ovn-bridge-mappings?
>>>>
>>>>
>>>
>>>     I built a small test cloud to explore this further, as I continue
>>>     to see the same issue:  a VM will only be able to use SNAT
>>>     outbound if it is on the same chassis as the CR-LRP.
>>>
>>>     In my test cloud, I have one controller and two compute nodes.
>>>     The controller only runs the northbound and southbound DBs in
>>>     addition to the neutron server.  Each of the two compute nodes is
>>>     configured as below.  On a tenant network I have three VMs:
>>>         - #1:  cirros VM with FIP
>>>         - #2:  cirros VM running on compute node 1
>>>         - #3:  cirros VM running on compute node 2
>>>
>>>     E/W traffic between VMs in the same tenant network is fine.
>>>     N/S traffic is fine for the FIP.  N/S traffic only works for the
>>>     VM whose CR-LRP is active on the same chassis.  Does anything
>>>     jump out as a mistake in my understanding of how this should be
>>>     working?
>>>
>>>     Thanks as always,
>>>     Gary
>>>
>>>
>>>     on each hypervisor:
>>>
>>>     /usr/bin/ovs-vsctl set open . external-ids:ovn-remote=tcp:{{ controllerip }}:6642
>>>     /usr/bin/ovs-vsctl set open . external-ids:ovn-encap-type=geneve
>>>     /usr/bin/ovs-vsctl set open . external-ids:ovn-encap-ip={{ overlaynetip }}
>>>     /usr/bin/ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw
>>>     /usr/bin/ovs-vsctl add-br br-provider -- set bridge br-provider protocols=OpenFlow10,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
>>>     /usr/bin/ovs-vsctl add-port br-provider {{ provider_nic }}
>>>     /usr/bin/ovs-vsctl br-set-external-id br-provider bridge-id br-provider
>>>     /usr/bin/ovs-vsctl set open . external-ids:ovn-bridge-mappings=provider:br-provider
>>>
>>>     plugin.ini:
>>>     [ml2]
>>>     mechanism_drivers = ovn
>>>     type_drivers = flat,geneve
>>>     tenant_network_types = geneve
>>>     extension_drivers = port_security
>>>     overlay_ip_version = 4
>>>     [ml2_type_flat]
>>>     flat_networks = provider
>>>     [ml2_type_geneve]
>>>     vni_ranges = 1:65536
>>>     max_header_size = 38
>>>     [securitygroup]
>>>     enable_security_group = True
>>>     firewall_driver =
>>>     neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
>>>     [ovn]
>>>     ovn_nb_connection = tcp:{{controllerip}}:6641
>>>     ovn_sb_connection = tcp:{{controllerip}}:6642
>>>     ovn_l3_scheduler = leastloaded
>>>     ovn_metadata_enabled = True
>>>     enable_distributed_floating_ip = true
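
Given that setup, a sanity check that OVN actually sees both chassis as 
gateway candidates, and that DVR is populating the FIP NAT entries, is 
something like the sketch below (the column holding ovn-cms-options 
varies slightly between OVN versions, other_config vs external_ids):

```shell
# Each compute node should report enable-chassis-as-gw and the
# provider:br-provider bridge mapping here.
ovn-sbctl --columns=name,hostname,other_config list chassis

# With enable_distributed_floating_ip=true, each FIP's dnat_and_snat
# entry should carry external_mac/logical_port so the FIP is handled
# on the VM's own chassis.
ovn-nbctl --columns=external_ip,external_mac,logical_port find nat type=dnat_and_snat
```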
>>>
>>>
>>>
>>>
>>>     -- 
>>>     Gary Molenkamp			Science Technology Services
>>>     Systems Administrator		University of Western Ontario
>>>     molenkam at uwo.ca                  http://sts.sci.uwo.ca
>>>     (519) 661-2111 x86882		(519) 661-3566
>>
>
>
>
> Thanks and Regards
> Yatin Karel

-- 
Gary Molenkamp			Science Technology Services
Systems Engineer		University of Western Ontario
molenkam at uwo.ca                  http://sts.sci.uwo.ca
(519) 661-2111 x86882		(519) 661-3566