<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
Thanks Yatin,   I will put together a bug report.<br>
    <br>
I have found that if I disable enable_distributed_floating_ip but
leave the entire OVN/OVS setup as below for redundancy, then traffic
flows as expected.<br>
As soon as I set enable_distributed_floating_ip to true, E/W traffic
keeps working, but N/S traffic stops for the VMs that are not on the
host with the CR-LRP.<br>
    <br>
I can't say for sure why, as ovn-trace/flow debugging is still new
to me, but the northbound and southbound DBs look correct.<br>
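<br>
For reference, the sanity checks I've been running against the DBs so
far (the lrp- name below is just neutron's naming convention for the
router's gateway port; substitute your own port UUID):<br>
<pre>
# which chassis each port, including the CR-LRP, is bound to
ovn-sbctl show

# gateway-chassis priority list for the router's external port
ovn-nbctl lrp-get-gateway-chassis lrp-<router-gateway-port-uuid>
</pre>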
      <br>
    Gary<br>
<br>
    <div class="moz-cite-prefix">On 2023-07-13 11:43, Yatin Karel wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:CAPU8nTFOp=a964iqHoTGfNAkm_qnMfqZ11VjobZg+q2gnH_7FA@mail.gmail.com">
      
      <div dir="ltr">
        <div dir="ltr">Hi Gary,<br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Wed, Jul 12, 2023 at
            9:22 PM Gary Molenkamp <<a href="mailto:molenkam@uwo.ca" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">molenkam@uwo.ca</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div> A little progress, but I may be tripping over bug <a href="https://bugs.launchpad.net/neutron/+bug/2003455" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://bugs.launchpad.net/neutron/+bug/2003455</a><br>
              <br>
            </div>
          </blockquote>
<div>That bug was mostly targeting vlan provider networks, but
            you mentioned you are using geneve and flat networks, so
            this might not be related.</div>
          <div><br>
          </div>
<div>Multiple components are involved, so it would be difficult
            to narrow this down here without more details;
            functionality-wise it should have just worked (I checked in
            my Train OVN environment and it worked fine). So I think it
            would be best to start with a bug report at <a href="https://bugs.launchpad.net/neutron/" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://bugs.launchpad.net/neutron/</a>
            with details (after reverting the env to its previous
            state: bridges and ovn-cms options configured, and DVR
            enabled). Good to include details like the following (a
            rough collection snippet follows the list):</div>
          <div><br>
          </div>
<div>- Environment details:<br>
              - Number of controller and compute nodes</div>
          <div>  - Nodes are virtual or physical<br>
          </div>
          <div>
              - Deployment tool used, Operating System<br>
              - Neutron version</div>
          <div>  - OVN/OVS versions<br>
          </div>
<div>
            - Share ovn-controller logs from the compute and controller
            nodes</div>
          <div>
            - Share the OVN Northbound and Southbound DB files from the
            controller node and the ovs conf.db from the compute nodes<br>
            - Output of the resources involved:</div>
          <div>  - openstack network agent list</div>
          <div>  - openstack server list --long</div>
          <div>  - openstack port list --router <router id><br>
          </div>
<div>  - Reproduction steps along with output from the
            operations (for both the good and the bad VMs)<br>
          </div>
<div>- Output of the below commands from the controller and
            compute nodes:</div>
          <div>  - iptables -L</div>
          <div>  - netstat -i</div>
          <div>  - ip addr show<br>
          </div>
          <div>  - ovs-vsctl show</div>
          <div>  - ovs-vsctl list open .</div>
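<div><br>
          </div>
          <div>A rough snippet for collecting the node-level outputs in
            one go (the filename is just an example):</div>
<pre>
# run on each controller/compute node
out=/tmp/ovn-debug-$(hostname).txt
{ iptables -L; netstat -i; ip addr show; ovs-vsctl show; ovs-vsctl list open .; } > "$out" 2>&amp;1
</pre>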
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div> If I remove the provider bridge from the second
              hypervisor:<br>
                  ovs-vsctl remove open . external-ids
              ovn-cms-options="enable-chassis-as-gw"<br>
                  ovs-vsctl remove open . external-ids
              ovn-bridge-mappings<br>
                  ip link set br-provider down<br>
                  ovs-vsctl del-br br-provider<br>
              and disable<br>
                  enable_distributed_floating_ip<br>
              <br>
              Then both VMs using SNAT on each compute server work.<br>
              <br>
            </div>
          </blockquote>
<div>This looks interesting. It would be good to also check the
            behavior when no VM has a FIP attached.<br>
          </div>
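<div>For example (the server name and address here are
            hypothetical), a FIP can be detached temporarily with:<br>
                openstack server remove floating ip test-vm 203.0.113.10<br>
          </div>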
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div> Turning the second chassis back on as a gateway
              immediately breaks the VM on the second compute server:<br>
              <br>
                  ovs-vsctl set open .
              external-ids:ovn-cms-options=enable-chassis-as-gw<br>
                  ovs-vsctl add-br br-provider<br>
                  ovs-vsctl set open .
              external-ids:ovn-bridge-mappings=provider:br-provider  <br>
                  ovs-vsctl add-port br-provider ens256<br>
                  systemctl restart ovn-controller openvswitch.service <br>
              <br>
            </div>
          </blockquote>
<div>Here it would be interesting to check, using tcpdump, where
            exactly the traffic drops.<br>
          </div>
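<div>For example, something along these lines (ens256 is the
            provider NIC from this thread; genev_sys_6081 is the usual
            kernel geneve device, adjust if yours differs):<br>
                # on the compute node hosting the broken VM: does the
            traffic enter the tunnel?<br>
                tcpdump -nei genev_sys_6081 icmp<br>
                # on the gateway chassis: does it reach the provider
            NIC?<br>
                tcpdump -nei ens256 icmp<br>
          </div>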
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
<div> I am running neutron 22.0.1; maybe it's something
              related?<br>
                  <br>
              python3-neutron-22.0.1-1.el9s.noarch<br>
              openstack-neutron-common-22.0.1-1.el9s.noarch<br>
              openstack-neutron-22.0.1-1.el9s.noarch<br>
              openstack-neutron-ml2-22.0.1-1.el9s.noarch<br>
              openstack-neutron-openvswitch-22.0.1-1.el9s.noarch<br>
              openstack-neutron-ovn-metadata-agent-22.0.1-1.el9s.noarch<br>
<br>
              <div>On 2023-07-12 10:21, Gary Molenkamp wrote:<br>
              </div>
<blockquote type="cite"> For comparison, I looked at how
                openstack-ansible sets up OVN, and I don't see any
                major differences other than O-A configuring a manager
                for ovs:<br>
                      ovs-vsctl --id @manager create Manager "target=\
                ....<br>
                I don't believe this is the point of failure (but feel
                free to correct me if I'm wrong ;) ).<br>
                <br>
ovn-trace on both VMs' inports shows the same trace for
                the working VM and the non-working VM, i.e.:<br>
                <br>
                ovn-trace --db=$SB --ovs default_net 'inport ==
                "f4cbc8c7-e7bf-47f3-9fea-a1663f6eb34d" &&
                eth.src==fa:16:3e:a6:62:8e && ip4.src ==
                172.31.101.168 && ip4.dst == <provider's
                gateway IP>'<br>
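<br>
                (Side note: for a more condensed view, ovn-trace also
                accepts --summary or --minimal in place of the default
                detailed output.)<br>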
<br>
                <div>On 2023-07-07 14:08, Gary Molenkamp wrote:<br>
                </div>
                <blockquote type="cite"> Happy Friday afternoon.<br>
                  <br>
                  I'm still pondering a lack of connectivity in an HA
                  OVN with each compute node acting as a potential
                  gateway chassis.<br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_quote">
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px
                          0.8ex;border-left:1px solid
                          rgb(204,204,204);padding-left:1ex">
                          <div>
                            <blockquote type="cite">
                              <div dir="ltr">
                                <div class="gmail_quote">
                                  <div>The problem is basically that the
                                    port of the OVN LRP may not be in
                                    the same chassis as the VM that
                                    failed (since the CR-LRP will be
                                    where the first VM of that network
                                    will be created). The suggestion is
                                    to remove the enable-chassis-as-gw
                                    from the compute nodes to allow the
                                    VM to forward traffic via
                                    tunneling/Geneve to the chassis
                                    where the LRP resides.<br>
                                  </div>
                                  <div><br>
                                  </div>
                                </div>
                              </div>
                            </blockquote>
                            <br>
I forced a similar VM onto the same chassis
                            as the working VM, and it was able to
                            communicate out.    If we do want to keep
                            multiple chassis as gateways, would that be
                            addressed with the ovn-bridge-mappings?<br>
                          </div>
                        </blockquote>
                        <div><br>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
I built a small test cloud to explore this further, as
                  I continue to see the same issue:  a VM will only be
                  able to use SNAT outbound if it is on the same chassis
                  as the CR-LRP.<br>
                  <br>
                  In my test cloud, I have one controller and two
                  compute nodes.  The controller runs only the OVN
                  northbound and southbound DBs (with ovn-northd) in
                  addition to the neutron server.  Each of the two
                  compute nodes is configured as below.  On a tenant
                  network I have three VMs:<br>
                      - #1:  cirros VM with FIP<br>
                      - #2:  cirros VM running on compute node 1<br>
                      - #3:  cirros VM running on compute node 2<br>
                  <br>
E/W traffic between VMs in the same tenant network is
                  fine.  N/S traffic is fine for the FIP.  N/S traffic
                  only works for the VM on the same chassis where the
                  CR-LRP is active.   Does anything jump out as a
                  mistake in my understanding as to how this should be
                  working?<br>
                  <br>
                  Thanks as always,<br>
                  Gary<br>
                  <br>
                  <br>
                  on each hypervisor:<br>
                  <br>
                  /usr/bin/ovs-vsctl set open .
                  external-ids:ovn-remote=tcp:{{ controllerip }}:6642<br>
                  /usr/bin/ovs-vsctl set open .
                  external-ids:ovn-encap-type=geneve<br>
                  /usr/bin/ovs-vsctl set open .
                  external-ids:ovn-encap-ip={{ overlaynetip }}<br>
                  /usr/bin/ovs-vsctl set open .
                  external-ids:ovn-cms-options=enable-chassis-as-gw<br>
                  /usr/bin/ovs-vsctl add-br br-provider -- set bridge
                  br-provider
                  protocols=OpenFlow10,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15<br>
                  /usr/bin/ovs-vsctl add-port br-provider {{
                  provider_nic }}<br>
/usr/bin/ovs-vsctl br-set-external-id br-provider
                  bridge-id br-provider<br>
                  /usr/bin/ovs-vsctl set open .
                  external-ids:ovn-bridge-mappings=provider:br-provider<br>
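<br>
                  and to sanity-check that these settings took effect
                  on each chassis (just the corresponding get
                  commands):<br>
                  <br>
                  /usr/bin/ovs-vsctl get open . external-ids:ovn-cms-options<br>
                  /usr/bin/ovs-vsctl get open . external-ids:ovn-bridge-mappings<br>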
                  <br>
                  plugin.ini:<br>
                  [ml2]<br>
                  mechanism_drivers = ovn<br>
                  type_drivers = flat,geneve<br>
                  tenant_network_types = geneve<br>
                  extension_drivers = port_security<br>
                  overlay_ip_version = 4<br>
                  [ml2_type_flat]<br>
                  flat_networks = provider<br>
                  [ml2_type_geneve]<br>
                  vni_ranges = 1:65536<br>
                  max_header_size = 38<br>
                  [securitygroup]<br>
                  enable_security_group = True<br>
                  firewall_driver =
                  neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver<br>
                  [ovn]<br>
                  ovn_nb_connection = tcp:{{controllerip}}:6641<br>
                  ovn_sb_connection = tcp:{{controllerip}}:6642<br>
                  ovn_l3_scheduler = leastloaded<br>
                  ovn_metadata_enabled = True<br>
                  enable_distributed_floating_ip = true<br>
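<br>
                  When toggling enable_distributed_floating_ip between
                  tests, I change it in the [ovn] section above and
                  restart neutron-server (crudini is just one way to do
                  this, and the path to this plugin.ini is assumed):<br>
                      crudini --set /etc/neutron/plugin.ini ovn enable_distributed_floating_ip false<br>
                      systemctl restart neutron-server<br>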
<br>
                </blockquote>
                <br>
              </blockquote>
              <br>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Thanks and Regards</div>
          <div>Yatin Karel <br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Gary Molenkamp                  Science Technology Services
Systems Engineer                University of Western Ontario
<a class="moz-txt-link-abbreviated" href="mailto:molenkam@uwo.ca">molenkam@uwo.ca</a>                 <a class="moz-txt-link-freetext" href="http://sts.sci.uwo.ca">http://sts.sci.uwo.ca</a>
(519) 661-2111 x86882           (519) 661-3566</pre>
  </body>
</html>