That bug was mostly targeting VLAN provider networks, but you mentioned you are using geneve and flat networks, so it might not be related.
Multiple components are involved, so it would be difficult to narrow this down here without more details; functionality-wise it should just work (I checked in my Train OVN environment and it worked fine). So I think it would be best to start with a bug report at
https://bugs.launchpad.net/neutron/
with details (after reverting the environment to its previous state: bridges and ovn-cms-options configured, and DVR enabled). Good details to include:-
- Environment details:-
- Number of controller and compute nodes
- Whether the nodes are virtual or physical
- Deployment tool used, operating system
- Neutron version
- OVN/OVS versions
- Share ovn-controller logs from the compute and controller nodes
- Share the OVN Northbound and Southbound DB files from the controller node, and ovs conf.db from the compute nodes
- Output of resources involved:-
- openstack network agent list
- openstack server list --long
- openstack port list --router <router id>
- Reproduction steps along with the output of the operations (for both the good and bad VMs)
- Output of the below commands from the controller and compute nodes:-
- iptables -L
- netstat -i
- ip addr show
- ovs-vsctl show
- ovs-vsctl list open .
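The command list above can be gathered in one pass per node with a small script (a sketch; the output path is illustrative, and tools that are not installed are skipped rather than failing):

```shell
# Collect the diagnostics listed above into one file per node (run as root).
out="/tmp/ovn-diag-$(hostname).txt"
: > "$out"
for cmd in "iptables -L" "netstat -i" "ip addr show" \
           "ovs-vsctl show" "ovs-vsctl list open ."; do
    echo "===== $cmd =====" >> "$out"
    # Skip tools that are not installed instead of aborting the collection.
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
        $cmd >> "$out" 2>&1 || true
    else
        echo "(${cmd%% *} not installed)" >> "$out"
    fi
done
echo "wrote $out"
```

Attaching one such file per controller and compute node keeps the outputs clearly separated per host.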
If I remove the provider bridge from the second hypervisor:
ovs-vsctl remove open . external-ids ovn-cms-options="enable-chassis-as-gw"
ovs-vsctl remove open . external-ids ovn-bridge-mappings
ip link set br-provider down
ovs-vsctl del-br br-provider
and disable enable_distributed_floating_ip, then both VMs using SNAT on each compute server work.
This looks interesting. It would be good to also check the behavior when no VM has a FIP attached.
Turning the second chassis back on as a gateway immediately breaks the VM on the second compute server:
ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw
ovs-vsctl add-br br-provider
ovs-vsctl set open . external-ids:ovn-bridge-mappings=provider:br-provider
ovs-vsctl add-port br-provider ens256
systemctl restart ovn-controller openvswitch.service
Here it would be interesting to check where exactly the traffic drops, using tcpdump.
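As a sketch of that tcpdump step (the interface names genev_sys_6081 and ens256 are assumptions from a typical Geneve/provider setup; substitute your own, and run as root):

```shell
# On the compute node hosting the broken VM: does its traffic leave
# via the Geneve overlay toward the gateway chassis?
# genev_sys_6081 is the usual kernel Geneve tunnel interface; -c limits capture.
tcpdump -nnei genev_sys_6081 -c 20 icmp

# On the gateway chassis: does the packet arrive, and does the SNATed
# packet go out of the provider NIC (ens256 in the commands above)?
tcpdump -nnei ens256 -c 20 icmp
```

Pinging an external address from the broken VM while both captures run shows at which hop the flow disappears.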
I am running neutron 22.0.1, but maybe it is something related?
python3-neutron-22.0.1-1.el9s.noarch
openstack-neutron-common-22.0.1-1.el9s.noarch
openstack-neutron-22.0.1-1.el9s.noarch
openstack-neutron-ml2-22.0.1-1.el9s.noarch
openstack-neutron-openvswitch-22.0.1-1.el9s.noarch
openstack-neutron-ovn-metadata-agent-22.0.1-1.el9s.noarch
On 2023-07-12 10:21, Gary Molenkamp wrote:
For comparison, I looked at how openstack-ansible sets up OVN, and I don't see any major differences other than O-A configuring a manager for ovs:
ovs-vsctl --id @manager create Manager "target=\
....
I don't believe this is the point of failure (but feel free to correct me if I'm wrong ;) ).
ovn-trace on both VMs' inports shows the same trace for the working VM and the non-working VM, i.e.:
ovn-trace --db=$SB --ovs default_net 'inport == "f4cbc8c7-e7bf-47f3-9fea-a1663f6eb34d" && eth.src == fa:16:3e:a6:62:8e && ip4.src == 172.31.101.168 && ip4.dst == <provider's gateway IP>'
On 2023-07-07 14:08, Gary Molenkamp wrote:
Happy Friday afternoon.
I'm still pondering a lack of connectivity in an HA OVN setup with each compute node acting as a potential gateway chassis.
The problem is basically that the port of the OVN LRP may not be on the same chassis as the VM that failed (since the CR-LRP will be where the first VM of that network is created). The suggestion is to remove the enable-chassis-as-gw option from the compute nodes to allow the VM to forward traffic via tunneling/Geneve to the chassis where the LRP resides.
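One way to see this scheduling directly, as a sketch (run on the controller; the LRP name is a placeholder to fill in from your router):

```shell
# Which chassis are configured, with priorities, to host the router's
# gateway port? Replace <lrp-name> with the router port's LRP name.
ovn-nbctl lrp-get-gateway-chassis <lrp-name>

# Where is the CR-LRP (chassisredirect port) actually bound right now?
ovn-sbctl find port_binding type=chassisredirect
```

Comparing the bound chassis against the chassis hosting the broken VM confirms whether the failure tracks the CR-LRP placement.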
I forced a similar VM onto the same chassis as the working VM, and it was able to communicate out. If we do want to keep multiple chassis as gateways, would that be addressed with the ovn-bridge-mappings?
I built a small test cloud to explore this further, as I continue to see the same issue: a VM can only use SNAT outbound if it is on the same chassis as the CR-LRP.
In my test cloud, I have one controller and two compute nodes. The controller only runs the OVN north and south daemons in addition to the neutron server. Each of the two compute nodes is configured as below. On a tenant network I have three VMs:
- #1: cirros VM with FIP
- #2: cirros VM running on compute node 1
- #3: cirros VM running on compute node 2
E/W traffic between VMs on the same tenant network is fine. N/S traffic is fine for the FIP. N/S traffic only works for the VM whose CR-LRP is active on the same chassis. Does anything jump out as a mistake in my understanding as to how this should be working?
Thanks as always,
Gary
on each hypervisor:
/usr/bin/ovs-vsctl set open . external-ids:ovn-remote=tcp:{{ controllerip }}:6642
/usr/bin/ovs-vsctl set open . external-ids:ovn-encap-type=geneve
/usr/bin/ovs-vsctl set open . external-ids:ovn-encap-ip={{ overlaynetip }}
/usr/bin/ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw
/usr/bin/ovs-vsctl add-br br-provider -- set bridge br-provider protocols=OpenFlow10,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
/usr/bin/ovs-vsctl add-port br-provider {{ provider_nic }}
/usr/bin/ovs-vsctl br-set-external-id br-provider bridge-id br-provider
/usr/bin/ovs-vsctl set open . external-ids:ovn-bridge-mappings=provider:br-provider
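A quick sanity check after running the above, as a sketch using the same ovs-vsctl syntax as the commands in this thread:

```shell
# Confirm the chassis registered the expected options; the output should
# include ovn-cms-options=enable-chassis-as-gw and the
# provider:br-provider bridge mapping.
ovs-vsctl get open . external-ids

# Confirm both bridges exist (expect br-int and br-provider).
ovs-vsctl list-br
```

If the mapping or CMS option is missing here, ovn-controller never advertises the chassis as a gateway, regardless of the neutron configuration.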
plugin.ini:
[ml2]
mechanism_drivers = ovn
type_drivers = flat,geneve
tenant_network_types = geneve
extension_drivers = port_security
overlay_ip_version = 4
[ml2_type_flat]
flat_networks = provider
[ml2_type_geneve]
vni_ranges = 1:65536
max_header_size = 38
[securitygroup]
enable_security_group = True
firewall_driver =
neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
[ovn]
ovn_nb_connection = tcp:{{controllerip}}:6641
ovn_sb_connection = tcp:{{controllerip}}:6642
ovn_l3_scheduler = leastloaded
ovn_metadata_enabled = True
enable_distributed_floating_ip = true
--
Gary Molenkamp Science Technology Services
Systems Administrator University of Western Ontario
molenkam@uwo.ca http://sts.sci.uwo.ca
(519) 661-2111 x86882 (519) 661-3566
Thanks and Regards
Yatin Karel