Neutron + OVN raft cluster
Tiago Pires
tiagohp at gmail.com
Fri May 6 19:50:10 UTC 2022
Hi all,
I was checking the mailing list history, and this thread about Raft OVSDB
clustering caught my attention:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046438.html
In my setup (OVN 20.03 and OpenStack Ussuri), the ovn-controller is
configured with
ovn-remote="tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642"
which lists the 3 OVN central members running in cluster mode.
Also on the neutron ML2 side:
[ovn]
ovn_native_dhcp = True
ovn_nb_connection = tcp:10.2X.4X.4:6641,tcp:10.2X.4X.68:6641,tcp:10.2X.4X.132:6641
ovn_sb_connection = tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642
We are experiencing an issue with Neutron when the OVN leader decides to
take a snapshot and, by design, another member becomes leader (more or less
every 8 minutes):
2022-05-05T16:57:42.135Z|17401|raft|INFO|Transferring leadership to write a snapshot.
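To check the "every 8 minutes" estimate against the logs, something like the following Python sketch could measure the gaps between these messages. It assumes the standard pipe-separated log format shown above. The function name and the second sample line are hypothetical and only there for illustration. Because each transfer is logged by whichever member was leader at the time, the logs of all three members would need to be merged first.

```python
from datetime import datetime

MARKER = "Transferring leadership to write a snapshot"

def snapshot_transfer_intervals(log_lines):
    """Return the seconds between consecutive snapshot-driven transfers."""
    stamps = []
    for line in log_lines:
        if MARKER not in line:
            continue
        # The timestamp is the first "|"-separated field,
        # e.g. 2022-05-05T16:57:42.135Z
        raw = line.split("|", 1)[0].strip()
        stamps.append(datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ"))
    stamps.sort()
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# The first line is from the log above; the second is a made-up example.
sample = [
    "2022-05-05T16:57:42.135Z|17401|raft|INFO|Transferring leadership to write a snapshot.",
    "2022-05-05T17:05:42.135Z|17455|raft|INFO|Transferring leadership to write a snapshot.",
]
intervals = snapshot_transfer_intervals(sample)
```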
$ ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
4a03
Name: OVN_Southbound
Cluster ID: ca74 (ca744caf-40cd-4751-a2f2-86e35ad6541c)
Server ID: 4a03 (4a0328dc-e9a4-495e-a4f1-0a0340fc6d19)
Address: tcp:10.2X.4X.132:6644
Status: cluster member
Role: leader
Term: 1912
Leader: self
Vote: self
Election timer: 10000
Log: [497643, 498261]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->3d6c ->4ef0 <-3d6c <-4ef0
Servers:
    4a03 (4a03 at tcp:10.2X.4X.132:6644) (self) next_index=497874 match_index=498260
    3d6c (3d6c at tcp:10.2X.4X.68:6644) next_index=498261 match_index=498260
    4ef0 (4ef0 at tcp:10.2X.4X.4:6644) next_index=498261 match_index=498260
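For anyone who wants to track which member is leader over time, here is a minimal Python sketch that parses the text printed by cluster/status. It assumes the "Key: value" layout shown above. The function names are my own and are not part of any OVN tooling.

```python
import re

def parse_cluster_status(text):
    """Parse cluster/status output into a dict of its top-level fields."""
    fields = {}
    for line in text.splitlines():
        # Top-level fields look like "Key: value". The leading server ID
        # and the per-server rows do not start with a capitalized key,
        # so they are skipped.
        match = re.match(r"^([A-Z][A-Za-z ]+):\s*(.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields

def is_leader(text):
    """Return True when the status text reports this member as leader."""
    return parse_cluster_status(text).get("Role") == "leader"

# Shortened sample taken from the output above.
SAMPLE = """4a03
Name: OVN_Southbound
Status: cluster member
Role: leader
Term: 1912
Leader: self
Election timer: 10000
Entries not yet committed: 0
Servers:
    4a03 (4a03 at tcp:10.2X.4X.132:6644) (self) next_index=497874 match_index=498260
"""
status = parse_cluster_status(SAMPLE)
```

Running this against the output of each of the three members in turn would show which one currently holds the leader role and at which term.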
As I understand it, the TCP connections from Neutron (NB) and the
ovn-controllers (SB) to OVN central are established only with the leader:
#OVN central leader
$ netstat -nap | grep 6642 | more
tcp 0 0 0.0.0.0:6642 0.0.0.0:* LISTEN
tcp 0 0 10.2X.4X.132:6642 10.24.40.17:47278 ESTABLISHED
tcp 0 0 10.2X.4X.132:6642 10.24.40.76:36240 ESTABLISHED
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.17:47280 ESTABLISHED
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.6:43102 ESTABLISHED
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.75:58890 ESTABLISHED
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.6:43108 ESTABLISHED
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.17:47142 ESTABLISHED
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.71:48808 ESTABLISHED
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.17:47096 ESTABLISHED
#OVN follower 2
$ netstat -nap | grep 6642
tcp 0 0 0.0.0.0:6642 0.0.0.0:* LISTEN
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.76:57256 ESTABLISHED
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.134:54026 ESTABLISHED
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.10:34962 ESTABLISHED
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.6:49238 ESTABLISHED
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.135:59972 ESTABLISHED
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.75:40162 ESTABLISHED
tcp 0 0 10.2X.4X.4:39566 10.2X.4X.132:6642 ESTABLISHED
#OVN follower 3
$ netstat -nap | grep 6642
tcp 0 0 0.0.0.0:6642 0.0.0.0:* LISTEN
tcp 0 0 10.2X.4X.68:6642 10.2X.4X.70:40750 ESTABLISHED
tcp 0 0 10.2X.4X.68:6642 10.2X.4X.11:49718 ESTABLISHED
tcp 0 0 10.2X.4X.68:45632 10.2X.4X.132:6642 ESTABLISHED
tcp 0 0 10.2X.4X.68:6642 10.2X.4X.16:44816 ESTABLISHED
tcp 0 0 10.2X.4X.68:6642 10.2X.4X.7:45216 ESTABLISHED
The issue we are experiencing is on the neutron-server side: it disconnects
whenever the OVN leader changes (because of the snapshot, roughly every 8
minutes) and reconnects to the next leader. This breaks the OpenStack API
if someone is trying to create a VM at the same time.
First, is my current configuration correct? Should a leader change break
the Neutron side, or is there some configuration missing?
I was also wondering whether it is possible to use a load balancer with a
VIP that balances the connections across the OVN central members. I would
then configure only the VIP on the Neutron side and on the ovn-controllers.
Does that make sense?
Thank you.
Regards,
Tiago Pires
