Hi all,

Thanks Terry. As I'm using an old OVS (2.13)/OVN (20.03) version due to the OpenStack release (Ussuri), would it be possible to upgrade only OVN/OVS without upgrading the whole OpenStack? I'm just checking options here; how are you dealing with this in production?

Regards,
Tiago Pires

On Mon, May 9, 2022 at 10:45 AM Terry Wilson <twilson@redhat.com> wrote:
Sorry, I was on PTO. Jakub is right: without python-ovs 2.17, when ovsdb-server breaks the connection for the leadership transfer, clients will re-download the entire contents of their registered tables. python-ovs 2.17 adds monitor-cond-since/update3 support, so clients should only download the changes made since they disconnected. As long as the client code handles reconnections, the reconnects themselves should not be an issue. It is still possible that there is code that doesn't properly handle reconnections in general, but I'd start by trying ovs 2.17. The disconnections will always happen, but they shouldn't break things.
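For example, a quick way to check which python-ovs version the neutron-server host is actually using (package names and Python environments vary by distro, so treat this as a sketch):

# run in whatever Python environment neutron-server uses
$ python3 -c "import ovs.version; print(ovs.version.VERSION)"
# or, if python-ovs came from pip:
$ pip3 show ovs | grep -i version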
On Fri, May 6, 2022 at 5:13 PM Tiago Pires <tiagohp@gmail.com> wrote:
Hi Mohammed,
It seems a little bit like our issue.
Thank you.
Tiago Pires
On Fri, May 6, 2022 at 6:21 PM Mohammed Naser <mnaser@vexxhost.com> wrote:
Hi Tiago,
Have you seen this?
https://bugs.launchpad.net/nova/+bug/1969592
Mohammed
On Fri, May 6, 2022 at 3:56 PM Tiago Pires <tiagohp@gmail.com> wrote:
Hi all,
I was checking the mailing list history, and this thread about RAFT OVSDB clustering caught my attention:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046438.html
In my setup (OVN 20.03 and OpenStack Ussuri), the ovn-controllers are configured with ovn-remote="tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642", pointing at the 3 OVN central members running in cluster mode. On the Neutron ML2 side:

[ovn]
ovn_native_dhcp = True
ovn_nb_connection = tcp:10.2X.4X.4:6641,tcp:10.2X.4X.68:6641,tcp:10.2X.4X.132:6641
ovn_sb_connection = tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642
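For reference, each chassis reports the same remotes when reading back the value configured above (the output here just mirrors that value):

$ ovs-vsctl get Open_vSwitch . external_ids:ovn-remote
"tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642"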
We are experiencing an issue with Neutron when the OVN leader decides to take a snapshot and, by design, another member becomes leader (roughly every 8 minutes):

2022-05-05T16:57:42.135Z|17401|raft|INFO|Transferring leadership to write a snapshot.
$ ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
4a03
Name: OVN_Southbound
Cluster ID: ca74 (ca744caf-40cd-4751-a2f2-86e35ad6541c)
Server ID: 4a03 (4a0328dc-e9a4-495e-a4f1-0a0340fc6d19)
Address: tcp:10.2X.4X.132:6644
Status: cluster member
Role: leader
Term: 1912
Leader: self
Vote: self

Election timer: 10000
Log: [497643, 498261]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->3d6c ->4ef0 <-3d6c <-4ef0
Servers:
    4a03 (4a03 at tcp:10.2X.4X.132:6644) (self) next_index=497874 match_index=498260
    3d6c (3d6c at tcp:10.2X.4X.68:6644) next_index=498261 match_index=498260
    4ef0 (4ef0 at tcp:10.2X.4X.4:6644) next_index=498261 match_index=498260
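The same leadership transfers also show up in the SB ovsdb-server log on each member; the log path below is a guess and may differ per packaging:

$ grep "Transferring leadership" /var/log/ovn/ovsdb-server-sb.log | tail -5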
As I understand it, the TCP connections from Neutron (NB) and from the ovn-controllers (SB) to the OVN central cluster are established only with the leader:
# OVN central leader
$ netstat -nap | grep 6642 | more
tcp   0   0 0.0.0.0:6642        0.0.0.0:*           LISTEN       -
tcp   0   0 10.2X.4X.132:6642   10.24.40.17:47278   ESTABLISHED  -
tcp   0   0 10.2X.4X.132:6642   10.24.40.76:36240   ESTABLISHED  -
tcp   0   0 10.2X.4X.132:6642   10.2X.4X.17:47280   ESTABLISHED  -
tcp   0   0 10.2X.4X.132:6642   10.2X.4X.6:43102    ESTABLISHED  -
tcp   0   0 10.2X.4X.132:6642   10.2X.4X.75:58890   ESTABLISHED  -
tcp   0   0 10.2X.4X.132:6642   10.2X.4X.6:43108    ESTABLISHED  -
tcp   0   0 10.2X.4X.132:6642   10.2X.4X.17:47142   ESTABLISHED  -
tcp   0   0 10.2X.4X.132:6642   10.2X.4X.71:48808   ESTABLISHED  -
tcp   0   0 10.2X.4X.132:6642   10.2X.4X.17:47096   ESTABLISHED  -

# OVN follower 2
$ netstat -nap | grep 6642
tcp   0   0 0.0.0.0:6642        0.0.0.0:*           LISTEN       -
tcp   0   0 10.2X.4X.4:6642     10.2X.4X.76:57256   ESTABLISHED  -
tcp   0   0 10.2X.4X.4:6642     10.2X.4X.134:54026  ESTABLISHED  -
tcp   0   0 10.2X.4X.4:6642     10.2X.4X.10:34962   ESTABLISHED  -
tcp   0   0 10.2X.4X.4:6642     10.2X.4X.6:49238    ESTABLISHED  -
tcp   0   0 10.2X.4X.4:6642     10.2X.4X.135:59972  ESTABLISHED  -
tcp   0   0 10.2X.4X.4:6642     10.2X.4X.75:40162   ESTABLISHED  -
tcp   0   0 10.2X.4X.4:39566    10.2X.4X.132:6642   ESTABLISHED  -

# OVN follower 3
$ netstat -nap | grep 6642
tcp   0   0 0.0.0.0:6642        0.0.0.0:*           LISTEN       -
tcp   0   0 10.2X.4X.68:6642    10.2X.4X.70:40750   ESTABLISHED  -
tcp   0   0 10.2X.4X.68:6642    10.2X.4X.11:49718   ESTABLISHED  -
tcp   0   0 10.2X.4X.68:45632   10.2X.4X.132:6642   ESTABLISHED  -
tcp   0   0 10.2X.4X.68:6642    10.2X.4X.16:44816   ESTABLISHED  -
tcp   0   0 10.2X.4X.68:6642    10.2X.4X.7:45216    ESTABLISHED
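For completeness, the same thing can be checked from the neutron-server host (using ss here instead of netstat; either works); its 6641/6642 sessions should all point at whichever member is currently the leader:

$ ss -tnp | grep -E ':664[12]'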
The issue we are experiencing is that the neutron-server disconnects whenever there is an OVN leader change (due to the snapshot, roughly every 8 minutes) and reconnects to the new leader. This breaks the OpenStack API when someone is trying to create a VM at the same time. First, is my current configuration correct? Should the leader change break the Neutron side, or is there some missing configuration? I was also wondering whether it is possible to put a load balancer with a VIP in front of the OVN central members, have the VIP balance the connections across them, and then reconfigure the Neutron side and the ovn-controllers to point only at the VIP. Does that make sense?
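Just to illustrate what I mean (the VIP address below, 10.2X.4X.200, is made up), the Neutron side would then be reduced to something like:

[ovn]
# hypothetical VIP fronting the three OVN central members
ovn_nb_connection = tcp:10.2X.4X.200:6641
ovn_sb_connection = tcp:10.2X.4X.200:6642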
Thank you.
Regards,
Tiago Pires
--
Mohammed Naser
VEXXHOST, Inc.