Neutron + OVN raft cluster
Hi all,

While checking the mailing list history, this thread about raft ovsdb clustering caught my attention: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046438.html

In my setup (OVN 20.03 and OpenStack Ussuri), the ovn-controllers are configured with all three OVN central members, which run in cluster mode:

ovn-remote="tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642"

On the Neutron ML2 side:

[ovn]
ovn_native_dhcp = True
ovn_nb_connection = tcp:10.2X.4X.4:6641,tcp:10.2X.4X.68:6641,tcp:10.2X.4X.132:6641
ovn_sb_connection = tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642

We are experiencing an issue with Neutron whenever the OVN leader decides to take a snapshot and, by design, another member becomes leader (roughly every 8 minutes):

2022-05-05T16:57:42.135Z|17401|raft|INFO|Transferring leadership to write a snapshot.

ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
4a03
Name: OVN_Southbound
Cluster ID: ca74 (ca744caf-40cd-4751-a2f2-86e35ad6541c)
Server ID: 4a03 (4a0328dc-e9a4-495e-a4f1-0a0340fc6d19)
Address: tcp:10.2X.4X.132:6644
Status: cluster member
Role: leader
Term: 1912
Leader: self
Vote: self
Election timer: 10000
Log: [497643, 498261]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->3d6c ->4ef0 <-3d6c <-4ef0
Servers:
    4a03 (4a03 at tcp:10.2X.4X.132:6644) (self) next_index=497874 match_index=498260
    3d6c (3d6c at tcp:10.2X.4X.68:6644) next_index=498261 match_index=498260
    4ef0 (4ef0 at tcp:10.2X.4X.4:6644) next_index=498261 match_index=498260

As I understand it, the TCP connections from Neutron (NB) and the ovn-controllers (SB) to OVN central are established only with the leader:

# OVN central leader
$ netstat -nap | grep 6642 | more
tcp 0 0 0.0.0.0:6642 0.0.0.0:* LISTEN -
tcp 0 0 10.2X.4X.132:6642 10.24.40.17:47278 ESTABLISHED -
tcp 0 0 10.2X.4X.132:6642 10.24.40.76:36240 ESTABLISHED -
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.17:47280 ESTABLISHED -
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.6:43102 ESTABLISHED -
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.75:58890 ESTABLISHED -
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.6:43108 ESTABLISHED -
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.17:47142 ESTABLISHED -
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.71:48808 ESTABLISHED -
tcp 0 0 10.2X.4X.132:6642 10.2X.4X.17:47096 ESTABLISHED -

# OVN follower 2
$ netstat -nap | grep 6642
tcp 0 0 0.0.0.0:6642 0.0.0.0:* LISTEN -
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.76:57256 ESTABLISHED -
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.134:54026 ESTABLISHED -
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.10:34962 ESTABLISHED -
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.6:49238 ESTABLISHED -
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.135:59972 ESTABLISHED -
tcp 0 0 10.2X.4X.4:6642 10.2X.4X.75:40162 ESTABLISHED -
tcp 0 0 10.2X.4X.4:39566 10.2X.4X.132:6642 ESTABLISHED -

# OVN follower 3
$ netstat -nap | grep 6642
tcp 0 0 0.0.0.0:6642 0.0.0.0:* LISTEN -
tcp 0 0 10.2X.4X.68:6642 10.2X.4X.70:40750 ESTABLISHED -
tcp 0 0 10.2X.4X.68:6642 10.2X.4X.11:49718 ESTABLISHED -
tcp 0 0 10.2X.4X.68:45632 10.2X.4X.132:6642 ESTABLISHED -
tcp 0 0 10.2X.4X.68:6642 10.2X.4X.16:44816 ESTABLISHED -
tcp 0 0 10.2X.4X.68:6642 10.2X.4X.7:45216 ESTABLISHED

The issue we are experiencing is that the neutron-server disconnects whenever the OVN leader changes (due to the snapshot, roughly every 8 minutes) and then reconnects to the new leader. This breaks the OpenStack API if someone is trying to create a VM at the same time.

First, is my current configuration correct? Should the leader change break the Neutron side? Or is there some configuration missing? I was also wondering whether it would be possible to put a load balancer with a VIP in front of the OVN central members, let the VIP balance the connections across them, and reconfigure both the Neutron side and the ovn-controllers to point only at the VIP. Does that make sense?

Thank you.

Regards,
Tiago Pires
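(For completeness, a small sketch of how the same cluster/status command quoted above can be used to spot which member currently holds SB leadership after a transfer; the host list is just the three redacted member addresses from this thread, and SSH access to them is an assumption.)

# Ask each central member for its current role and who it sees as leader.
for host in 10.2X.4X.4 10.2X.4X.68 10.2X.4X.132; do
    echo "== $host =="
    ssh "$host" "ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | grep -E '^(Role|Leader):'"
done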
Hi Tiago,

Have you seen this? https://bugs.launchpad.net/nova/+bug/1969592

Mohammed
--
Mohammed Naser
VEXXHOST, Inc.
Hi Mohammed,

It does look a bit like our issue. Thank you.

Tiago Pires
Sorry, I was on PTO. Jakub is right: without python-ovs 2.17, when ovsdb-server breaks the connection on a leadership change, clients will re-download the entire contents of their registered tables. With 2.17, monitor-cond-since/update3 support is added to python-ovs, so clients should only download the changes made since they disconnected. As long as the client code handles reconnections, the reconnecting itself should not be an issue. It is still possible that there is code that doesn't properly handle reconnections in general, but I'd start with trying ovs 2.17. The disconnections will always happen, but they shouldn't break things.
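(A quick way to check which python-ovs version the neutron-server environment is actually using before and after an upgrade; the library is published as the "ovs" package, but whether it is installed via pip or as a distro package in this deployment is an assumption.)

# Show the installed python-ovs version; 2.17+ is what adds the
# monitor-cond-since/update3 support mentioned above.
pip3 show ovs | grep -i '^version'

# Alternatively, ask the library itself from the same interpreter
# that runs neutron-server:
python3 -c 'from ovs import version; print(version.VERSION)'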
Hi all,

Thanks Terry. Since I'm running an old OVS (2.13)/OVN (20.03) version because of the OpenStack release (Ussuri), would it be possible to upgrade only OVN/OVS without upgrading the whole OpenStack? I'm just weighing options here; how are you dealing with this in production?

Regards,
Tiago Pires
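(For reference, a sketch of commands that report the OVS/OVN versions currently running on a central node, useful before and after any standalone OVS/OVN upgrade; the database socket path matches the one used earlier in this thread.)

ovs-vsctl --version                               # OVS userspace version
ovn-sbctl --version                               # OVN utilities version
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl version   # version of the running SB ovsdb-server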
We are currently looking into the possibility of providing an OVN enablement PPA of some sort, which would allow you to upgrade just the OVS/OVN components to 2.17/22.03. There is outstanding work before that can be done successfully.

There are many patches to OpenStack components to deal with the change of behavior that OVS 2.17 brings, both for the Python IDL API changes and the more frequent OVSDB server leader changes. Many patches have already made it into the stable/ussuri branch, and there are some more to go (this one is in flight [0], but there may be more required) before this will work.

Validation of this combination is ongoing, and if it is successful we will most likely request a new Neutron Ussuri point release as well as provide the above-mentioned PPA. We'll know more over the course of the next couple of weeks.

As always, the goodness of OpenStack Yoga and all the latest OVS/OVN bits is already available in the most recent release.

0: https://review.opendev.org/c/openstack/neutron/+/840744

--
Frode Nordahl
participants (4): Frode Nordahl, Mohammed Naser, Terry Wilson, Tiago Pires