Neutron + OVN raft cluster

Frode Nordahl frode.nordahl at canonical.com
Tue May 10 07:22:40 UTC 2022


On Mon, May 9, 2022 at 9:53 PM Tiago Pires <tiagohp at gmail.com> wrote:
>
> Hi all,
>
> Thanks Terry.
> As I'm using an old OVS (2.13)/OVN (20.03) version due to our OpenStack release (Ussuri), would it be possible to upgrade only OVN/OVS without upgrading the whole of OpenStack?
> I'm only checking options at this point; how are you guys dealing with this in production?

We are currently looking into the possibility of providing an OVN
enablement PPA of some sort, which would allow you to upgrade just the
OVS/OVN components to 2.17/22.03. There is outstanding work to finish
before that can be done successfully.

There are many patches to OpenStack components to deal with the change
of behavior that OVS 2.17 brings, both for the Python IDL API changes
and the more frequent OVSDB Server leader changes. Many patches have
already made it into the stable/ussuri branch, and there are some more
to go (this one is ready and in flight [0], but more may be
required) before this will work.
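
The client-side half of dealing with more frequent leader changes is mostly tolerating a dropped connection and retrying the request once the IDL has resynced with the new leader. A minimal, generic sketch of that pattern (retry with jittered exponential backoff) follows; `LeaderChangedError` and `run_with_reconnect` are hypothetical names, not part of python-ovs, ovsdbapp or Neutron, and the actual stable/ussuri patches may handle this quite differently.

```python
import random
import time


class LeaderChangedError(Exception):
    """Hypothetical error for "the OVSDB connection dropped because the leader moved"."""


def run_with_reconnect(txn, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `txn()`, retrying with jittered exponential backoff on LeaderChangedError.

    `txn` is any zero-argument callable. In a real client it would (re)build and
    commit an OVSDB transaction once the IDL has reconnected to the new leader.
    `sleep` is injectable so the backoff can be exercised without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except LeaderChangedError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the API caller
            # Delay doubles each attempt (0.1 s, 0.2 s, ...) plus up to 100% jitter,
            # so many clients reconnecting at once do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    attempts = []

    def flaky_commit():
        # Fails twice, as if the request overlapped two leadership transfers.
        attempts.append(1)
        if len(attempts) < 3:
            raise LeaderChangedError("connection dropped")
        return "committed"

    print(run_with_reconnect(flaky_commit, sleep=lambda _: None))
```

Retrying the whole transaction rather than resuming a half-applied one keeps the logic simple, since the client's view of the database may have changed while it was reconnecting.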

Validation of this combination is ongoing, and if it is successful we
will most likely request a new Neutron Ussuri point release as well as
providing the above mentioned PPA. We'll know more over the course of
the next couple of weeks.


As always, the goodness of OpenStack Yoga and all the latest OVS/OVN
bits is already available in the most recent release.

0: https://review.opendev.org/c/openstack/neutron/+/840744

-- 
Frode Nordahl

> Regards,
>
> Tiago Pires
>
> On Mon, May 9, 2022 at 10:45, Terry Wilson <twilson at redhat.com> wrote:
>>
>> Sorry, I was on PTO. Jakub is right: without python-ovs 2.17, when
>> OVS breaks the connection for a leadership transfer, clients will
>> re-download the entire content of their registered tables. With 2.17,
>> monitor-cond-since/update3 support is added to python-ovs, so a client
>> should only download the changes it missed while disconnected. As long
>> as the client code handles reconnections, this reconnecting should not
>> be an issue. It is still possible that some code doesn't properly
>> handle reconnections in general, but I'd start by trying OVS 2.17.
>> The disconnections will always happen, but they shouldn't break
>> things.
>>
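
Terry's point about monitor-cond-since can be illustrated with a toy model. In the sketch below, `ToyServer` and `ToyClient` are hypothetical, in-memory stand-ins that only loosely follow the real protocol (the actual `monitor_cond_since` request, described in the ovsdb-server(7) manual page, uses transaction UUIDs and table updates rather than integers and a flat dict). The server keeps a log of transaction IDs; a reconnecting client either re-downloads every row, as with python-ovs before 2.17, or sends its last-seen transaction ID and receives only the rows changed after it, falling back to a full dump if that ID is no longer in the log. One object stands in for the whole Raft cluster, and row deletions are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical in-memory stand-in for the whole clustered OVSDB."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered (txn_id, key, value) entries
    last_txn: int = 0

    def commit(self, key, value):
        self.last_txn += 1
        self.rows[key] = value
        self.log.append((self.last_txn, key, value))

    def monitor(self):
        """Plain monitor: always return every row, plus the latest txn ID."""
        return dict(self.rows), self.last_txn

    def monitor_cond_since(self, client_txn):
        """Return (found, latest_txn, updates).

        If client_txn is still in the log, updates holds only rows changed after
        it; otherwise found is False and updates is a full copy of the rows.
        """
        if any(txn == client_txn for txn, _, _ in self.log):
            # Later log entries overwrite earlier ones for the same key.
            updates = {k: v for txn, k, v in self.log if txn > client_txn}
            return True, self.last_txn, updates
        return False, self.last_txn, dict(self.rows)


@dataclass
class ToyClient:
    replica: dict = field(default_factory=dict)
    last_txn: int = 0
    rows_received: int = 0  # how much data the reconnects have cost so far

    def reconnect(self, server, use_since):
        if use_since:
            found, self.last_txn, updates = server.monitor_cond_since(self.last_txn)
            if not found:
                self.replica.clear()  # unknown txn ID: start from a full dump
            self.replica.update(updates)
        else:
            updates, self.last_txn = server.monitor()
            self.replica = updates
        self.rows_received += len(updates)


if __name__ == "__main__":
    server = ToyServer()
    for i in range(1000):
        server.commit(f"port-{i}", "up")
    full, since = ToyClient(), ToyClient()
    for client, use_since in ((full, False), (since, True)):
        client.reconnect(server, use_since)  # initial connection
    server.commit("port-1000", "up")  # one change before the next leader change
    for client, use_since in ((full, False), (since, True)):
        client.reconnect(server, use_since)  # reconnect after the leader change
    print("full resync:", full.rows_received, "rows")
    print("monitor_cond_since:", since.rows_received, "rows")
```

Both clients end up with identical replicas, but the second reconnect costs the full-resync client another 1001 rows and the monitor_cond_since client only 1, which is why Terry expects the reconnections themselves to stop being a problem once python-ovs 2.17 is in use.
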
>> On Fri, May 6, 2022 at 5:13 PM Tiago Pires <tiagohp at gmail.com> wrote:
>> >
>> > Hi Mohammed,
>> >
>> > It seems a little bit like our issue.
>> >
>> > Thank you.
>> >
>> > Tiago Pires
>> >
>> > On Fri, May 6, 2022 at 18:21, Mohammed Naser <mnaser at vexxhost.com> wrote:
>> >>
>> >> Hi Tiago,
>> >>
>> >> Have you seen this?
>> >>
>> >> https://bugs.launchpad.net/nova/+bug/1969592
>> >>
>> >> Mohammed
>> >>
>> >> On Fri, May 6, 2022 at 3:56 PM Tiago Pires <tiagohp at gmail.com> wrote:
>> >> >
>> >> > Hi all,
>> >> >
>> >> > I was checking the mailing list history, and this thread about Raft OVSDB clustering caught my attention: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046438.html
>> >> > In my setup (OVN 20.03 and OpenStack Ussuri), we have configured ovn-remote="tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642" on the ovn-controllers, pointing at the 3 OVN central members, which run in cluster mode.
>> >> > Also, on the Neutron ML2 side:
>> >> > [ovn]
>> >> > ovn_native_dhcp = True
>> >> > ovn_nb_connection = tcp:10.2X.4X.4:6641,tcp:10.2X.4X.68:6641,tcp:10.2X.4X.132:6641
>> >> > ovn_sb_connection = tcp:10.2X.4X.4:6642,tcp:10.2X.4X.68:6642,tcp:10.2X.4X.132:6642
>> >> >
>> >> > We are experiencing an issue with Neutron when the OVN leader decides to take a snapshot and, by design, another member becomes leader (more or less every 8 minutes):
>> >> > 2022-05-05T16:57:42.135Z|17401|raft|INFO|Transferring leadership to write a snapshot.
>> >> >
>> >> > ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
>> >> > 4a03
>> >> > Name: OVN_Southbound
>> >> > Cluster ID: ca74 (ca744caf-40cd-4751-a2f2-86e35ad6541c)
>> >> > Server ID: 4a03 (4a0328dc-e9a4-495e-a4f1-0a0340fc6d19)
>> >> > Address: tcp:10.2X.4X.132:6644
>> >> > Status: cluster member
>> >> > Role: leader
>> >> > Term: 1912
>> >> > Leader: self
>> >> > Vote: self
>> >> >
>> >> > Election timer: 10000
>> >> > Log: [497643, 498261]
>> >> > Entries not yet committed: 0
>> >> > Entries not yet applied: 0
>> >> > Connections: ->3d6c ->4ef0 <-3d6c <-4ef0
>> >> > Servers:
>> >> >     4a03 (4a03 at tcp:10.2X.4X.132:6644) (self) next_index=497874 match_index=498260
>> >> >     3d6c (3d6c at tcp:10.2X.4X.68:6644) next_index=498261 match_index=498260
>> >> >     4ef0 (4ef0 at tcp:10.2X.4X.4:6644) next_index=498261 match_index=498260
>> >> >
>> >> > As I understand it, the TCP connections from Neutron (NB) and the ovn-controllers (SB) to OVN Central are established only with the leader:
>> >> >
>> >> > #OVN central leader
>> >> > $ netstat -nap | grep 6642| more
>> >> >
>> >> > tcp        0      0 0.0.0.0:6642            0.0.0.0:*               LISTEN      -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.24.40.17:47278       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.24.40.76:36240       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.2X.4X.17:47280       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.2X.4X.6:43102        ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.2X.4X.75:58890       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.2X.4X.6:43108        ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.2X.4X.17:47142       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.2X.4X.71:48808       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.132:6642       10.2X.4X.17:47096       ESTABLISHED -
>> >> > #OVN follower 2
>> >> >
>> >> > $ netstat -nap | grep 6642
>> >> >
>> >> > tcp        0      0 0.0.0.0:6642            0.0.0.0:*               LISTEN      -
>> >> > tcp        0      0 10.2X.4X.4:6642         10.2X.4X.76:57256       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.4:6642         10.2X.4X.134:54026      ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.4:6642         10.2X.4X.10:34962       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.4:6642         10.2X.4X.6:49238        ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.4:6642         10.2X.4X.135:59972      ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.4:6642         10.2X.4X.75:40162       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.4:39566        10.2X.4X.132:6642       ESTABLISHED -
>> >> > #OVN follower 3
>> >> >
>> >> > $ netstat -nap | grep 6642
>> >> >
>> >> > tcp        0      0 0.0.0.0:6642            0.0.0.0:*               LISTEN      -
>> >> > tcp        0      0 10.2X.4X.68:6642        10.2X.4X.70:40750       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.68:6642        10.2X.4X.11:49718       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.68:45632       10.2X.4X.132:6642       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.68:6642        10.2X.4X.16:44816       ESTABLISHED -
>> >> > tcp        0      0 10.2X.4X.68:6642        10.2X.4X.7:45216        ESTABLISHED
>> >> >
>> >> > The issue we are experiencing is that neutron-server disconnects whenever the OVN leader changes (due to a snapshot, roughly every 8 minutes) and then reconnects to the new leader. This breaks the OpenStack API if someone is trying to create a VM at the same time.
>> >> > First, is my current configuration correct? Is a leader change expected to break the Neutron side, or is some configuration missing?
>> >> > I was wondering whether it is possible to put a load balancer (LB) with a VIP in front of the OVN central members to balance the connections, and then configure only the VIP on the Neutron side and on the ovn-controllers. Does that make sense?
>> >> >
>> >> > Thank you.
>> >> >
>> >> > Regards,
>> >> >
>> >> > Tiago Pires
>> >>
>> >>
>> >>
>> >> --
>> >> Mohammed Naser
>> >> VEXXHOST, Inc.
>>

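The netstat listings quoted above can also be checked mechanically. The sketch below is a hypothetical helper that assumes the Linux `netstat -nap` column order shown in those listings (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and counts ESTABLISHED connections whose local port is 6642, i.e. connections accepted by that member rather than opened by it. On the quoted listings it counts 9 such connections on the leader (10.2X.4X.132), 6 on follower 2 (10.2X.4X.4) and 4 on follower 3 (10.2X.4X.68); the followers' own connections to the leader's port 6642 (10.2X.4X.4:39566 and 10.2X.4X.68:45632) are not counted.

```python
def inbound_connections(netstat_text, port=6642):
    """Count ESTABLISHED TCP connections whose *local* port is `port`.

    Assumes `netstat -nap` columns: Proto, Recv-Q, Send-Q, Local Address,
    Foreign Address, State (then PID/Program name, which may be "-" or missing).
    Connections this host opened *to* another member's port are not counted,
    because their local port is an ephemeral one.
    """
    count = 0
    for line in netstat_text.splitlines():
        cols = line.split()
        if len(cols) < 6 or not cols[0].startswith("tcp"):
            continue  # header, blank or non-TCP line
        local_addr, state = cols[3], cols[5]
        if state == "ESTABLISHED" and local_addr.rsplit(":", 1)[-1] == str(port):
            count += 1
    return count
```

Running it on the captured output from each member before and after a leadership transfer gives a per-member count to compare.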

