[openstack-dev] [TripleO][OVN] Switching the default network backend to ML2/OVN
Miguel Angel Ajo Pelayo
majopela at redhat.com
Thu Oct 25 09:22:18 UTC 2018
Daniel, thank you very much for the extensive and detailed email.
The plan looks good to me and makes sense. The OVS option will still be
tested, and available when selected.
On Wed, Oct 24, 2018 at 4:41 PM Daniel Alvarez Sanchez <dalvarez at redhat.com>
wrote:
> Hi Stackers!
>
> The purpose of this email is to share with the community the intention
> of switching the default network backend in TripleO from ML2/OVS to
> ML2/OVN by changing the mechanism driver from openvswitch to ovn. This
> doesn’t mean that ML2/OVS will be dropped but users deploying
> OpenStack without explicitly specifying a network driver will get
> ML2/OVN by default.
>
> OVN in Short
> ==========
>
> Open Virtual Network is managed under the OVS project, and was created
> by the original authors of OVS. It is an attempt to re-do the ML2/OVS
> control plane, using lessons learned throughout the years. It is
> intended to be used in projects such as OpenStack and Kubernetes.

Also oVirt / RHEV.

> OVN has a different architecture, moving us away from Python agents
> communicating with the Neutron API service via RabbitMQ to daemons
> written in C communicating via OpenFlow and OVSDB.
>
> OVN is built with a modern architecture that offers better foundations
> for a simpler and more performant solution. What does this mean? For
> example, at Red Hat we executed some preliminary testing during the
> Queens cycle and found significant CPU savings due to OVN not using
> RabbitMQ (CPU utilization during a Rally scenario using ML2/OVS [0] or
> ML2/OVN [1]). Also, we tested API performance and found out that most
> of the operations are significantly faster with ML2/OVN. Please see
> more details in the FAQ section.
>
> Here are a few useful links about OpenStack’s integration of OVN:
>
> * OpenStack Boston Summit talk on OVN [2]
> * OpenStack networking-ovn documentation [3]
> * OpenStack networking-ovn code repository [4]
>
> How?
> ====
>
> The goal is to merge this patch [5] during the Stein cycle. The patch
> takes the following actions:
>
> 1. Switch the default mechanism driver from openvswitch to ovn.
> 2. Adapt all jobs so that they use ML2/OVN as the network backend.
> 3. Create legacy environment file for ML2/OVS to allow deployments based
> on it.
> 4. Flip scenario007 job from ML2/OVN to ML2/OVS so that we continue
> testing it.
> 5. Continue using ML2/OVS in the undercloud.
> 6. Ensure that updates/upgrades from ML2/OVS don’t break and don’t
> switch automatically to the new default. As some parity gaps exist
> right now, we don’t want to change the network backend automatically.
> Instead, if the user wants to migrate from ML2/OVS to ML2/OVN, we’ll
> provide an Ansible-based tool that will perform the operation.
> More info and code at [6].
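
Items 1 and 6 together amount to a small decision rule. Below is a minimal
Python sketch of that rule, under stated assumptions: the function name and
its two arguments are hypothetical and are not part of TripleO or of the
patch in [5]. Only the driver names openvswitch and ovn come from the text
above.

```python
from typing import Optional

NEW_DEFAULT = "ovn"  # proposed default mechanism driver (item 1)


def resolve_mechanism_driver(explicit: Optional[str],
                             deployed: Optional[str]) -> str:
    """Pick the ML2 mechanism driver (illustrative only, not TripleO code).

    explicit: driver the operator asked for, or None if nothing was specified.
    deployed: driver of an already running deployment, or None if fresh.
    """
    # An explicit operator choice wins, e.g. selecting the legacy
    # ML2/OVS environment file described in item 3.
    if explicit is not None:
        return explicit
    # Item 6: updates/upgrades keep the backend that is already deployed;
    # they never switch to the new default automatically.
    if deployed is not None:
        return deployed
    # Fresh deployment with no driver specified: use the new default.
    return NEW_DEFAULT
```

For example, resolve_mechanism_driver(None, "openvswitch") keeps
"openvswitch", while resolve_mechanism_driver(None, None) yields "ovn".
Moving an existing ML2/OVS deployment to OVN would still go through the
migration tool in [6].
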
>
> Reviews, comments and suggestions are really appreciated :)
>
>
> FAQ
> ===
>
> Can you talk about the advantages of OVN over ML2/OVS?
> ------------------------------------------------------
>
> If asked to describe the ML2/OVS control plane (OVS, L3, DHCP and
> metadata agents using the messaging bus to sync with the Neutron API
> service) one would not tend to use the term ‘simple’. There is liberal
> use of a smattering of Linux networking technologies such as:
> * iptables
> * network namespaces
> * ARP manipulation
> * Different forms of NAT
> * keepalived, radvd, haproxy, dnsmasq
> * Source-based routing
> * … and of course OVS flows.
>
> OVN simplifies this to a single process running on compute nodes, and
> another process running on centralized nodes, communicating via OVSDB
> and OpenFlow, ultimately setting OVS flows.
>
> The simplified, new architecture allows us to re-do features like DVR
> and L3 HA in more efficient and elegant ways. For example, L3 HA
> failover is faster: it doesn’t use keepalived; instead, OVN monitors
> neighbor tunnel endpoints. OVN supports enabling both DVR and L3 HA
> simultaneously, something we never supported with ML2/OVS.
>
> We also found that not depending on RPC messages for agent
> communication brings a lot of benefits. In our experience, RabbitMQ
> sometimes becomes a bottleneck and can be very resource-intensive.
>
>
> What about the undercloud?
> --------------------------------------
>
> ML2/OVS will still be used in the undercloud, mainly because OVN has
> some limitations with regard to baremetal provisioning (keep reading
> for the parity gaps). We aim to convert the undercloud to ML2/OVN as
> soon as possible, to give the operator a more consistent experience.
>
> In the short term, the Neutron DHCP agent could be used to work around
> this limitation, but in the long term we intend to implement support
> for baremetal provisioning in the OVN built-in DHCP server.
>
>
> What about CI?
> ---------------------
>
> * networking-ovn has:
>   * Devstack based Tempest (API, scenario from Tempest and Neutron
>     Tempest plugin) against the latest released OVS version, and against
>     OVS master (thus also OVN master)
>   * Devstack based Rally
>   * Grenade
>   * A multinode, container based TripleO job that deploys and runs a
>     basic VM connectivity scenario test
>   * Support for Python 3 and 2
> * TripleO currently has OVN enabled in one quickstart featureset (fs30).
>
> Are there any known parity issues with ML2/OVS?
> -------------------------------------------------------------------
>
> * OVN supports VLAN provider networks, but not VLAN tenant networks.
> This will be addressed and is being tracked in RHBZ 1561880 [7]
> * SRIOV: A limitation exists for this scenario where OVN needs to
> support VLAN tenant networks and Neutron DHCP Agent has to be
> deployed. The goal is to include support in OVN to get rid of Neutron
> DHCP agent. [8]
> * QoS: Lack of support for DSCP marking and egress bandwidth limiting
> (RHBZ 1503494 [9])
> * OVN does not presently support the new Security Groups logging API
> (RHBZ 1619266 [10])
> * OVN does not correctly support Jumbo frames for North/South traffic
> (RHBZ 1547074 [11])
> * The OVN built-in DHCP server currently cannot be used to provision
> baremetal nodes (RHBZ 1622154 [12]). This affects the undercloud and
> the overcloud’s baremetal-to-tenant use case.
> * End-to-end encryption support in TripleO (RHBZ 1601926 [13])
>
> More info at [14].
>
>
> What does the performance look like?
> -------------------------------------------------
>
> We have carried out different performance tests. Overall, ML2/OVN
> outperforms ML2/OVS in most operations, as this graph [15] shows.
> Only creating networks and listing ports are slower. This is mostly
> because ML2/OVN creates an extra port (for metadata) upon network
> creation, so the number of ports listed for the same Rally task is 2x
> in the ML2/OVN case.
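
The 2x figure follows from simple counting. Here is a rough Python sketch,
assuming a Rally task that creates one user port per network (that ratio is
an assumption, the task definition is not given here) and ignoring any other
service ports. The function name is hypothetical.

```python
def ports_listed(networks: int, ports_per_network: int, backend: str) -> int:
    """Count ports returned by a port list call (illustrative counting only)."""
    # ML2/OVN creates one extra metadata port per network at network
    # creation time; the ML2/OVS case is modelled here with no extra port.
    extra_per_network = 1 if backend == "ovn" else 0
    return networks * (ports_per_network + extra_per_network)
```

With 100 networks and one user port each, ports_listed(100, 1, "ovs") gives
100 and ports_listed(100, 1, "ovn") gives 200, which matches the 2x above.
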
>
> Also, resource utilization is lower in ML2/OVN [16] than in ML2/OVS
> [17], mainly due to the lack of agents and not using RPC.
>
> OVN only supports VLAN and Geneve (tunneled) networks, while ML2/OVS
> uses VXLAN. What, if any, is the impact? What about hardware offload?
> ---------------------------------------------------------------------
>
> Good question! We asked this ourselves, and research showed that this
> is not a problem. Normally, NICs that support VXLAN also support
> Geneve hardware offload. Interestingly, even in the cases where they
> don’t, performance was found to be better using Geneve due to other
> optimizations that Geneve benefits from. More information can be found
> in Russell Bryant’s blog [18]; he did extensive work in this space.
>
>
> Links
> ====
>
> [0] https://imgur.com/a/oOmuAqj
> [1] https://imgur.com/a/N9jrIXV
> [2] https://www.youtube.com/watch?v=sgc7myiX6ts
> [3] https://docs.openstack.org/networking-ovn/queens/admin/index.html
> [4] https://github.com/openstack/networking-ovn
> [5] https://review.openstack.org/#/c/593056/
> [6] https://github.com/openstack/networking-ovn/tree/master/migration
> [7] https://bugzilla.redhat.com/show_bug.cgi?id=1561880
> [8] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> [9] https://bugzilla.redhat.com/show_bug.cgi?id=1503494
> [10] https://bugzilla.redhat.com/show_bug.cgi?id=1619266
> [11] https://bugzilla.redhat.com/show_bug.cgi?id=1547074
> [12] https://bugzilla.redhat.com/show_bug.cgi?id=1622154
> [13] https://bugzilla.redhat.com/show_bug.cgi?id=1601926
> [14] https://wiki.openstack.org/wiki/Networking-ovn
> [15] https://imgur.com/a/4QtaN6b
> [16] https://imgur.com/a/N9jrIXV
> [17] https://imgur.com/a/oOmuAqj
> [18] https://blog.russellbryant.net/2017/05/30/ovn-geneve-vs-vxlan-does-it-matter/
>
>
> Thanks!
> Daniel Alvarez
>
--
Miguel Ángel Ajo
OSP / Networking DFG, OVN Squad Engineering