[openstack-dev] [TripleO][OVN] Switching the default network backend to ML2/OVN

Daniel Alvarez Sanchez dalvarez at redhat.com
Wed Oct 24 14:40:17 UTC 2018

Hi Stackers!

The purpose of this email is to share with the community the intention
of switching the default network backend in TripleO from ML2/OVS to
ML2/OVN by changing the mechanism driver from openvswitch to ovn. This
doesn’t mean that ML2/OVS will be dropped, but users deploying
OpenStack without explicitly specifying a network driver will get
ML2/OVN by default.
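For readers less familiar with ML2, the switch comes down to the `mechanism_drivers` option in the `[ml2]` section of Neutron's ML2 configuration. The Python sketch below renders that section for either backend. TripleO actually drives this through its own templates and parameters, so the helper name and every value other than the `openvswitch`/`ovn` driver names and the VXLAN/Geneve tenant types mentioned in this email are assumptions for illustration.

```python
import configparser
import io

# Hypothetical per-backend [ml2] settings. Only the driver names and the
# tenant network type (VXLAN vs Geneve) follow from this email; the
# type_drivers lists are illustrative.
ML2_BACKENDS = {
    "openvswitch": {
        "mechanism_drivers": "openvswitch",
        "type_drivers": "vxlan,vlan,flat",
        "tenant_network_types": "vxlan",
    },
    "ovn": {
        "mechanism_drivers": "ovn",
        "type_drivers": "geneve,vlan,flat",
        "tenant_network_types": "geneve",
    },
}


def render_ml2_section(backend: str = "ovn") -> str:
    """Render an [ml2] section; the default mirrors the proposed new default."""
    if backend not in ML2_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    config = configparser.ConfigParser()
    config["ml2"] = ML2_BACKENDS[backend]
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


print(render_ml2_section())
```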

OVN in Short

Open Virtual Network is managed under the OVS project, and was created
by the original authors of OVS. It is an attempt to re-do the ML2/OVS
control plane, using lessons learned throughout the years. It is
intended to be used in projects such as OpenStack and Kubernetes. OVN
has a different architecture, moving us away from Python agents
communicating with the Neutron API service via RabbitMQ to daemons
written in C communicating via OpenFlow and OVSDB.
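To make "communicating via OVSDB" a bit more concrete: OVSDB is a JSON-RPC protocol specified in RFC 7047, and OVN keeps its state in OVSDB databases such as `OVN_Northbound`. The sketch below only builds request messages. It does not open a connection to an ovsdb-server, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def jsonrpc_request(method: str, params: List[Any], req_id: int) -> Dict[str, Any]:
    """Build a JSON-RPC 1.0 request object as used by OVSDB (RFC 7047)."""
    return {"method": method, "params": params, "id": req_id}


def list_dbs(req_id: int = 0) -> Dict[str, Any]:
    # "list_dbs" asks an ovsdb-server which databases it serves.
    return jsonrpc_request("list_dbs", [], req_id)


def select_all(db: str, table: str, columns: List[str], req_id: int = 1) -> Dict[str, Any]:
    # A "transact" request with a single "select" operation; the empty
    # "where" clause matches every row, and "columns" limits the result.
    op = {"op": "select", "table": table, "where": [], "columns": columns}
    return jsonrpc_request("transact", [db, op], req_id)


# Example: ask the OVN Northbound database for all logical switch names.
msg = select_all("OVN_Northbound", "Logical_Switch", ["name"])
print(json.dumps(msg))
```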

OVN is built with a modern architecture that offers better foundations
for a simpler and more performant solution. What does this mean? For
example, at Red Hat we ran some preliminary tests during the
Queens cycle and found significant CPU savings due to OVN not using
RabbitMQ (see CPU utilization during a Rally scenario using ML2/OVS [0]
and ML2/OVN [1]). We also tested API performance and found that most
of the operations are significantly faster with ML2/OVN. Please see
more details in the FAQ section.

Here are a few useful links about OpenStack’s integration of OVN:

* OpenStack Boston Summit talk on OVN [2]
* OpenStack networking-ovn documentation [3]
* OpenStack networking-ovn code repository [4]


The goal is to merge this patch [5] during the Stein cycle. The patch
does the following:

1. Switch the default mechanism driver from openvswitch to ovn.
2. Adapt all jobs so that they use ML2/OVN as the network backend.
3. Create a legacy environment file for ML2/OVS to allow deployments based on it.
4. Flip the scenario007 job from ML2/OVN to ML2/OVS so that we continue testing it.
5. Continue using ML2/OVS in the undercloud.
6. Ensure that updates/upgrades from ML2/OVS don’t break and don’t
switch automatically to the new default. As some parity gaps exist
right now, we don’t want to change the network backend automatically.
Instead, if the user wants to migrate from ML2/OVS to ML2/OVN, we’ll
provide an Ansible-based tool that will perform the operation.
More info and code at [6].
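The defaulting rules in items 1, 5 and 6 can be summarised as a small decision function. This is a hypothetical helper written only to illustrate the policy; it is not part of the patch [5], and the role names are assumptions.

```python
from typing import Optional

OVS = "openvswitch"
OVN = "ovn"


def choose_mechanism_driver(role: str,
                            requested: Optional[str] = None,
                            deployed: Optional[str] = None) -> str:
    """Hypothetical model of the defaulting rules listed above.

    role:      "undercloud" or "overcloud"
    requested: driver explicitly chosen by the user, if any
    deployed:  driver already running (known on updates/upgrades)
    """
    if role not in ("undercloud", "overcloud"):
        raise ValueError(f"unknown role: {role!r}")
    if requested is not None:
        # An explicit choice (e.g. the legacy ML2/OVS environment file) wins.
        return requested
    if deployed is not None:
        # Updates/upgrades keep the existing backend; migrating to ML2/OVN
        # is a separate, explicit operation.
        return deployed
    if role == "undercloud":
        # The undercloud stays on ML2/OVS for now.
        return OVS
    # New overcloud deployments without an explicit choice get ML2/OVN.
    return OVN
```

For example, `choose_mechanism_driver("overcloud")` returns `"ovn"`, while `choose_mechanism_driver("overcloud", deployed="openvswitch")` keeps `"openvswitch"`.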

Reviews, comments and suggestions are really appreciated :)


Can you talk about the advantages of OVN over ML2/OVS?

If asked to describe the ML2/OVS control plane (OVS, L3, DHCP and
metadata agents using the messaging bus to sync with the Neutron API
service) one would not tend to use the term ‘simple’. There is liberal
use of a smattering of Linux networking technologies such as:
* iptables
* network namespaces
* ARP manipulation
* Different forms of NAT
* keepalived, radvd, haproxy, dnsmasq
* Source-based routing
* … and of course OVS flows.

OVN simplifies this to a single process running on compute nodes, and
another process running on centralized nodes, communicating via OVSDB
and OpenFlow, ultimately setting OVS flows.

The simplified, new architecture allows us to re-do features like DVR
and L3 HA in more efficient and elegant ways. For example, L3 HA
failover is faster: it doesn’t use keepalived; instead, OVN monitors
neighbor tunnel endpoints. OVN supports enabling both DVR and L3 HA
simultaneously, something we never supported with ML2/OVS.
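A simplified model of that failover: in OVN a router's gateway port can be bound to several chassis with priorities (the `Gateway_Chassis` records in the Northbound database), and the highest-priority chassis that is still reachable hosts it. OVN detects unreachable chassis by monitoring tunnel endpoints (via BFD); in this sketch that is reduced to a set of live chassis names. It is illustrative only, not OVN code, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class GatewayChassis:
    name: str       # chassis that can host the router's gateway port
    priority: int   # higher value means preferred


def active_gateway(candidates: Iterable[GatewayChassis], alive: Set[str]) -> Optional[str]:
    """Return the highest-priority candidate that is still reachable, or None.

    'alive' stands in for the liveness information OVN derives from
    monitoring tunnel endpoints; equal priorities are not handled specially.
    """
    reachable = [c for c in candidates if c.name in alive]
    if not reachable:
        return None
    return max(reachable, key=lambda c: c.priority).name


gateways = [GatewayChassis("gw-1", 30), GatewayChassis("gw-2", 20), GatewayChassis("gw-3", 10)]
print(active_gateway(gateways, {"gw-1", "gw-2", "gw-3"}))  # gw-1
print(active_gateway(gateways, {"gw-2", "gw-3"}))          # gw-2 after gw-1 fails
```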

We also found that not depending on RPC messages for agent
communication brings a lot of benefits. In our experience, RabbitMQ
is sometimes a bottleneck, and it can be very resource-intensive.

What about the undercloud?

ML2/OVS will still be used in the undercloud, mainly because OVN has
some limitations with regard to baremetal provisioning (see the parity
gaps below). We aim to convert the undercloud to ML2/OVN as soon as
possible to provide the operator with a more consistent experience.

It would, however, be possible to use the Neutron DHCP agent in the
short term to work around this limitation, but in the long term we
intend to implement support for baremetal provisioning in the OVN
built-in DHCP server.

What about CI?

* networking-ovn has:
  * Devstack-based Tempest (API and scenario tests from Tempest and the
    Neutron Tempest plugin) against the latest released OVS version and
    against OVS master (thus also OVN master)
  * Devstack-based Rally
  * Grenade
  * A multinode, container-based TripleO job that deploys and runs a
    basic VM connectivity scenario test
  * Python 3 and Python 2 support
* TripleO currently has OVN enabled in one quickstart featureset (fs30).

Are there any known parity issues with ML2/OVS?

* OVN supports VLAN provider networks, but not VLAN tenant networks.
This will be addressed and is being tracked in RHBZ 1561880 [7]
* SR-IOV: This scenario is limited because it requires VLAN tenant
networks, which OVN does not support yet, and the Neutron DHCP agent
has to be deployed. The goal is to add support in OVN so that the
Neutron DHCP agent is no longer needed. [8]
* QoS: Lack of support for DSCP marking and egress bandwidth limiting
RHBZ 1503494 [9]
* OVN does not presently support the new Security Groups logging API
RHBZ 1619266 [10]
* OVN does not correctly support Jumbo frames for North/South traffic
RHBZ 1547074 [11]
* The OVN built-in DHCP server currently cannot be used to provision
baremetal nodes (RHBZ 1622154 [12]); this affects the undercloud and
the overcloud’s baremetal-to-tenant use case.
* End-to-end encryption support in TripleO is still missing (RHBZ 1601926 [13])

More info at [14].
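Before migrating an existing deployment, an operator needs to check whether it relies on any of these features. A minimal Python sketch, assuming the features in use have already been collected as a list of names; the feature names and the helper are hypothetical, and the tracker references come from the list above.

```python
from typing import Dict, Iterable, List, Tuple

# Parity gaps listed above, keyed by hypothetical feature names.
PARITY_GAPS: Dict[str, str] = {
    "vlan_tenant_networks": "RHBZ 1561880",
    "sriov": "ovs-discuss thread [8]",
    "qos_dscp_marking": "RHBZ 1503494",
    "qos_egress_bandwidth_limit": "RHBZ 1503494",
    "security_group_logging": "RHBZ 1619266",
    "jumbo_frames_north_south": "RHBZ 1547074",
    "baremetal_provisioning": "RHBZ 1622154",
    "end_to_end_encryption": "RHBZ 1601926",
}


def parity_blockers(features_in_use: Iterable[str]) -> List[Tuple[str, str]]:
    """Return sorted (feature, tracker) pairs for features ML2/OVN lacks today."""
    return sorted((f, PARITY_GAPS[f]) for f in set(features_in_use) if f in PARITY_GAPS)


print(parity_blockers(["sriov", "dvr", "security_group_logging"]))
```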

What does the performance look like?

We have carried out different performance tests. Overall, ML2/OVN
outperforms ML2/OVS in most of the operations, as this graph [15]
shows. Only creating networks and listing ports are slower, mostly
because ML2/OVN creates an extra port (for metadata) upon network
creation, so the number of ports listed for the same Rally task is 2x
in the ML2/OVN case.

Also, resource utilization is lower with ML2/OVN [16] than with
ML2/OVS [17], mainly because there are no agents and no RPC.

OVN only supports VLAN and Geneve (tunneled) networks, while ML2/OVS
uses VXLAN. What, if any, is the impact? What about hardware offload?

Good question! We asked this ourselves, and research showed that this
is not a problem. Normally, NICs that support VXLAN also support
Geneve hardware offload. Interestingly, even in the cases where they
don’t, performance was found to be better using Geneve due to other
optimizations that Geneve benefits from. More information can be found
on Russell Bryant’s blog [18]; he did extensive work in this space.
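To see what differs on the wire, the sketch below packs both base headers with Python's `struct` module, following RFC 7348 (VXLAN) and the Geneve specification (since published as RFC 8926). Both base headers are 8 bytes; Geneve can additionally carry variable-length TLV options, which is how OVN carries extra logical metadata alongside the VNI. The function names and the option class/type values are illustrative assumptions.

```python
import struct

ETH_BRIDGING = 0x6558  # Geneve protocol type for an encapsulated Ethernet frame


def vxlan_header(vni: int) -> bytes:
    """Fixed 8-byte VXLAN header (RFC 7348): I flag set, 24-bit VNI."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


def geneve_option(opt_class: int, opt_type: int, data: bytes) -> bytes:
    """One Geneve TLV option: class (16 bits), type (8), length in 4-byte words."""
    if len(data) % 4 or len(data) // 4 > 0x1F:
        raise ValueError("option data must be 0-124 bytes, in 4-byte multiples")
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def geneve_header(vni: int, options: bytes = b"") -> bytes:
    """8-byte Geneve base header (version 0, O/C flags clear) plus options."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    if len(options) % 4 or len(options) // 4 > 0x3F:
        raise ValueError("options must be 0-252 bytes, in 4-byte multiples")
    first_byte = len(options) // 4  # Ver (2 bits) = 0, Opt Len (6 bits)
    return struct.pack("!BBHI", first_byte, 0, ETH_BRIDGING, vni << 8) + options


# Illustrative option: the class/type values are made up for this example.
opt = geneve_option(0xFFFF, 0x01, b"\x00\x00\x00\x2a")
print(len(vxlan_header(42)), len(geneve_header(42)), len(geneve_header(42, opt)))  # 8 8 16
```

The options are also why Geneve's per-packet overhead is variable, unlike VXLAN's fixed header.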


[0] https://imgur.com/a/oOmuAqj
[1] https://imgur.com/a/N9jrIXV
[2] https://www.youtube.com/watch?v=sgc7myiX6ts
[3] https://docs.openstack.org/networking-ovn/queens/admin/index.html
[4] https://github.com/openstack/networking-ovn
[5] https://review.openstack.org/#/c/593056/
[6] https://github.com/openstack/networking-ovn/tree/master/migration
[7] https://bugzilla.redhat.com/show_bug.cgi?id=1561880
[8] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
[9] https://bugzilla.redhat.com/show_bug.cgi?id=1503494
[10] https://bugzilla.redhat.com/show_bug.cgi?id=1619266
[11] https://bugzilla.redhat.com/show_bug.cgi?id=1547074
[12] https://bugzilla.redhat.com/show_bug.cgi?id=1622154
[13] https://bugzilla.redhat.com/show_bug.cgi?id=1601926
[14] https://wiki.openstack.org/wiki/Networking-ovn
[15] https://imgur.com/a/4QtaN6b
[16] https://imgur.com/a/N9jrIXV
[17] https://imgur.com/a/oOmuAqj
[18] https://blog.russellbryant.net/2017/05/30/ovn-geneve-vs-vxlan-does-it-matter/

Daniel Alvarez
