[neutron][ptg] PTG summary

Rodolfo Alonso Hernandez ralonsoh at redhat.com
Tue Jun 20 10:43:28 UTC 2023


Hello all:

Thank you for joining us during the Forum session and the PTG slots. It
was very productive, and we were able to take the pulse of OpenStack, and
of Neutron in particular.

This time both the Forum session and the PTG were attended mainly by
operators and users; the number of developers was low, and this is
something we need to improve. A healthy community is nourished by the
active participation of its members, not only pushing new code but also
reviewing it and helping to fix new bugs.

=== Forum session ===
It was a pleasant surprise for me to find so many people in the room. I
wasn't expecting that after checking the etherpad link provided before the
meeting. During this short session, focused on the issues and pain points
of Neutron, we found out that:
* Adoption of the ML2/OVN backend is increasing.
* Most users have deployments between Wallaby and Antelope.
* Demand for hardware offload is increasing, driven by growing network
bandwidth requirements.
* BGP/EVPN is deployed in most environments.

From this session I would highlight some lessons learned for future events:
* If the Forum session time slot is short (30 minutes), attendee
interventions should be limited to 2-3 minutes, enough to describe the
issue. Any further discussion should be moved to the PTG sessions.
* Ask attendees to "check in" on the etherpad, so we can prepare for the
event.
* Ask people to add their questions to the etherpad in advance.

=== PTG ===
We didn't have technical sessions, but we did receive many questions about
issues and possible bugs. These are the most relevant ones:

*** OVN L3 scheduler issue ***
This issue has been reproduced in an environment with more than five
chassis with gateway ports. The router GW ports are assigned to the GW
chassis using a manual scheduler implemented in Neutron (the default one
is ``OVNGatewayLeastLoadedScheduler``). If one of the chassis is stopped,
its GW ports should be re-assigned to the other GW chassis. This does
happen, but all the ports land on the same chassis; the re-scheduling
should spread the ports among the other active chassis, as in the sketch
below.
* Action item: I'll open a LP bug and investigate this issue.
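
To make the expected behavior concrete, here is a minimal sketch of a
least-loaded re-scheduling that spreads the orphaned GW ports over the
remaining chassis. The names and data structures are illustrative, not
Neutron's actual scheduler code:

from collections import Counter

def reschedule(orphaned_ports, load_by_chassis):
    """Assign each orphaned GW port to the currently least loaded chassis."""
    load = Counter(load_by_chassis)  # chassis -> number of hosted GW ports
    assignment = {}
    for port in orphaned_ports:
        target = min(load, key=load.get)  # pick the least loaded chassis
        assignment[port] = target
        load[target] += 1  # updating the load is what spreads the ports
    return assignment

# Chassis "gw1" fails and its three GW ports need a new home; they should
# not all end up on the same remaining chassis.
print(reschedule(['p1', 'p2', 'p3'], {'gw2': 2, 'gw3': 2, 'gw4': 3}))
# -> {'p1': 'gw2', 'p2': 'gw3', 'p3': 'gw2'}

The reported bug looks as if the load counters are not updated between
assignments, so every orphaned port picks the same "least loaded" target.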

*** Size of the OVN SB "HA_Chassis_Group" table ***
The OVN SB "HA_Chassis_Group" table grows indefinitely: every operation
that creates a router and assigns it a new external gateway (external
network) adds rows, and the table never shrinks (a quick way to watch this
is sketched below).
* Action item: I'll open a LP bug, investigate this issue and, if it turns
out to be a core OVN issue, report it upstream.
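
For anyone wanting to check whether a deployment is affected, here is a
small sketch that counts the table rows over time; it assumes
``ovn-sbctl`` can reach the SB database from wherever it runs:

import subprocess

def ha_chassis_group_rows():
    # Every row printed by "ovn-sbctl list" starts with a "_uuid" field,
    # so counting those lines gives the number of rows in the table.
    out = subprocess.check_output(
        ['ovn-sbctl', 'list', 'HA_Chassis_Group'], text=True)
    return sum(1 for line in out.splitlines()
               if line.lstrip().startswith('_uuid'))

# Run this before and after creating/deleting a router with an external
# gateway; on an affected deployment the count only ever goes up.
print(ha_chassis_group_rows())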

*** Live migration with ML2/OVN ***
This is a common topic, and not only for ML2/OVN. The migration time
depends on many factors (memory size, running applications, network BW,
etc.) that can slow down the migration and open a communication gap during
the process.
* Action item: to create better documentation, in both Nova and Neutron,
about the migration process, what has been done to improve it (for example,
the OVN multiple port binding) and which factors affect the migration.

*** ML2/OVN IPv6 DVR ***
This spec was approved during the last cycle [1]. The implementation [2] is
under review.
* Action item: to review the patch (for Neutron reviewers)
* Action item: to implement the necessary tempest tests (for the feature
developers)

*** BGP with ML2/OVS, exposing address blocks ***
This user has successfully deployed Neutron with ML2/OVS and n-d-r
(neutron-dynamic-routing), and is currently making a certain set of FIPs
public. However, for the VMs without FIPs, the goal is to make the router
GW port IP address public using the address blocks functionality;
according to the user, this is not working.
* Action item: (for this user) to create a LP bug describing the
architecture of the deployment, the configuration used and the API
commands needed to reproduce this issue.

*** Metadata service (any backend) ***
Neutron is in charge of deploying the Metadata service on the compute
nodes. Each time the metadata HTTP server is called, it requests the
instance and tenant ID from the Neutron API [3]. This method implies an
RPC call. On "busy" compute nodes, where VMs are created and destroyed
very quickly, this RPC communication becomes a bottleneck.
* Action item: open a LP bug to implement the same ``CacheBackedPluginApi``
used in the OVS agent. This cached RPC class creates a set of subscriptions
to the needed resources ("ports" in this case). The Neutron API pushes the
updated port information, which is cached locally; that makes the RPC
request unnecessary when the resource is already stored locally, as in the
sketch below.
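
As a rough sketch of that caching pattern, with illustrative names only
(the real ``CacheBackedPluginApi`` in the Neutron tree differs in its
details):

class PortCache(object):
    """Local port cache fed by server-side update notifications."""

    def __init__(self, rpc_client):
        self._rpc = rpc_client  # fallback path: a plain RPC client
        self._ports = {}        # port_id -> port dict pushed by the server

    def handle_port_update(self, port):
        # Called for every port update pushed over the subscription.
        self._ports[port['id']] = port

    def handle_port_delete(self, port_id):
        self._ports.pop(port_id, None)

    def get_instance_and_tenant(self, port_id):
        port = self._ports.get(port_id)
        if port is None:
            # Cache miss: fall back to the RPC round trip done today.
            port = self._rpc.get_port(port_id)
            self._ports[port_id] = port
        return port['device_id'], port['project_id']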

*** ML2/OVN + Ironic nodes ***
This user has deployed ML2/OVN with Ironic nodes, and is using
ovn-bgp-agent with the EVPN driver to make the private ports (IPs and
MACs) of the Ironic nodes publicly reachable. More information in [4].

*** BGP acceleration in ML2/OVN ***
There were many questions related to this topic, both with DPDK and with
HW offload. I would refer you (once the link is available) to the talk
"Enabling multi-cluster connectivity using dynamic routing via BGP in
Openstack" given by Christophe Fontaine during this PTG. You'll find it
very interesting how this new implementation moves all the packet
processing to the OVS datapath (removing any Linux Bridge / iptables
processing). The example provided in the talk uses DPDK.


I hope this PTG was interesting for you! Don't hesitate to use the usual
channels, the mailing list and IRC. Remember that we have the weekly
Neutron meeting every Tuesday at 1400 UTC.

Regards.

[1] https://specs.openstack.org/openstack/neutron-specs/specs/2023.1/ovn-ipv6-dvr.html
[2] https://review.opendev.org/c/openstack/neutron/+/867513
[3] https://github.com/openstack/neutron/blob/cbb89fdb1414a1b3a8e8b3a9a4154ef627bb9d1a/neutron/agent/metadata/agent.py#L89
[4] https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/