[ptg][neutron] Ussuri PTG summary

Slawek Kaplonski skaplons at redhat.com
Tue Nov 12 13:53:11 UTC 2019


Hi Neutron team,

First of all, thank You all for the great and very productive week during the
PTG in Shanghai.
Below is a summary of our discussions from all 3 days.
If I forgot about something, please reply to this email and add the missing
information. But if You want to have a follow-up discussion about one of the
topics from this summary, please start a new thread to keep this one only as a
high level summary of the PTG.

Onboarding
==========

Slides from the onboarding session can be found at [1].
If You have any follow-up questions about onboarding, or You need help getting
started with any work in the Neutron team, please contact me or Miguel Lavalle
by email or on IRC. My IRC nick is slaweq and Miguel's nick is mlavalle. We are
available on the #openstack-neutron channel @freenode.

Train retrospective
===================

Good things in the Train cycle:
* working with this team is still a good experience,
* the core team is stable and we didn't lose any core reviewers during the
  cycle,
* networking is still one of the key reasons why people use OpenStack.

Not good things:
* diminished vitality in stadium projects - we also had a forum session and a
  follow-up discussion about this later during the PTG,
* gate instability - we have seen many issues which were out of our control,
  like infra problems, grenade job failures and failures in other projects,
  but also many bugs on our side,
* we have really a lot of jobs in our check/gate queues. If each of them fails
  5% of the time, it's hard to merge any patch, as almost every run at least
  one of the jobs will fail (see the quick calculation below). Later during
  the PTG we also discussed this topic and looked for jobs which we can
  potentially drop from our queues. See below for a summary of that.
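
To illustrate the math behind this (a quick sketch; the 5% failure rate is
illustrative and jobs are assumed to fail independently):

    # Probability that a full check/gate run passes when every job must
    # succeed, assuming independent failures at a fixed per-job rate.
    def pass_probability(num_jobs: int, failure_rate: float = 0.05) -> float:
        return (1.0 - failure_rate) ** num_jobs

    for num_jobs in (10, 20, 30, 40):
        print(f"{num_jobs} jobs -> run passes "
              f"{pass_probability(num_jobs):.1%} of the time")

    # 10 jobs -> run passes 59.9% of the time
    # 20 jobs -> run passes 35.8% of the time
    # 30 jobs -> run passes 21.5% of the time
    # 40 jobs -> run passes 12.9% of the time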

Action items/improvements:
* too many team meetings each week. We decided to limit the number of meetings
  by:
  ** consolidating the performance subteam meeting into the weekly team
     meeting - this topic will be added to the agenda of the team meetings
     held on Mondays,
  ** consolidating the ovn convergence meeting into the weekly team meeting -
     this topic will be added to the agenda of the team meetings held on
     Tuesdays,
  ** checking if the QoS subteam meeting is still needed,

* Review process: a list of current review priorities would be useful for the
  team. We will add a "Review-Priority" label to the Neutron reviews board and
  try to use it during the Ussuri cycle.

Openvswitch agent enhancements
==============================

We had a bunch of topics related to potential improvements of
neutron-openvswitch-agent, proposed mostly by Liu Yulong. Slides with his
proposals are available at [2].

* retire the DHCP agent - resyncs of the DHCP agent are problematic,
  especially when the agent hosts many networks. The proposal was to add a new
  L2 agent extension which could be used instead of the "regular" DHCP agent
  and would provide only basic DHCP functionality (a minimal extension
  skeleton is sketched at the end of this section).
  Such a solution would work quite similarly to how networking-ovn works
  today, but we would need to implement and maintain our own DHCP server
  application.

  Problems of this solution are:
** compatibility problems, e.g. with Ironic,
** how it would work in mixed deployments, e.g. with ovs and sriov agents,
** support for DHCP options,

  Advantages of this solution:
** fully distributed DHCP service,
** no DHCP agents, so fewer RPC messages on the bus and easier maintenance of
   the agents,

  The team's feedback was that this is a potentially nice solution which may
  help in some specific, large scale deployments. We can certainly continue
  the discussion about this during the Ussuri cycle.

* add accepted egress fdb flows
  We agreed that this is a bug and we should continue working on this to
  propose some way to fix it.
  The solution proposed by LIU during this discussion wasn't good, as it could
  potentially break some corner cases.

* new API and agent for L2 traffic health check
  The team asked to add to the spec some more detailed and concrete use cases,
  with an explanation of how this new API may help the cloud operator
  investigate where a problem actually is.

* Local flows cache and batch updating
  The team agreed that as long as this is an optional solution which operators
  can opt in to, we can give it a try. But a spec, with the details discussed
  there, will be necessary.

* stop processing ports twice in ovs-agent
  We all agreed that this is a bug and should be fixed. But we have to be
  careful, as fixing this bug may cause some other problems, e.g. with
  live-migration - see the Neutron - Nova cross project session.

* ovs-agent: batch flow updates with --bundle
  We all agreed that this can be done as an improvement of the existing code.
  A similar option is already used in the openvswitch firewall driver.
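
As a reference for the DHCP extension idea above, this is roughly what an L2
agent extension skeleton looks like, based on the L2AgentExtension interface
from neutron-lib (the DHCP logic itself is hypothetical, just a sketch):

    from neutron_lib.agent import l2_extension

    class DHCPExtension(l2_extension.L2AgentExtension):
        """Hypothetical extension serving basic DHCP from the L2 agent."""

        def initialize(self, connection, driver_type):
            # Set up the (hypothetical) local DHCP responder here.
            self.responder = None

        def handle_port(self, context, data):
            # Called when the agent wires a port: install its DHCP state.
            pass

        def delete_port(self, context, data):
            # Called when a port is unwired: clean up its DHCP state.
            pass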

Neutron - Cyborg cross project session
======================================

The etherpad for the session is at [3].
The Cyborg team wants to include Neutron in the workflow of spawning VMs with
Smart NICs or accelerator cards. From Neutron's side, the required change is
to allow including "accel" data in the port binding profile. As long as it is
well documented what can be placed there, there should be no problem with
doing that. Technically we can place almost anything there.
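
For illustration, such a port could look roughly like below (a sketch only;
binding:profile is an existing dict-valued port attribute, but the "accel" key
and its contents are hypothetical and were not settled during the session):

    # Sketch of a port create request body carrying accelerator data in
    # the binding profile.
    port_body = {
        "port": {
            "network_id": "NETWORK_UUID",
            "binding:profile": {
                "accel": {
                    # Hypothetical accelerator description from Cyborg.
                    "device_profile": "smartnic-profile-1",
                },
            },
        },
    }
    # e.g. sent as POST /v2.0/ports to the Neutron API.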

Neutron - Kuryr cross project session
=====================================

The etherpad for the session is at [4].
The Kuryr team proposed 4 improvements to Neutron which would help Kuryr a
lot. The ideas are:
* Network cascade deletion,
* Force subport deletion,
* Tag resources at creation time,
* Security group creation with rules & bulk security group rule creation

All of those ideas make sense to the Neutron team. Tagging resources at
creation time is even an already accepted RFE - see [5] - but there was no
volunteer to implement it (a sketch of the idea is below). We will add it to
the list of our BPs tracked weekly in the team meeting. Miguel Lavalle is
going to take a look at it during this cycle.
For the other proposals we need to have RFEs reported first.
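
To illustrate the tagging idea, a create request would accept tags directly
(a sketch of the behaviour proposed in RFE [5], not an existing API at the
time of writing):

    # Sketch: creating a network with tags in a single request, instead
    # of POST /v2.0/networks followed by a separate call to set tags.
    network_body = {
        "network": {
            "name": "kuryr-pod-network",
            "tags": ["kuryr", "namespace:default"],
        },
    }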

Starting the process of removing ML2/Linuxbridge
================================================

Currently in the Neutron tree we have 4 drivers:
* Linuxbridge,
* Openvswitch,
* macvtap,
* sriov.
The SR-IOV driver is out of this discussion, as it addresses a slightly
different use case than the other drivers.

We started the discussion about this topic because we don't want to end up
with too many in-tree drivers, and we have also had some discussions (and we
already have a spec for that) about including networking-ovn as an in-tree
driver.
So with networking-ovn in-tree we would have 4 drivers which can be used on
any hardware: linuxbridge, ovs, macvtap and ovn.
Conclusions from the discussion are:
* each driver requires proper testing in the gate, so we would need to add
  many new jobs to our check/gate queues,
* currently the linuxbridge driver doesn't see a lot of development, and the
  feature parity gap between the linuxbridge and ovs drivers is getting bigger
  and bigger (e.g. dvr, trunk ports),
* the macvtap driver also hasn't had a lot of activity in the last few cycles.
  Maybe this one could also be considered as a candidate for deprecation,
* we need to have a process for deprecating drivers, and the time horizon for
  such actions should be at least 2 cycles,
* we will not remove any driver completely, but rather move it to the stadium
  first, so it can still be maintained by people who are interested in it.

Actions to do after this discussion:
* Miguel Lavalle will contact RAX and Godaddy (we know that they are currently
  Linuxbridge users) to ask for their feedback on this,
* if there are any other companies using the LB driver, Nate Johnston is
  willing to help contact them - please reach out to him in such a case,
* we may ratify marking linuxbridge as deprecated in the team meeting during
  the Ussuri cycle if nothing surprising pops up.

Encrypted(IPSec) tenant networks
================================

An interesting topic was proposed, but we need an RFE and a spec with more
detailed information about it to continue the discussion.

Metadata service over IPv6
==========================

This is a continuation of the old RFE [6].
The only real problem is to choose a proper IPv6 address which will be the
well-known address used e.g. by cloud-init.
The original spec proposed fe80::a9fe:a9fe as the IPv6 address to access the
metadata service.
We decided to be bold and define the standard.
Bence Romsics and Miguel Lavalle volunteered to reach out to the cloud-init
maintainers to discuss that.
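
For illustration, a guest would query the metadata service over that
link-local address, which always needs a scope (interface) to be usable - one
reason the well-known address has to be agreed on with cloud-init. A sketch,
assuming the proposed fe80::a9fe:a9fe address and a hypothetical guest
interface name:

    import socket

    IFACE = "eth0"  # hypothetical guest interface name
    family, sock_type, proto, _, sockaddr = socket.getaddrinfo(
        f"fe80::a9fe:a9fe%{IFACE}", 80,
        socket.AF_INET6, socket.SOCK_STREAM)[0]
    with socket.socket(family, sock_type, proto) as sock:
        sock.connect(sockaddr)
        sock.sendall(b"GET /openstack/latest/meta_data.json HTTP/1.1\r\n"
                     b"Host: metadata\r\nConnection: close\r\n\r\n")
        print(sock.recv(65535).decode(errors="replace"))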

Walkthrough of OVN
==================

For some time we have had a spec in review about ml2/ovs and ovn convergence.
See [7] for details.
A list of parity gaps between those backends is available at [8].
During the discussion we talked about things like:
* migration from ml2/ovs to ml2/ovn - some scripts are already available in
  [9],
* migration from ml2/lb to ml2/ovn - no work has been done on this topic so
  far, but it should also be doable if someone needs it and wants to invest
  their own time in it,
* including networking-ovn as an in-tree neutron driver, and the reasons why
  it could be a good idea.

  The main reasons for that are:
** it would help grow the networking-ovn community,
** it would help to maintain a healthy project team,
** the default drivers have always been in-tree,

  However, such inclusion may also hurt modularity/logical
  separation/dependency management/packaging/etc., so we need to consider it
  really carefully, taking into account all points of view and opinions.

The next action item on this topic is to write a more detailed summary, send
it to the ML, and ask a wider audience for feedback.

IPv6 devstack tempest test configuration vs OVN
===============================================

Generally the team supports the idea which was described during this session:
we should slightly change the IPv6 config on e.g. devstack deployments.

Neutron - Edge SIG session
==========================

We discussed RFE [10]. This will also require changes on the placement side.
See [11] for details.
Some cyborg and ovn related changes may also be relevant to Edge topics.
Currently the specs which we have are only related to the ML2/OVS solution.

Neutron - Nova cross project session
====================================

The etherpad for this session is at [12]. A summary, already written by gibi,
can be found at [13].
At [14] You can find an image which visualizes the problem with live-migration
of instances with SR-IOV ports.

Policy handling in Neutron
==========================

The goal of the session was to plan, on Neutron's side, an effort similar to
what services like nova are doing now: using the new roles, like reader, and
the scopes, like project, domain and system, provided by Keystone.
Miguel Lavalle volunteered to work on this for Neutron and to be part of the
popup team for cross project collaboration on this topic.
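
For illustration, this is roughly what such a rule definition looks like with
oslo.policy (a sketch only; the actual rule names and scopes for Neutron were
still to be worked out at the time):

    from oslo_policy import policy

    # Sketch: a read-only rule limited to the "reader" role and usable
    # with both system- and project-scoped tokens.
    rules = [
        policy.DocumentedRuleDefault(
            name="get_network",
            check_str="role:reader",
            description="Get a network",
            operations=[{"path": "/v2.0/networks/{id}", "method": "GET"}],
            scope_types=["system", "project"],
        ),
    ]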

Neutron performance improvements
================================

Miguel Lavalle showed us his new profiling decorator [15] and how we all can
use it to profile some of the API calls in Neutron.
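
For those who weren't in the room, the general shape of such a decorator is
roughly as below (a sketch based on cProfile, not the actual code from [15]):

    import cProfile
    import functools
    import io
    import pstats

    def profile(func):
        """Print a cProfile report for each call of the decorated function."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            try:
                return profiler.runcall(func, *args, **kwargs)
            finally:
                out = io.StringIO()
                stats = pstats.Stats(profiler, stream=out)
                stats.sort_stats("cumulative").print_stats(20)
                print(out.getvalue())
        return wrapper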

Reevaluate Stadium projects
===========================

This was a follow-up discussion after the forum session. Notes from the forum
session can be found at [16].
Nate also prepared some good data about stadium projects' activity in the last
cycles. See [17] and [18] for details.
We all agreed that projects which are in (relatively) good condition now are:
* networking-ovn,
* networking-odl,
* ovsdbapp

The other projects are in bad condition, like:
* neutron-interconnection,
* networking-sfc,
* networking-bagpipe/bgpvpn,
* networking-midonet,
* neutron-fwaas and neutron-fwaas-dashboard,
* neutron-dynamic-routing,
* neutron-vpnaas and neutron-vpnaas-dashboard,

We decided to immediately remove the neutron-interconnection project, as it
was never really implemented.
For the rest of those projects, we will send emails to the ML to ask for
potential maintainers. If there are no volunteers to maintain some of those
projects, we will deprecate them and move them to the "x/" namespace in 2
cycles.

Floating IPs On Routed Networks
===============================

There is still interest in doing this. Lajos Katona has already started adding
some scenario tests for routed networks, as we need improved test coverage for
this feature.
Miguel Lavalle said that he will possibly try to work on implementing this in
the Ussuri cycle.

L3 agent enhancement
====================

We talked about a couple of potential improvements to the existing L3 agent,
all proposed by LIU Yulong.

* retire metering-agent
  It seems that there has been some interest in the metering agent recently,
  so we probably shouldn't consider retiring it for now.
  We also talked about adding a new "tc based" driver to the metering agent;
  this discussion can continue on the RFE bug [19].

* Centralized DNAT (non-DVR) traffic (floating IP) Scale-out
  This is a proposal for a new DVR solution. Some details of it are available
  at [20].
  We agreed that this proposal is trying to solve a very specific use case,
  and it seems to be a very complicated solution with many potential corner
  cases to address. As a community we don't want to introduce and maintain
  such a complicated new L3 design.

* Lazy-load agent side router resources when no related service port
  The team wants to see an RFE with a detailed description of the exact
  problem which this is trying to solve, and then continue the discussion on
  that RFE.

Zuul jobs
=========

In this session we talked about jobs which we can potentially promote to be
voting (we didn't find any such jobs) and about jobs which we can potentially
remove from our queues.
Here is what we agreed on:
* we have 2 iptables_hybrid jobs - one on Fedora and one on Ubuntu - we will
  drop one of them and keep only the other,
* drop the neutron-grenade job as it still runs on py27 - we already have
  grenade-py3, which is the same job but runs on py36,
* as it is the beginning of the cycle, we will switch neutron in devstack to
  use uwsgi by default and we will remove the "-uwsgi" jobs from the queue,
* we should compare the single node and multinode variants of the same jobs,
  and maybe promote the multinode jobs to be voting and then remove the single
  node ones - I volunteered to do that,
* remove our existing experimental jobs, as those jobs are mostly broken and
  nobody actually runs them in the experimental queue,
* Yamamoto will check the failing networking-midonet job and propose a patch
  to make it pass again,
* we will change the neutron-tempest-plugin jobs for branches in the EM phase
  to always use a certain neutron-tempest-plugin and tempest tag, then we will
  remove those jobs from the check and gate queues in the master branch.

Stateless security groups
=========================

The old RFE [21] was approved for the neutron-fwaas project, but we all agreed
that this should now be implemented for security groups in core Neutron.
People from Nuage are interested in working on this upstream.
We should probably also explore how easy/hard it will be to implement it in
the networking-ovn backend.
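
One possible API shape (purely illustrative; the attribute name and defaults
were not decided at the PTG) would be a boolean on the security group:

    # Sketch: creating a security group whose rules would be applied
    # without connection tracking. The "stateful" attribute is just the
    # shape suggested by RFE [21], not an existing field at the time of
    # writing.
    security_group_body = {
        "security_group": {
            "name": "stateless-sg",
            "stateful": False,
        },
    }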

Old, stagnant specs
===================

During this session we decided to abandon many old specs which were proposed a
long time ago and currently have no activity on them or interest in continuing
work on them.
If anyone is interested in continuing work on some of them, feel free to
contact the neutron core team on IRC or through email and we can always
restore such a patch.

Community Goal things
=====================

We discussed the currently proposed community goals and who can take care of
each goal on Neutron's side.
Currently the proposed community goals are:
* python3 readiness - Nate will take care of this,
* move job definitions to Zuul v3 - I will take care of it. In core neutron
  and neutron-tempest-plugin we are (mostly) done. On the stadium projects'
  side this will require some work,
* project specific PTL and contributor guides - Miguel Lavalle will take care
  of this goal as a former PTL.

We will track the progress of the community goals weekly in our team meetings.

Neutron-lib
===========

As our main neutron-lib maintainer (Boden) left the project some time ago, we
need some new volunteers to continue the work on it. The todo list is
available at [22].
This should be most important for people who are maintaining stadium projects
or some 3rd party drivers/plugins, so if You are doing things like that,
please check the list at [22] and reach out to us on the ML or on the
#openstack-neutron IRC channel.

[1] https://www.slideshare.net/SawomirKaposki/neutron-on-boarding-room
[2] https://github.com/gotostack/shanghai_ptg/blob/master/shanghai_neutron_ptg_topics_liuyulong.pdf
[3] https://etherpad.openstack.org/p/Shanghai-Neutron-Cyborg-xproj
[4] https://etherpad.openstack.org/p/kuryr-neutron-nice-to-have
[5] https://bugs.launchpad.net/neutron/+bug/1815933
[6] https://bugs.launchpad.net/neutron/+bug/1460177
[7] https://review.opendev.org/#/c/658414/
[8] https://etherpad.openstack.org/p/ML2-OVS-OVN-Convergence
[9] https://github.com/openstack/networking-ovn/tree/master/migration
[10] https://bugs.launchpad.net/neutron/+bug/1832526
[11] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/009991.html
[12] https://etherpad.openstack.org/p/ptg-ussuri-xproj-nova-neutron
[13] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010654.html
[14] https://imgur.com/a/12PrQ9W
[15] https://review.opendev.org/678438
[16] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward
[17] https://ethercalc.openstack.org/neutron-stadium-train-metrics
[18] https://ibb.co/SBzDGdD
[19] https://bugs.launchpad.net/neutron/+bug/1817881
[20] https://imgur.com/a/6MeNUNb
[21] https://bugs.launchpad.net/neutron/+bug/1753466
[22] https://etherpad.openstack.org/p/neutron-lib-volunteers-and-punch-list

-- 
Slawek Kaplonski
Senior software engineer
Red Hat



