[openstack-dev] [Neutron] l2pop problems

Zang MingJie zealot0630 at gmail.com
Tue Aug 5 11:18:00 UTC 2014

Hi Mathieu:

We have deployed the new l2pop described in the previous mail in our
environment, and it works pretty well. It solved the timing problem and
also greatly reduces the number of l2pop RPC calls. I'm going to file a
blueprint to propose the changes.

On Fri, Jul 18, 2014 at 10:26 PM, Mathieu Rohon <mathieu.rohon at gmail.com> wrote:
> Hi Zang,
> On Wed, Jul 16, 2014 at 4:43 PM, Zang MingJie <zealot0630 at gmail.com> wrote:
>> Hi, all:
>> While resolving ovs restart rebuild br-tun flows[1], we have found
>> several l2pop problems:
>> 1. L2pop depends on agent_boot_time to decide whether to send all
>> port information or not, but agent_boot_time is unreliable. For
>> example, if the service receives a port up message before the agent
>> status report, the agent will never receive any ports on other agents.
> you're right, there is a race condition here: if the agent has more than
> 1 port on the same network and if the agent sends its
> update_device_up() on every port before it sends its report_state(),
> it won't receive the fdb entries for this network. Is it the race you
> are mentioning above?
> Since the report_state is done in a dedicated greenthread, and is
> launched before the greenthread that manages ovsdb_monitor, the state
> of the agent should be updated before the agent gets aware of its
> ports and sends get_device_details()/update_device_up(), am I wrong?
> So, after a restart of an agent, the agent_uptime() should be less
> than the agent_boot_time configured by default in the conf when the
> agent sends its first update_device_up(), so the l2pop MD will be aware
> of this restart and trigger the cast of all fdb entries to the
> restarted agent.
> But I agree that it might rely on eventlet thread management and on
> agent_boot_time, which can be misconfigured by the provider.
>> 2. If openvswitch is restarted, all flows are lost, including all
>> l2pop flows, and the agent is unable to fetch or recreate them.
>> To resolve the problems, I'm suggesting some changes:
>> 1. Because agent_boot_time is unreliable, the service can't decide
>> whether to send the flooding entry or not. But the agent can build the
>> flooding entries from the unicast entries; this has already been
>> implemented[2].
>> 2. Create an rpc from agent to service which fetches all fdb entries;
>> the agent calls the rpc in `provision_local_vlan`, before setting up
>> any port.[3]
>> After these changes, the l2pop service part becomes simpler and more
>> robust, with mainly 2 functions: first, return all fdb entries at once
>> when requested; second, broadcast a single fdb entry when a port goes
>> up/down.
> That's an implementation that we have been thinking about during the
> l2pop implementation.
> Our purpose was to minimize RPC calls. But if this implementation is
> buggy due to uncontrolled thread order and/or bad usage of the
> agent_boot_time parameter, it's worth investigating your proposal [3].
> However, I don't get why [3] depends on [2]. Couldn't we have a
> network_sync() sent by the agent during provision_local_vlan() which
> will reconfigure ovs when the agent and/or the ovs restart?

Actually, [3] doesn't strictly depend on [2]. We have encountered l2pop
problems several times where the unicast entries were correct but the
broadcast failed, so we decided to ignore the broadcast entries in the
RPC completely, handle only the unicast entries, and use them to build
the broadcast entries.
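
To illustrate the idea of deriving the broadcast (flooding) entries from
the unicast ones, here is a minimal sketch. It is not the code from [2];
the function name build_flood_targets and the flat
(network_id, remote_tunnel_ip, mac) tuple format are assumptions made
for this example, not the actual l2pop fdb structure. The point is only
that any remote tunnel endpoint holding at least one unicast entry for a
network should also be a flood target for that network.

```python
from collections import defaultdict


def build_flood_targets(unicast_entries):
    """Derive per-network flood targets from unicast fdb entries.

    unicast_entries: iterable of (network_id, remote_tunnel_ip, mac)
    tuples. This flat shape is hypothetical, chosen for the example.
    Returns {network_id: sorted list of remote tunnel IPs}.
    """
    targets = defaultdict(set)
    for network_id, remote_ip, _mac in unicast_entries:
        # A remote agent with at least one port on the network must
        # receive broadcast traffic for that network.
        targets[network_id].add(remote_ip)
    return {net: sorted(ips) for net, ips in targets.items()}


# Hypothetical sample data: two remote agents, two networks.
entries = [
    ("net-a", "10.0.0.2", "fa:16:3e:00:00:01"),
    ("net-a", "10.0.0.2", "fa:16:3e:00:00:02"),
    ("net-a", "10.0.0.3", "fa:16:3e:00:00:03"),
    ("net-b", "10.0.0.3", "fa:16:3e:00:00:04"),
]
flood = build_flood_targets(entries)
```

With this approach the agent never has to trust a separate flooding
entry from the service: when the last unicast entry for a remote
endpoint is removed, recomputing the targets drops that endpoint from
the flood list as well.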

>> [1] https://bugs.launchpad.net/neutron/+bug/1332450
>> [2] https://review.openstack.org/#/c/101581/
>> [3] https://review.openstack.org/#/c/107409/
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev