[openstack-dev] [Neutron] l2pop problems

Mathieu Rohon mathieu.rohon at gmail.com
Fri Jul 18 14:26:04 UTC 2014


Hi Zang,

On Wed, Jul 16, 2014 at 4:43 PM, Zang MingJie <zealot0630 at gmail.com> wrote:
> Hi, all:
>
> While resolving ovs restart rebuild br-tun flows[1], we have found
> several l2pop problems:
>
> 1. L2pop depends on agent_boot_time to decide whether to send all
> port information or not, but agent_boot_time is unreliable; for
> example, if the service receives a port up message before the agent
> status report, the agent will never receive any ports on other agents.

You're right, there is a race condition here: if the agent has more
than one port on the same network and sends update_device_up() for
every port before it sends its report_state(), it won't receive the
fdb entries for that network. Is this the race you are mentioning
above?
Since report_state is done in a dedicated greenthread, which is
launched before the greenthread that manages ovsdb_monitor, the state
of the agent should be updated before the agent becomes aware of its
ports and sends get_device_details()/update_device_up(), am I wrong?
So, after an agent restart, agent_uptime() should be less than the
agent_boot_time configured by default in the conf when the agent
sends its first update_device_up(); the l2pop MD will then be aware
of this restart and trigger the cast of all fdb entries to the
restarted agent.

But I agree that this relies on eventlet thread management and on
agent_boot_time, which can be misconfigured by the provider.
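
To make the race concrete, here is a minimal sketch of the check as I
understand it. The function names and the 180 second value are
assumptions for illustration, not the actual l2pop MD code: the server
sends the full fdb either when this is the agent's first active port on
the network or when the agent looks recently restarted.

```python
from datetime import datetime, timedelta

# Assumed value for the agent_boot_time option, in seconds.
AGENT_BOOT_TIME = 180


def agent_uptime(started_at, now):
    """Seconds elapsed since the start time the server has on record."""
    return (now - started_at).total_seconds()


def should_send_all_fdb(started_at, now, active_ports_on_network,
                        boot_time=AGENT_BOOT_TIME):
    """Hypothetical decision taken when update_device_up() is handled.

    started_at is whatever the last report_state() stored, so it is
    stale if the port message arrives before the first report after a
    restart.
    """
    first_port = active_ports_on_network == 1
    recently_restarted = agent_uptime(started_at, now) < boot_time
    return first_port or recently_restarted


now = datetime(2014, 7, 18, 14, 0, 0)
fresh_start = now - timedelta(seconds=10)  # report_state() arrived first
stale_start = now - timedelta(hours=5)     # report_state() not yet received

# Expected order: the restart is detected, full fdb is sent.
print(should_send_all_fdb(fresh_start, now, active_ports_on_network=2))
# Race: stale start time and more than one port, no full fdb is sent.
print(should_send_all_fdb(stale_start, now, active_ports_on_network=2))
```

With a stale start time and more than one port already counted on the
network, neither condition holds, which is the case where the restarted
agent never gets its fdb entries.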

> 2. If openvswitch is restarted, all flows are lost, including all
> l2pop flows, and the agent is unable to fetch or recreate them.
>
> To resolve the problems, I'm suggesting some changes:
>
> 1. Because the agent_boot_time is unreliable, the service can't decide
> whether to send flooding entry or not. But the agent can build up the
> flooding entries from unicast entries, it has already been
> implemented[2]
>
> 2. Create an RPC from agent to service which fetches all fdb entries;
> the agent calls the RPC in `provision_local_vlan`, before setting up
> any port.[3]
>
> After these changes, the l2pop service part becomes simpler and more
> robust, with mainly 2 functions: first, return all fdb entries at once
> when requested; second, broadcast a single fdb entry when a port goes
> up/down.

That's a design we considered during the l2pop implementation.
Our purpose was to minimize RPC calls. But if the current
implementation is buggy due to uncontrolled thread ordering and/or bad
usage of the agent_boot_time parameter, it's worth investigating your
proposal [3].
However, I don't get why [3] depends on [2]. Couldn't we have a
network_sync() sent by the agent during provision_local_vlan() which
would reconfigure ovs when the agent and/or ovs restarts?
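
Roughly what I have in mind, as a sketch only: network_sync(), the
classes and the simplified fdb layout ({network_id: {agent_ip: [(mac,
ip), ...]}}) are all hypothetical here and do not match the real agent
or RPC code.

```python
class FakeL2popServer:
    """Stand-in for the service side; holds fdb entries per network."""

    def __init__(self, fdb):
        self.fdb = fdb  # {network_id: {agent_ip: [(mac, ip), ...]}}

    def network_sync(self, network_id, requesting_ip):
        # Return every known entry for the network except the caller's own.
        entries = self.fdb.get(network_id, {})
        return {ip: list(ports) for ip, ports in entries.items()
                if ip != requesting_ip}


class SketchAgent:
    """Stand-in for the agent; self.flows plays the role of br-tun flows."""

    def __init__(self, local_ip, server):
        self.local_ip = local_ip
        self.server = server
        self.flows = {}  # {network_id: {remote_agent_ip: [(mac, ip), ...]}}

    def provision_local_vlan(self, network_id):
        # Pull the full fdb for the network before any port is set up,
        # so the result does not depend on agent_boot_time at all.
        self.flows[network_id] = self.server.network_sync(
            network_id, self.local_ip)


server = FakeL2popServer({
    "net-1": {
        "10.0.0.1": [("fa:16:3e:00:00:01", "192.168.1.2")],
        "10.0.0.2": [("fa:16:3e:00:00:02", "192.168.1.3")],
    },
})
agent = SketchAgent("10.0.0.1", server)
agent.provision_local_vlan("net-1")

# Simulate an ovs restart: flows are lost, then rebuilt by the same call.
agent.flows.clear()
agent.provision_local_vlan("net-1")
```

The same call would cover both the agent restart and the ovs restart,
at the cost of one extra RPC per network being provisioned.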


> [1] https://bugs.launchpad.net/neutron/+bug/1332450
> [2] https://review.openstack.org/#/c/101581/
> [3] https://review.openstack.org/#/c/107409/