[openstack-dev] [Neutron] L3 agent rescheduling issue

Carl Baldwin carl at ecbaldwin.net
Thu Jun 4 18:52:18 UTC 2015


Ann,

Thanks for bringing this up.  It has been on the shelf for a while now.

Carl

On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <sorlando at nicira.com> wrote:
> One reason for not sending the heartbeat from a separate greenthread could
> be that the agent is already doing it [1].
> The currently proposed patch addresses the issue blindly. That is to say,
> before declaring an agent dead, let's wait a bit longer because it
> could be stuck doing stuff. In that case I would probably make the
> multiplier (currently 2x) configurable.
>
> The state report probably does not occur because both it and the resync
> procedure are periodic tasks. If I got it right, they're both executed as
> eventlet greenthreads, but one at a time. Perhaps adding an initial delay
> to the full sync task would ensure that the first thing an agent does when
> it comes up is send a heartbeat to the server?
>
> On the other hand, while doing the initial full resync, is the agent able
> to process updates? If not perhaps it makes sense to have it down until it
> finishes synchronisation.

Yes, it can!  The agent prioritizes updates from RPC over full resync
activities.

I wonder if the agent should check how long it has been since its last
state report each time it finishes processing an update for a router.
It normally doesn't take very long (relatively) to process an update
to a single router.
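
To make the idea concrete, here is a rough sketch of what I mean.  It
is not the actual L3 agent code: StateReporter, report_if_due and
process_router_updates are hypothetical names made up for
illustration, and the report interval is whatever the agent is
already configured with.

```python
import time


class StateReporter(object):
    """Hypothetical helper that remembers when the last report was sent."""

    def __init__(self, report_interval, send_report, clock=time.monotonic):
        self.report_interval = report_interval  # seconds between heartbeats
        self._send_report = send_report         # callable that does the RPC
        self._clock = clock                     # injectable for testing
        self._last_report = None                # None until the first report

    def report_if_due(self):
        """Send a state report if none was sent within report_interval."""
        now = self._clock()
        overdue = (self._last_report is None or
                   now - self._last_report >= self.report_interval)
        if overdue:
            self._send_report()
            self._last_report = now
        return overdue


def process_router_updates(updates, process_one, reporter):
    """Process updates one router at a time, checking the heartbeat
    after each one so a long queue cannot delay it indefinitely."""
    for update in updates:
        process_one(update)
        reporter.report_if_due()
```

With this shape, the worst-case delay for a heartbeat is roughly the
report interval plus the time to process one router, rather than the
time to drain the whole queue.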

I still would like to know why the thread to report state is being
starved.  Anyone have any insight on this?  I thought that with all
the system calls, the greenthreads would yield often.  There must be
something I don't understand about it.
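
One thing that might be worth ruling out: greenthreads are scheduled
cooperatively, so a task only gives up control at a cooperative
switch point.  If some call in the sync path blocks without going
through such a point, nothing else runs until it returns.  The toy
below is not eventlet and not agent code; it uses plain generators
as stand-ins for greenthreads, with each `yield` playing the role of
a switch point, and all names in it are made up for illustration.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator tasks in turn; each `yield` is a switch point."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run until the task's next yield
        except StopIteration:
            continue            # task finished, drop it
        queue.append(task)      # otherwise reschedule it


def heartbeat(log, beats):
    """Stand-in for the state report task."""
    for _ in range(beats):
        log.append("heartbeat")
        yield


def sync_routers(log, routers, yield_between):
    """Stand-in for the full sync; optionally yields after each router."""
    for router in routers:
        log.append("sync %s" % router)
        if yield_between:
            yield


def demo(yield_between):
    log = []
    run_round_robin([
        sync_routers(log, ["r1", "r2", "r3"], yield_between),
        heartbeat(log, 2),
    ])
    return log
```

In this toy, demo(False) logs every router sync before the first
heartbeat, while demo(True) interleaves them.  Whether anything like
this is what actually happens in the agent is exactly the part I
don't know.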

Carl


