[openstack-dev] [Neutron] L3 agent rescheduling issue
carl at ecbaldwin.net
Thu Jun 4 18:52:18 UTC 2015
Thanks for bringing this up. It has been on the shelf for a while now.
On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <sorlando at nicira.com> wrote:
> One reason for not sending the heartbeat from a separate greenthread could
> be that the agent is already doing it.
> The current proposed patch addresses the issue blindly - that is to say
> before declaring an agent dead let's wait for some more time because it
> could be stuck doing stuff. In that case I would probably make the
> multiplier (currently 2x) configurable.
> The reason the state report does not occur is probably that both it
> and the resync procedure are periodic tasks. If I got it right they're both
> executed as eventlet greenthreads but one at a time. Perhaps then adding an
> initial delay to the full sync task might ensure the first thing an agent
> does when it comes up is sending a heartbeat to the server?
> On the other hand, while doing the initial full resync, is the agent able
> to process updates? If not perhaps it makes sense to have it down until it
> finishes synchronisation.
Yes, it can! The agent prioritizes updates from RPC over full resync.
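As a rough illustration of that prioritization, here is a minimal sketch using only the Python standard library. The names UpdateQueue, PRIORITY_RPC, and PRIORITY_FULL_SYNC are hypothetical and chosen for this example; this is not the agent's actual code.

```python
import itertools
import queue

# Hypothetical priority values: a lower number is served first.
PRIORITY_RPC = 0
PRIORITY_FULL_SYNC = 1


class UpdateQueue:
    """Illustrative queue that serves RPC updates before full-resync work."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # The counter breaks ties so entries with equal priority stay FIFO.
        self._counter = itertools.count()

    def add(self, priority, router_id):
        self._queue.put((priority, next(self._counter), router_id))

    def drain(self):
        """Yield (priority, router_id) pairs in the order they would be processed."""
        while not self._queue.empty():
            priority, _, router_id = self._queue.get()
            yield priority, router_id


q = UpdateQueue()
q.add(PRIORITY_FULL_SYNC, "router-a")  # queued by the full resync
q.add(PRIORITY_FULL_SYNC, "router-b")
q.add(PRIORITY_RPC, "router-c")        # arrives later via RPC
order = [router_id for _, router_id in q.drain()]
```

Even though router-c is queued last, it is processed first because its priority value is lower.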
I wonder if the agent should check how long it has been since its last
state report each time it finishes processing an update for a router.
It normally doesn't take very long (relatively) to process an update
to a single router.
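A minimal sketch of that idea, assuming a hypothetical StateReporter helper and a send_report callback (neither is taken from the agent code). The clock is injectable so the behavior can be checked without waiting.

```python
import time


class StateReporter:
    """Tracks the last state report and sends a new one when it is overdue."""

    def __init__(self, report_interval, send_report, clock=time.monotonic):
        self.report_interval = report_interval  # seconds between reports
        self._send_report = send_report         # hypothetical callback to the server
        self._clock = clock
        self._last_report = clock()

    def report(self):
        self._send_report()
        self._last_report = self._clock()

    def report_if_overdue(self):
        """Send a report if at least report_interval has elapsed since the last one."""
        if self._clock() - self._last_report >= self.report_interval:
            self.report()
            return True
        return False


def process_updates(router_ids, process_router, reporter):
    """Process routers one at a time, checking the heartbeat after each one."""
    for router_id in router_ids:
        process_router(router_id)
        reporter.report_if_overdue()


# Usage with a fake clock: each router takes 20 "seconds" to process.
now = [0.0]
sent = []
reporter = StateReporter(30, lambda: sent.append(now[0]), clock=lambda: now[0])


def slow_process(router_id):
    now[0] += 20


process_updates(["r1", "r2", "r3"], slow_process, reporter)
```

With a 30 second interval, the report goes out after the second router (at t=40) rather than waiting for the whole batch to finish. The worst-case delay is the interval plus the time to process one router.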
I still would like to know why the thread to report state is being
starved. Anyone have any insight on this? I thought that with all
the system calls, the greenthreads would yield often. There must be
something I don't understand about it.
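One possible explanation, not confirmed in this thread: greenthreads are cooperative, so a task that never reaches a yield point holds the CPU until it finishes. The sketch below is only an analogy built from plain Python generators, not eventlet itself; the run, heartbeat, and resync names are hypothetical. In eventlet, a switch happens only at a green (monkey-patched) blocking call or an explicit sleep, so a system call that is not patched would block the whole process without yielding.

```python
from collections import deque


def run(tasks, max_steps):
    """Round-robin over generator tasks; a task runs until it yields or ends."""
    ready = deque(tasks)
    for _ in range(max_steps):
        if not ready:
            break
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)  # task yielded, so it goes to the back of the line
        except StopIteration:
            pass  # task finished


def heartbeat(log):
    """Stand-in for the state report task: one report per scheduler turn."""
    while True:
        log.append("heartbeat")
        yield


def resync(log, routers, yield_between):
    """Stand-in for a full resync; optionally yields after each router."""
    for router in routers:
        log.append("sync " + router)
        if yield_between:
            yield  # cooperative yield point


def demo(yield_between):
    log = []
    tasks = [resync(log, ["r1", "r2", "r3"], yield_between), heartbeat(log)]
    run(tasks, max_steps=6)
    return log


starved = demo(yield_between=False)
interleaved = demo(yield_between=True)
```

Without yield points, the heartbeat does not run until all three routers are synced. With a yield after each router, heartbeats interleave with the sync work.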