[openstack-dev] [Neutron] L3 agent rescheduling issue

Itsuro ODA oda at valinux.co.jp
Thu Jun 4 23:29:16 UTC 2015


Hi,

> After trying to reproduce this, I'm suspecting that the issue is actually
> on the server side from failing to drain the agent report state queue in
> time.

I have seen this before.
At the time, I thought the scenario was as follows:
* a lot of create/update resource API calls are issued.
* the "rpc_conn_pool_size" pool is exhausted by sending notifications, which
  blocks further sending on the RPC side.
* the "rpc_thread_pool_size" pool is exhausted by threads waiting on the
  "rpc_conn_pool_size" pool in order to reply to RPCs.
* receiving state_report messages is blocked because the
  "rpc_thread_pool_size" pool is exhausted.
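The scenario above can be modeled with a minimal sketch. It uses Python's standard threading primitives in place of the eventlet pools the RPC layer actually uses. The pool sizes and handler names are hypothetical and chosen only so the exhaustion is easy to trigger: a semaphore stands in for the "rpc_conn_pool_size" pool and a ThreadPoolExecutor stands in for the "rpc_thread_pool_size" pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# Illustrative sizes only, far smaller than any realistic configuration.
RPC_CONN_POOL_SIZE = 1
RPC_THREAD_POOL_SIZE = 2


def simulate(timeout=0.5):
    """Return (state_report_was_blocked, reports_handled) for one run."""
    conn_pool = threading.BoundedSemaphore(RPC_CONN_POOL_SIZE)
    reports = []

    def handle_rpc_call(name):
        # Replying to an RPC needs a connection from the connection pool.
        with conn_pool:
            return name

    def handle_state_report(agent):
        # The handler itself is trivial, but it still needs a free worker.
        reports.append(agent)

    with ThreadPoolExecutor(max_workers=RPC_THREAD_POOL_SIZE) as workers:
        # Step 1: notification senders hold every pooled connection.
        for _ in range(RPC_CONN_POOL_SIZE):
            conn_pool.acquire()
        # Step 2: incoming RPC calls occupy every worker, each one waiting
        # for a connection so it can send its reply.
        calls = [workers.submit(handle_rpc_call, f"call-{i}")
                 for i in range(RPC_THREAD_POOL_SIZE)]
        # Step 3: a state_report arrives, but no worker is free to run it.
        report = workers.submit(handle_state_report, "l3-agent")
        _, not_done = wait([report], timeout=timeout)
        blocked = report in not_done
        # When the notifications finish, the connections return to the pool
        # and the backlog, including the state_report, drains.
        for _ in range(RPC_CONN_POOL_SIZE):
            conn_pool.release()
        for call in calls:
            call.result()
        report.result()
    return blocked, reports
```

In this model `simulate()` returns `(True, ['l3-agent'])`: the state_report stays queued while every worker waits for a connection, and it is handled only after the connections are released.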

Thanks
Itsuro Oda

On Thu, 4 Jun 2015 14:20:33 -0700
Kevin Benton <blak111 at gmail.com> wrote:

> After trying to reproduce this, I'm suspecting that the issue is actually
> on the server side from failing to drain the agent report state queue in
> time.
> 
> I set the report_interval to 1 second on the agent and added a logging
> statement, and I see a report every 1 second even when sync_routers is
> taking a really long time.
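One way to read such a log, sketched as a hypothetical helper (the function name and the `slack` parameter are assumptions): collect the report timestamps and flag any interval noticeably longer than report_interval. Running it on both the agent's log and the times at which the server processed each report would show where the delay arises. No gaps on the agent side combined with gaps on the server side would be consistent with the server-side explanation above.

```python
def find_report_gaps(timestamps, report_interval, slack=0.5):
    """Return (earlier, later) pairs of consecutive report timestamps that
    are more than report_interval + slack seconds apart."""
    ordered = sorted(timestamps)
    limit = report_interval + slack
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]
```

For example, `find_report_gaps([0.0, 1.0, 2.0, 3.0, 7.5, 8.5], report_interval=1)` returns `[(3.0, 7.5)]`.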
> 
> On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <carl at ecbaldwin.net> wrote:
> 
> > Ann,
> >
> > Thanks for bringing this up.  It has been on the shelf for a while now.
> >
> > Carl
> >
> > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <sorlando at nicira.com>
> > wrote:
> > > One reason for not sending the heartbeat from a separate greenthread
> > > could be that the agent is already doing it [1].
> > > The currently proposed patch addresses the issue blindly, that is to say:
> > > before declaring an agent dead, let's wait a bit longer because it
> > > could be stuck doing stuff. In that case I would probably make the
> > > multiplier (currently 2x) configurable.
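A configurable multiplier could look like the following sketch. It assumes the check compares the last heartbeat time against Neutron's agent_down_time option. The function name and the `multiplier` argument are hypothetical and are not taken from the proposed patch.

```python
from datetime import datetime, timedelta, timezone


def is_agent_dead(last_heartbeat, now, agent_down_time, multiplier=2.0):
    """Hypothetical check: declare an agent dead only after it has been
    silent for longer than multiplier * agent_down_time seconds."""
    if multiplier < 1:
        raise ValueError("multiplier must be >= 1")
    silent_for = now - last_heartbeat
    return silent_for > timedelta(seconds=agent_down_time * multiplier)
```

With agent_down_time=75, an agent silent for 100 seconds is still considered alive at the 2x default but dead at `multiplier=1`.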
> > >
> > > The reason the state report does not occur is probably that both it
> > > and the resync procedure are periodic tasks. If I got it right, they're
> > > both executed as eventlet greenthreads, but one at a time. Perhaps, then,
> > > adding an initial delay to the full sync task might ensure that the first
> > > thing an agent does when it comes up is send a heartbeat to the server?
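Whether an initial delay helps depends on the assumption stated above, that the periodic tasks run strictly one at a time. The hypothetical simulator below models only that assumption on a virtual clock; it is not Neutron's looping-call code. Each task is given as (interval, initial_delay, duration), and ties on the due time go to the task registered first.

```python
import heapq


def run_periodic(tasks, until):
    """Simulate periodic tasks that run one at a time on a virtual clock.

    tasks maps a name to (interval, initial_delay, duration).
    Returns a list of (start_time, name) in execution order.
    """
    queue = [(delay, order, name)
             for order, (name, (_, delay, _)) in enumerate(tasks.items())]
    heapq.heapify(queue)
    clock = 0.0
    log = []
    while queue:
        due, order, name = heapq.heappop(queue)
        # A task cannot start before the previous one has finished.
        start = max(clock, due)
        if start >= until:
            break
        interval, _, duration = tasks[name]
        log.append((start, name))
        clock = start + duration
        heapq.heappush(queue, (start + interval, order, name))
    return log
```

With `{"full_sync": (60, 0, 30), "report_state": (4, 0, 0.1)}`, the first report_state run starts only at t=30 in this model; giving full_sync a 5 second initial delay lets report_state run at t=0.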
> > >
> > > On the other hand, while doing the initial full resync, is the agent
> > > able to process updates? If not, perhaps it makes sense to keep it
> > > marked down until it finishes synchronisation.
> >
> > Yes, it can!  The agent prioritizes updates from RPC over full resync
> > activities.
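A sketch of that kind of prioritization, assuming a single queue of router IDs in which RPC-driven updates outrank resync work. The class and constant names are hypothetical and are not claimed to match the agent's own code. A counter keeps arrival order stable within each priority.

```python
import heapq
import itertools

# Hypothetical priorities: a lower value is served first.
PRIORITY_RPC = 0
PRIORITY_RESYNC = 1


class RouterUpdateQueue:
    """Serve RPC-driven router updates before full-resync work,
    preserving arrival order within each priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def add(self, router_id, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), router_id))

    def pop(self):
        priority, _, router_id = heapq.heappop(self._heap)
        return router_id, priority

    def __len__(self):
        return len(self._heap)
```

If "r1" and "r2" are queued for resync and "r3" then arrives over RPC, the pop order is "r3", "r1", "r2".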
> >
> > I wonder if the agent should check how long it has been since its last
> > state report each time it finishes processing an update for a router.
> > It normally doesn't take very long (relatively) to process an update
> > to a single router.
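The suggestion above could be sketched like this, assuming a monotonic clock and a `send` callback that performs the actual report. StateReporter and process_updates are hypothetical names. The check costs one clock read per router update, so it stays cheap relative to processing the update itself.

```python
import time


class StateReporter:
    """Hypothetical helper: send a state report whenever at least
    report_interval seconds have passed since the last one."""

    def __init__(self, report_interval, send, clock=time.monotonic):
        self.report_interval = report_interval
        self._send = send
        self._clock = clock
        self._last_report = clock()

    def maybe_report(self):
        now = self._clock()
        if now - self._last_report >= self.report_interval:
            self._send()
            self._last_report = now
            return True
        return False


def process_updates(updates, process_one, reporter):
    for update in updates:
        process_one(update)
        # Cheap check between router updates, so a long batch of work
        # cannot silently postpone the next state report.
        reporter.maybe_report()
```

With a 30 second interval and updates that each take 12 seconds, six updates produce reports at t=36 and t=72.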
> >
> > I still would like to know why the thread to report state is being
> > starved.  Anyone have any insight on this?  I thought that with all
> > the system calls, the greenthreads would yield often.  There must be
> > something I don't understand about it.
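One possible explanation, not confirmed in this thread: eventlet greenthreads are cooperative, so a greenthread gives up control only at a call that eventlet has made green (for example, through monkey patching) or at an explicit eventlet.sleep(0). Work that never reaches such a call keeps every other greenthread, including the state reporter, from running. The sketch uses asyncio, which follows the same cooperative model, so that it needs only the standard library; count_heartbeats is a hypothetical name.

```python
import asyncio


async def count_heartbeats(yield_between_steps, steps=1000):
    """Return how many heartbeats ran while the 'work' loop was active."""
    beats = 0
    stop = False

    async def heartbeat():
        nonlocal beats
        while not stop:
            beats += 1
            await asyncio.sleep(0)  # cooperative yield

    hb = asyncio.create_task(heartbeat())
    for _ in range(steps):
        sum(range(1000))  # CPU-bound work that never awaits anything
        if yield_between_steps:
            await asyncio.sleep(0)  # analogous to eventlet.sleep(0)
    stop = True
    beats_during_work = beats
    await hb
    return beats_during_work
```

`asyncio.run(count_heartbeats(False))` returns 0 because the heartbeat task never gets to run until the work loop finishes; with yielding enabled it runs once per step.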
> >
> > Carl
> >
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> 
> 
> 
> -- 
> Kevin Benton

-- 
Itsuro ODA <oda at valinux.co.jp>



