[openstack-dev] [Neutron] L3 agent rescheduling issue

Salvatore Orlando sorlando at nicira.com
Sun Jun 7 11:15:47 UTC 2015


On 5 June 2015 at 01:29, Itsuro ODA <oda at valinux.co.jp> wrote:

> Hi,
>
> > After trying to reproduce this, I'm suspecting that the issue is actually
> > on the server side from failing to drain the agent report state queue in
> > time.
>
> I have seen this before.
> At that time I thought the scenario was as follows:
> * a lot of create/update resource API calls are issued
> * the "rpc_conn_pool_size" pool is exhausted by notifications being sent,
>   which blocks further sending on the RPC side
> * the "rpc_thread_pool_size" pool is exhausted by threads waiting on the
>   "rpc_conn_pool_size" pool in order to send RPC replies
> * receiving state_report is blocked because the "rpc_thread_pool_size" pool
>   is exhausted
>
>
I think this could be a good explanation, couldn't it?
Kevin showed that the periodic tasks are not mutually exclusive and that
long processing times for sync_routers are not an issue.
However, he correctly suspected a server-side involvement, which could
actually be a lot of requests saturating the RPC pool.
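To make Itsuro's chain of events concrete, here is a minimal sketch that uses standard-library OS threads rather than eventlet greenthreads and is not oslo.messaging code. The two pool sizes are hypothetical stand-ins for rpc_thread_pool_size and rpc_conn_pool_size. A burst of API-driven RPCs occupies every worker while they wait for a connection, so a state report that arrives right after the burst waits for a free worker:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def state_report_delay(thread_pool_size, conn_pool_size, n_requests=4, hold=0.2):
    """Return seconds until a state report queued after a burst of RPCs runs.

    thread_pool_size / conn_pool_size are hypothetical stand-ins for
    rpc_thread_pool_size / rpc_conn_pool_size.
    """
    conn_pool = threading.BoundedSemaphore(conn_pool_size)
    handled_at = {}

    def handle_api_rpc():
        # The worker stays occupied for the whole time it waits for a connection.
        with conn_pool:
            time.sleep(hold)  # stand-in for sending a notification or reply

    def handle_state_report():
        handled_at["t"] = time.monotonic()

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=thread_pool_size) as workers:
        for _ in range(n_requests):
            workers.submit(handle_api_rpc)
        # The agent's state report arrives just after the burst of API-driven RPCs.
        workers.submit(handle_state_report)
    return handled_at["t"] - start


if __name__ == "__main__":
    print(f"small thread pool: {state_report_delay(2, 1):.2f}s")
    print(f"large thread pool: {state_report_delay(5, 1):.2f}s")
```

With two workers and a single connection, the report waits until a worker frees up behind the serialized RPCs. With enough workers it runs almost immediately, even though the connection pool is just as small.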

On the other hand, how could we use this theory to explain why this issue
tends to occur when the agent is restarted?
Also, Eugene, what do you mean by saying that the issue could lie in the
agent's "fairness"?

Salvatore



> Thanks
> Itsuro Oda
>
> On Thu, 4 Jun 2015 14:20:33 -0700
> Kevin Benton <blak111 at gmail.com> wrote:
>
> > After trying to reproduce this, I'm suspecting that the issue is actually
> > on the server side from failing to drain the agent report state queue in
> > time.
> >
> > I set the report_interval to 1 second on the agent and added a logging
> > statement and I see a report every 1 second even when sync_routers is
> > taking a really long time.
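Kevin's observation can be mirrored with a toy model: a heartbeat loop on its own thread keeps firing every report_interval while a long "sync_routers" call runs elsewhere. This is a hypothetical sketch with OS threads, not the Neutron L3 agent code. The Agent class and its methods are illustrative names only:

```python
import threading
import time


class Agent:
    """Hypothetical toy agent: heartbeat on its own thread, independent of sync work."""

    def __init__(self, report_interval):
        self.report_interval = report_interval
        self.reports = []
        self._stop = threading.Event()

    def _report_loop(self):
        # Event.wait doubles as an interruptible sleep between reports;
        # it returns True (ending the loop) once stop is set.
        while not self._stop.wait(self.report_interval):
            self.reports.append(time.monotonic())

    def sync_routers(self, duration):
        time.sleep(duration)  # stand-in for a long full resync

    def run(self, sync_duration):
        reporter = threading.Thread(target=self._report_loop, daemon=True)
        reporter.start()
        self.sync_routers(sync_duration)
        self._stop.set()
        reporter.join()
        return len(self.reports)
```

For example, `Agent(0.1).run(0.55)` records roughly five reports, one per interval, even though the sync occupies the whole run.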
> >
> > On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <carl at ecbaldwin.net> wrote:
> >
> > > Ann,
> > >
> > > Thanks for bringing this up.  It has been on the shelf for a while now.
> > >
> > > Carl
> > >
> > > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <sorlando at nicira.com> wrote:
> > > > One reason for not sending the heartbeat from a separate greenthread
> > > > could be that the agent is already doing it [1].
> > > > The current proposed patch addresses the issue blindly, that is to say:
> > > > before declaring an agent dead, let's wait some more time because it
> > > > could be stuck doing other work. In that case I would probably make the
> > > > multiplier (currently 2x) configurable.
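A configurable multiplier could look something like the sketch below. It assumes the multiplier scales a base timeout, such as the server's agent down time, before an agent is declared dead. The function and parameter names are hypothetical and are not taken from the proposed patch:

```python
from datetime import datetime, timedelta, timezone


def is_agent_down(last_heartbeat, now, base_timeout, multiplier=2.0):
    """Hypothetical liveness check: allow `multiplier` x base_timeout of silence.

    base_timeout is in seconds; multiplier is the knob that would be made
    configurable instead of being hard-coded to 2x.
    """
    if multiplier < 1:
        raise ValueError("multiplier must be >= 1")
    allowed_silence = timedelta(seconds=base_timeout * multiplier)
    return now - last_heartbeat > allowed_silence


if __name__ == "__main__":
    now = datetime(2015, 6, 4, 12, 0, tzinfo=timezone.utc)
    last = now - timedelta(seconds=100)
    print(is_agent_down(last, now, base_timeout=75, multiplier=1))
    print(is_agent_down(last, now, base_timeout=75, multiplier=2))
```

Here 100 seconds of silence exceeds a 1x budget of 75 seconds but fits within a 2x budget of 150 seconds. That shows how the multiplier trades detection speed for tolerance of a busy agent.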
> > > >
> > > > The reason the state report does not occur is probably that both it
> > > > and the resync procedure are periodic tasks. If I got it right, they're
> > > > both executed as eventlet greenthreads, but one at a time. Perhaps, then,
> > > > adding an initial delay to the full sync task might ensure that the first
> > > > thing an agent does when it comes up is send a heartbeat to the server?
> > > >
> > > > On the other hand, while doing the initial full resync, is the agent able
> > > > to process updates? If not, perhaps it makes sense to consider it down
> > > > until it finishes synchronisation.
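The initial-delay idea can be checked against a toy scheduler that runs periodic tasks strictly one at a time. This models Salvatore's hypothesis that the tasks are serialized, which Kevin's later test contradicted. It uses simulated seconds, and the task names and durations are illustrative assumptions only:

```python
import heapq


def run_periodic(tasks, until):
    """Run periodic tasks one at a time, in order of their next due time.

    tasks: list of (name, interval, initial_delay, duration) tuples, in
    simulated seconds. Returns the (start_time, name) order in which tasks
    ran up to time `until`.
    """
    clock = 0.0
    queue = [(delay, idx, name, interval, duration)
             for idx, (name, interval, delay, duration) in enumerate(tasks)]
    heapq.heapify(queue)
    log = []
    while queue:
        due, idx, name, interval, duration = heapq.heappop(queue)
        if due > until:
            break
        start = max(clock, due)  # a task cannot start before the previous one ends
        log.append((start, name))
        clock = start + duration
        heapq.heappush(queue, (start + interval, idx, name, interval, duration))
    return log


if __name__ == "__main__":
    # No initial delay: the 100 s full sync runs first and delays the heartbeat.
    print(run_periodic([("full_sync", 120, 0, 100), ("report_state", 30, 0, 0)], until=0))
    # A 5 s initial delay on full_sync lets the first heartbeat go out first.
    print(run_periodic([("full_sync", 120, 5, 100), ("report_state", 30, 0, 0)], until=5))
```

Under this serialized model, the delay only guarantees the first heartbeat. Once the long sync starts, later reports still wait for it to finish.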
> > >
> > > Yes, it can!  The agent prioritizes updates from RPC over full resync
> > > activities.
> > >
> > > I wonder if the agent should check how long it has been since its last
> > > state report each time it finishes processing an update for a router.
> > > It normally doesn't take very long (relatively) to process an update
> > > to a single router.
> > >
> > > I still would like to know why the thread to report state is being
> > > starved.  Anyone have any insight on this?  I thought that with all
> > > the system calls, the greenthreads would yield often.  There must be
> > > something I don't understand about it.
> > >
> > > Carl
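Carl's suggestion to check the time since the last state report after each router update might be sketched as follows. RouterProcessor, maybe_report_state and the injectable clock are hypothetical names for illustration, not part of the L3 agent:

```python
import time


class RouterProcessor:
    """Hypothetical sketch: report state opportunistically between router updates."""

    def __init__(self, report_interval, report_fn, clock=time.monotonic):
        self.report_interval = report_interval
        self.report_fn = report_fn
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self.last_report = clock()

    def maybe_report_state(self):
        now = self.clock()
        if now - self.last_report >= self.report_interval:
            self.report_fn()
            self.last_report = now

    def process_updates(self, updates, process_router):
        for router_id in updates:
            process_router(router_id)
            # A single-router update is relatively short, so checking here
            # bounds the gap between reports even during a long resync.
            self.maybe_report_state()
```

The trade-off is that the gap between reports is bounded by report_interval plus the duration of the slowest single-router update, rather than by the whole resync.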
> > >
> > >
> > > __________________________________________________________________________
> > > OpenStack Development Mailing List (not for usage questions)
> > > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > >
> >
> >
> >
> > --
> > Kevin Benton
>
> --
> Itsuro ODA <oda at valinux.co.jp>
>

