[openstack-dev] [Neutron] L3 agent rescheduling issue

Eugene Nikanorov enikanorov at mirantis.com
Thu Jun 4 23:34:38 UTC 2015


I doubt it's a server-side issue.
Usually there are plenty of RPC workers to drain a much higher volume of
RPC messages coming from agents.
So the issue could be one of 'fairness' on the L3 agent side. From my
observations, though, it was more an issue with the DHCP agent than the
L3 agent, due to differences in how they process resources.
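
For reference, the number of server-side RPC workers is set in
neutron.conf; a minimal sketch (the value is illustrative, not a
recommendation):

    [DEFAULT]
    # Extra worker processes forked by neutron-server to service RPC
    # from agents (report_state, sync_routers, etc.).
    rpc_workers = 4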

Thanks,
Eugene.


On Thu, Jun 4, 2015 at 4:29 PM, Itsuro ODA <oda at valinux.co.jp> wrote:

> Hi,
>
> > After trying to reproduce this, I'm suspecting that the issue is actually
> > on the server side from failing to drain the agent report state queue in
> > time.
>
> I have seen this before.
> The scenario I suspected at that time was as follows (the relevant
> options are sketched below):
> * a lot of create/update resource API calls were issued
> * the "rpc_conn_pool_size" pool was exhausted by notification sends,
>   blocking further sends on the RPC side.
> * the "rpc_thread_pool_size" pool was exhausted by threads waiting on
>   the "rpc_conn_pool_size" pool to send RPC replies.
> * receiving state_report messages was blocked because the
>   "rpc_thread_pool_size" pool was exhausted.
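>
> For reference, a minimal sketch of the two oslo.messaging pool options
> involved in this scenario, as they would appear in neutron.conf (the
> values shown are illustrative defaults, not recommendations):
>
>     [DEFAULT]
>     # Connection pool used for outgoing messages (casts, notifications).
>     rpc_conn_pool_size = 30
>     # Greenthread pool that services incoming RPC requests; if every
>     # worker here blocks waiting on the connection pool above, incoming
>     # state_report messages sit in the queue undrained.
>     rpc_thread_pool_size = 64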
>
> Thanks
> Itsuro Oda
>
> On Thu, 4 Jun 2015 14:20:33 -0700
> Kevin Benton <blak111 at gmail.com> wrote:
>
> > After trying to reproduce this, I'm suspecting that the issue is actually
> > on the server side from failing to drain the agent report state queue in
> > time.
> >
> > I set report_interval to 1 second on the agent and added a logging
> > statement, and I see a report every second even when sync_routers is
> > taking a really long time.
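> >
> > For anyone reproducing this, a sketch of that setup: report_interval
> > lives in the agent's config file (the logging statement was just a
> > LOG.debug added by hand in the agent's state-report method):
> >
> >     [AGENT]
> >     # Seconds between state reports from the agent to the server.
> >     report_interval = 1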
> >
> > On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <carl at ecbaldwin.net> wrote:
> >
> > > Ann,
> > >
> > > Thanks for bringing this up.  It has been on the shelf for a while now.
> > >
> > > Carl
> > >
> > > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <sorlando at nicira.com> wrote:
> > > > One reason for not sending the heartbeat from a separate greenthread
> > > > could be that the agent is already doing it [1].
> > > > The current proposed patch addresses the issue blindly - that is to
> > > > say, before declaring an agent dead, let's wait for some more time
> > > > because it could be stuck doing stuff. In that case I would probably
> > > > make the multiplier (currently 2x) configurable.
> > > >
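> > > > A minimal sketch of what a configurable multiplier could look like
> > > > on the server side (agent_down_time_multiplier is a hypothetical
> > > > option name; the check mirrors the existing agent_down_time test):
> > > >
> > > >     from oslo_config import cfg
> > > >     from oslo_utils import timeutils
> > > >
> > > >     # Hypothetical option; the proposed patch hardcodes the 2x factor.
> > > >     cfg.CONF.register_opts([
> > > >         cfg.FloatOpt('agent_down_time_multiplier', default=2.0,
> > > >                      help='Grace factor applied to agent_down_time '
> > > >                           'before an agent is declared dead.')])
> > > >
> > > >     def is_agent_down(heart_beat_time):
> > > >         # A busy agent gets agent_down_time * multiplier of slack
> > > >         # before its routers are rescheduled away.
> > > >         return timeutils.is_older_than(
> > > >             heart_beat_time,
> > > >             cfg.CONF.agent_down_time *
> > > >             cfg.CONF.agent_down_time_multiplier)
> > > >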
> > > > The reason the state report does not occur is probably that both it
> > > > and the resync procedure are periodic tasks. If I got it right,
> > > > they're both executed as eventlet greenthreads, but one at a time.
> > > > Perhaps then adding an initial delay to the full sync task might
> > > > ensure the first thing an agent does when it comes up is sending a
> > > > heartbeat to the server?
> > > >
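> > > > A sketch of the initial-delay idea, assuming oslo's
> > > > FixedIntervalLoopingCall (report_state and
> > > > periodic_sync_routers_task stand in for the agent's actual
> > > > callables):
> > > >
> > > >     from oslo_service import loopingcall
> > > >
> > > >     # Heartbeat starts immediately and fires every report_interval.
> > > >     heartbeat = loopingcall.FixedIntervalLoopingCall(report_state)
> > > >     heartbeat.start(interval=30)
> > > >
> > > >     # Full resync is held back briefly so the first heartbeat goes
> > > >     # out before the heavy sync work begins.
> > > >     resync = loopingcall.FixedIntervalLoopingCall(
> > > >         periodic_sync_routers_task)
> > > >     resync.start(interval=60, initial_delay=10)
> > > >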
> > > > On the other hand, while doing the initial full resync, is the agent
> > > > able to process updates? If not, perhaps it makes sense to keep it
> > > > marked down until it finishes synchronisation.
> > >
> > > Yes, it can!  The agent prioritizes updates from RPC over full resync
> > > activities.
> > >
> > > I wonder if the agent should check how long it has been since its last
> > > state report each time it finishes processing an update for a router.
> > > It normally doesn't take very long (relatively) to process an update
> > > to a single router.
> > >
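> > > Something like this, as a sketch (_last_report_time and
> > > _process_router_update are illustrative names; _report_state mirrors
> > > the agent's existing state-report hook):
> > >
> > >     import time
> > >
> > >     def _process_router_update(self, update):
> > >         self._router_update(update)  # the normal, short per-router work
> > >         # Opportunistically heartbeat if one is overdue, instead of
> > >         # waiting for the periodic task to win the scheduling race.
> > >         if time.time() - self._last_report_time > self.report_interval:
> > >             self._report_state()
> > >             self._last_report_time = time.time()
> > >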
> > > I still would like to know why the thread to report state is being
> > > starved.  Anyone have any insight on this?  I thought that with all
> > > the system calls, the greenthreads would yield often.  There must be
> > > something I don't understand about it.
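> > >
> > > One way to see the failure mode outside of Neutron: eventlet
> > > greenthreads only yield on monkey-patched blocking I/O or an explicit
> > > sleep; pure CPU-bound Python (and calls like time.time()) never
> > > yields. A self-contained sketch:
> > >
> > >     import time
> > >     import eventlet
> > >
> > >     def heartbeat():
> > >         while True:
> > >             print("heartbeat at %.1f" % time.time())
> > >             eventlet.sleep(1)
> > >
> > >     def busy_resync():
> > >         # CPU-bound loop: no hub-mediated I/O, so it never yields and
> > >         # the heartbeat greenthread starves for the full 5 seconds.
> > >         deadline = time.time() + 5
> > >         while time.time() < deadline:
> > >             pass  # eventlet.sleep(0) here would yield cooperatively
> > >
> > >     eventlet.spawn(heartbeat)
> > >     eventlet.spawn(busy_resync).wait()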
> > >
> > > Carl
> > >
> >
> > --
> > Kevin Benton
>
> --
> Itsuro ODA <oda at valinux.co.jp>
>