<div dir="ltr">I doubt it's a server-side issue.<div>Usually there are plenty of RPC workers to drain a much higher volume of RPC messages coming from agents.</div><div>So the issue could be 'fairness' on the L3 agent side. But from my observations it was more an issue with the DHCP agent than with the L3 agent, due to differences in how they process resources.</div><div><br></div><div>Thanks,</div><div>Eugene.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 4, 2015 at 4:29 PM, Itsuro ODA <span dir="ltr"><<a href="mailto:oda@valinux.co.jp" target="_blank">oda@valinux.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<span class=""><br>
> After trying to reproduce this, I'm suspecting that the issue is actually<br>
> on the server side from failing to drain the agent report state queue in<br>
> time.<br>
<br>
</span>I have seen this before.<br>
My theory of the scenario at the time was as follows:<br>
* a lot of create/update resource API calls were issued<br>
* the "rpc_conn_pool_size" pool was exhausted by sending notifications,<br>
which blocked further sends on the RPC client side.<br>
* the "rpc_thread_pool_size" pool was exhausted by threads stuck waiting<br>
on the "rpc_conn_pool_size" pool to send RPC replies.<br>
* receiving state_report was blocked because the "rpc_thread_pool_size"<br>
pool was exhausted.<br>
<br>
Thanks<br>
Itsuro Oda<br>
<div class="HOEnZb"><div class="h5"><br>
On Thu, 4 Jun 2015 14:20:33 -0700<br>
Kevin Benton <<a href="mailto:blak111@gmail.com">blak111@gmail.com</a>> wrote:<br>
<br>
> After trying to reproduce this, I'm suspecting that the issue is actually<br>
> on the server side from failing to drain the agent report state queue in<br>
> time.<br>
><br>
> I set report_interval to 1 second on the agent and added a logging<br>
> statement, and I see a report every second even when sync_routers is<br>
> taking a really long time.<br>
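For reference, the two knobs in play are the agent's report_interval and the server's agent_down_time; a minimal sketch (values illustrative, not recommendations):

```ini
# agent side (e.g. l3_agent.ini)
[AGENT]
report_interval = 1      # seconds between state reports (default 30)

# server side (neutron.conf)
[DEFAULT]
agent_down_time = 75     # seconds without a report before the agent
                         # is considered dead
```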
><br>
> On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <<a href="mailto:carl@ecbaldwin.net">carl@ecbaldwin.net</a>> wrote:<br>
><br>
> > Ann,<br>
> ><br>
> > Thanks for bringing this up. It has been on the shelf for a while now.<br>
> ><br>
> > Carl<br>
> ><br>
> > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <<a href="mailto:sorlando@nicira.com">sorlando@nicira.com</a>><br>
> > wrote:<br>
> > > One reason for not sending the heartbeat from a separate greenthread<br>
> > > could be that the agent is already doing it [1].<br>
> > > The currently proposed patch addresses the issue blindly - that is to<br>
> > > say, before declaring an agent dead, wait some more time because it<br>
> > > could be stuck doing work. In that case I would probably make the<br>
> > > multiplier (currently 2x) configurable.<br>
> > ><br>
> > > The reason the state report does not occur is probably that both it and<br>
> > > the resync procedure are periodic tasks. If I got it right, they're both<br>
> > > executed as eventlet greenthreads, but one at a time. Perhaps adding an<br>
> > > initial delay to the full sync task would ensure that the first thing an<br>
> > > agent does when it comes up is send a heartbeat to the server?<br>
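A minimal sketch of the separate-heartbeat idea (plain threads instead of eventlet/oslo.service looping calls, and all names here are made up): the state report runs on its own timer, so a long resync elsewhere cannot delay it.

```python
import threading
import time

# Sketch: run the state report on a dedicated timer thread so a long
# full-sync in the main flow of control cannot starve it.
reports = []

def report_state():
    reports.append(time.monotonic())

def heartbeat(interval, stop):
    while not stop.is_set():
        report_state()
        stop.wait(interval)   # sleeps, but wakes early on shutdown

stop = threading.Event()
t = threading.Thread(target=heartbeat, args=(0.05, stop), daemon=True)
t.start()

time.sleep(0.3)      # stands in for a long sync_routers / full resync
stop.set()
t.join()

print(len(reports) >= 3)  # heartbeats kept firing during the "sync"
```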
> > ><br>
> > > On the other hand, while doing the initial full resync, is the agent<br>
> > > able to process updates? If not, perhaps it makes sense to keep it<br>
> > > marked down until it finishes synchronisation.<br>
> ><br>
> > Yes, it can! The agent prioritizes updates from RPC over full resync<br>
> > activities.<br>
> ><br>
> > I wonder if the agent should check how long it has been since its last<br>
> > state report each time it finishes processing an update for a router.<br>
> > It normally doesn't take very long (relatively) to process an update<br>
> > to a single router.<br>
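The opportunistic-report idea above could look roughly like this (hypothetical Agent class and method names for illustration, not the real L3 agent code):

```python
import time

REPORT_INTERVAL = 30  # seconds; corresponds to the agent's report_interval

class Agent:
    """Toy agent that squeezes in a state report between router updates."""

    def __init__(self):
        self.last_report = time.monotonic()
        self.reports_sent = 0

    def report_state(self):
        self.reports_sent += 1
        self.last_report = time.monotonic()

    def process_update(self, router):
        # ... per-router work would go here ...
        # After each update, check whether a report is overdue.
        if time.monotonic() - self.last_report >= REPORT_INTERVAL:
            self.report_state()

agent = Agent()
agent.last_report -= 60          # pretend the last report is 60 s old
agent.process_update("router-1")
print(agent.reports_sent)        # 1: a report was squeezed in
```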
> ><br>
> > I still would like to know why the thread to report state is being<br>
> > starved. Anyone have any insight on this? I thought that with all<br>
> > the system calls, the greenthreads would yield often. There must be<br>
> > something I don't understand about it.<br>
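One common cause in cooperative schedulers is a stretch of CPU-bound work with no yield point: eventlet only switches greenthreads on monkey-patched I/O or explicit sleeps. An asyncio illustration of the same effect (not eventlet, but the scheduling behaviour is analogous):

```python
import asyncio
import time

# A coroutine that does CPU-bound work without awaiting never yields,
# so the heartbeat task is starved for the whole busy stretch.
ticks = []

async def heartbeat():
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def cpu_bound_sync():
    end = time.monotonic() + 0.2
    while time.monotonic() < end:
        pass  # no await: the event loop cannot run the heartbeat

async def main():
    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)        # let a few heartbeats run
    before = len(ticks)
    await cpu_bound_sync()           # hogs the loop for 0.2 s
    during = len(ticks) - before
    hb.cancel()
    print(during <= 1)  # almost no heartbeats fired during the busy loop

asyncio.run(main())
```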
> ><br>
> > Carl<br>
> ><br>
> > __________________________________________________________________________<br>
> > OpenStack Development Mailing List (not for usage questions)<br>
> > Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
> > <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
> ><br>
><br>
><br>
><br>
> --<br>
> Kevin Benton<br>
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
Itsuro ODA <<a href="mailto:oda@valinux.co.jp">oda@valinux.co.jp</a>><br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
</div></div></blockquote></div><br></div>