<div dir="ltr">Well, a greenthread will only yield when it makes a blocking call, such as writing to a network socket or a file. So once the report_state greenthread starts executing, it won't yield until it makes a call like that. <div><br></div><div>I looked through the report_state code for the DHCP agent and the only blocking call it seems to make is the AMQP report_state call/cast itself. So even with a bunch of other workers, the report_state thread should get execution fairly quickly, since most of our workers should yield very frequently when they make process calls, etc. That's why I assumed that there must be something actually stopping it from sending the message. </div><div><br></div><div>Do you have a way to reproduce the issue with the DHCP agent?</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jun 7, 2015 at 9:21 PM, Eugene Nikanorov <span dir="ltr"><<a href="mailto:enikanorov@mirantis.com" target="_blank">enikanorov@mirantis.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">No, I think the greenthread itself doesn't do anything special; it's just that when there are too many threads, the state_report thread can't get control for too long, since there is no prioritization of greenthreads.<span class="HOEnZb"><font color="#888888"><div><br></div><div>Eugene.</div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jun 7, 2015 at 8:24 PM, Kevin Benton <span dir="ltr"><<a href="mailto:blak111@gmail.com" target="_blank">blak111@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I understand now. 
So the issue is that the report_state greenthread is just blocking and yielding whenever it tries to actually send a message?</div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jun 7, 2015 at 8:10 PM, Eugene Nikanorov <span dir="ltr"><<a href="mailto:enikanorov@mirantis.com" target="_blank">enikanorov@mirantis.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Salvatore,<div><br></div><div>By 'fairness' I meant the chances for the state report greenthread to get control. In the DHCP case, each network is processed by a separate greenthread, so the more greenthreads the agent has, the lower the chance that the report state greenthread will be able to report in time. </div><div><br></div><div>Thanks,</div><div>Eugene.</div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando <span dir="ltr"><<a href="mailto:sorlando@nicira.com" target="_blank">sorlando@nicira.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On 5 June 2015 at 01:29, Itsuro ODA <span dir="ltr"><<a href="mailto:oda@valinux.co.jp" target="_blank">oda@valinux.co.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<span><br>
> After trying to reproduce this, I'm suspecting that the issue is actually<br>
> on the server side from failing to drain the agent report state queue in<br>
> time.<br>
<br>
</span>I have seen this before.<br>
At the time, I thought the scenario was as follows:<br>
* A lot of create/update resource API requests were issued.<br>
* The "rpc_conn_pool_size" pool was exhausted by sending notifications, which blocked<br>
further sends on the RPC sending side.<br>
* The "rpc_thread_pool_size" pool was exhausted by threads waiting on the "rpc_conn_pool_size"<br>
pool in order to reply to RPCs.<br>
* Receiving state_report was blocked because the "rpc_thread_pool_size" pool<br>
was exhausted.<br>
<br></blockquote><div><br></div></span><div>I think this could be a good explanation, couldn't it?</div><div>Kevin proved that the periodic tasks are not mutually exclusive and that long processing times for sync_routers are not an issue.</div><div>However, he correctly suspected server-side involvement, which could actually be a lot of requests saturating the RPC pool.</div><div><br></div><div>On the other hand, how could we use this theory to explain why this issue tends to occur when the agent is restarted?</div><div>Also, Eugene, what do you mean by stating that the issue could be in the agent's "fairness"?</div><span><font color="#888888"><div><br></div><div>Salvatore</div></font></span><div><div><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thanks<br>
Itsuro Oda<br>
<div><div><br>
On Thu, 4 Jun 2015 14:20:33 -0700<br>
Kevin Benton <<a href="mailto:blak111@gmail.com" target="_blank">blak111@gmail.com</a>> wrote:<br>
<br>
> After trying to reproduce this, I'm suspecting that the issue is actually<br>
> on the server side from failing to drain the agent report state queue in<br>
> time.<br>
><br>
> I set the report_interval to 1 second on the agent and added a logging<br>
> statement and I see a report every 1 second even when sync_routers is<br>
> taking a really long time.<br>
><br>
> On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <<a href="mailto:carl@ecbaldwin.net" target="_blank">carl@ecbaldwin.net</a>> wrote:<br>
><br>
> > Ann,<br>
> ><br>
> > Thanks for bringing this up. It has been on the shelf for a while now.<br>
> ><br>
> > Carl<br>
> ><br>
> > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <<a href="mailto:sorlando@nicira.com" target="_blank">sorlando@nicira.com</a>><br>
> > wrote:<br>
> > > One reason for not sending the heartbeat from a separate greenthread<br>
> > could<br>
> > > be that the agent is already doing it [1].<br>
> > > The current proposed patch addresses the issue blindly - that is to say<br>
> > > before declaring an agent dead let's wait for some more time because it<br>
> > > could be stuck doing stuff. In that case I would probably make the<br>
> > > multiplier (currently 2x) configurable.<br>
> > ><br>
> > > The reason for which state report does not occur is probably that both it<br>
> > > and the resync procedure are periodic tasks. If I got it right they're<br>
> > both<br>
> > > executed as eventlet greenthreads but one at a time. Perhaps then adding<br>
> > an<br>
> > > initial delay to the full sync task might ensure the first thing an agent<br>
> > > does when it comes up is sending a heartbeat to the server?<br>
> > ><br>
> > > On the other hand, while doing the initial full resync, is the agent<br>
> > able<br>
> > > to process updates? If not perhaps it makes sense to have it down until<br>
> > it<br>
> > > finishes synchronisation.<br>
> ><br>
> > Yes, it can! The agent prioritizes updates from RPC over full resync<br>
> > activities.<br>
> ><br>
> > I wonder if the agent should check how long it has been since its last<br>
> > state report each time it finishes processing an update for a router.<br>
> > It normally doesn't take very long (relatively) to process an update<br>
> > to a single router.<br>
> ><br>
> > I still would like to know why the thread to report state is being<br>
> > starved. Anyone have any insight on this? I thought that with all<br>
> > the system calls, the greenthreads would yield often. There must be<br>
> > something I don't understand about it.<br>
> ><br>
> > Carl<br>
> ><br>
> > __________________________________________________________________________<br>
> > OpenStack Development Mailing List (not for usage questions)<br>
> > Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
> > <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
> ><br>
><br>
><br>
><br>
> --<br>
> Kevin Benton<br>
<br>
</div></div><span><font color="#888888">--<br>
Itsuro ODA <<a href="mailto:oda@valinux.co.jp" target="_blank">oda@valinux.co.jp</a>><br>
</font></span><div><div><br>
<br>
</div></div></blockquote></div></div></div><br></div></div>
<br></blockquote></div><br></div>
</div></div><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div><div>Kevin Benton</div></div>
</div>
</div></div><br>
<br></blockquote></div><br></div>
</div></div><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div>Kevin Benton</div></div>
</div>
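<div>To make the yielding argument at the top of this message concrete, here is a minimal sketch in plain Python. It does not use eventlet: generators stand in for greenthreads, a bare <code>yield</code> stands in for a blocking call (socket write, process call, AMQP cast), and a simple round-robin loop stands in for the hub. The names <code>worker</code>, <code>hog</code>, <code>report_state</code> and <code>run</code> are hypothetical and exist only for this illustration; the real scheduling order in eventlet may differ.</div>

```python
from collections import deque


def worker(name, steps, log):
    """Hypothetical worker that makes a 'blocking call' after every step."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # stands in for a blocking call, where a greenthread would yield


def hog(name, steps, log):
    """Hypothetical worker that never yields until all of its work is done."""
    for i in range(steps):
        log.append(f"{name}:{i}")
    yield


def report_state(reports, log):
    """Stand-in for the state report loop; the send is its only yield point."""
    for i in range(reports):
        log.append(f"report:{i}")
        yield  # stands in for the AMQP call/cast


def run(tasks):
    """Assumed round-robin scheduler: run each task until its next yield."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, drop it
        ready.append(task)


def demo():
    """Return the event logs for a cooperative worker and for a hog."""
    fair, starved = [], []
    run([worker("w", 3, fair), report_state(2, fair)])
    run([hog("h", 3, starved), report_state(1, starved)])
    return fair, starved


if __name__ == "__main__":
    for events in demo():
        print(events)
```

<div>With a worker that yields after every step, the reports interleave with the work. With the hog, the first report is only logged after the hog has finished everything, which is the kind of starvation that would be needed to delay report_state on the agent side.</div>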