<div dir="ltr">After trying to reproduce this, I suspect the issue is actually on the server side: the server is failing to drain the agent report state queue in time.<div><br></div><div>I set report_interval to 1 second on the agent and added a logging statement, and I see a report every second even when sync_routers is taking a really long time.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <span dir="ltr">&lt;<a href="mailto:carl@ecbaldwin.net" target="_blank">carl@ecbaldwin.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Ann,<br>

<br>

Thanks for bringing this up.  It has been on the shelf for a while now.<br>

<br>

Carl<br>

<span class=""><br>

On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando &lt;<a href="mailto:sorlando@nicira.com">sorlando@nicira.com</a>&gt; wrote:<br>

> One reason for not sending the heartbeat from a separate greenthread could<br>

> be that the agent is already doing it [1].<br>

> The current proposed patch addresses the issue blindly - that is to say<br>

> before declaring an agent dead let's wait for some more time because it<br>

> could be stuck doing stuff. In that case I would probably make the<br>

> multiplier (currently 2x) configurable.<br>

><br>

> The state report probably does not occur because both it and the<br>

> resync procedure are periodic tasks. If I got it right they're both<br>

> executed as eventlet greenthreads, but one at a time. Perhaps adding an<br>

> initial delay to the full sync task would ensure that the first thing an<br>

> agent does when it comes up is send a heartbeat to the server?<br>

><br>

> On the other hand, while doing the initial full resync, is the agent able<br>

> to process updates? If not perhaps it makes sense to have it down until it<br>

> finishes synchronisation.<br>

<br>

</span>Yes, it can!  The agent prioritizes updates from RPC over full resync<br>

activities.<br>

<br>

I wonder if the agent should check how long it has been since its last<br>

state report each time it finishes processing an update for a router.<br>

It normally doesn't take very long (relatively) to process an update<br>

to a single router.<br>

<br>

I still would like to know why the thread to report state is being<br>

starved.  Does anyone have any insight on this?  I thought that with all<br>

the system calls, the greenthreads would yield often.  There must be<br>

something I don't understand about it.<br>
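
<br>

One general way a cooperatively scheduled task can be starved is sketched below. It uses asyncio from the standard library as a stand-in for eventlet, which is an assumption made for illustration only; the function names are hypothetical and nothing here is taken from the agent. The point it shows is that a cooperative scheduler switches only at explicit yield points, so a call that blocks without yielding holds up every other task.<br>

```python
import asyncio
import time


async def heartbeat(beats, interval):
    """Record a beat every `interval` seconds until cancelled."""
    while True:
        beats.append(time.monotonic())
        await asyncio.sleep(interval)


async def blocking_sync():
    """Long task that never yields: time.sleep blocks the whole event loop."""
    time.sleep(0.2)


async def cooperative_sync():
    """Same total duration, but yields to the scheduler between steps."""
    for _ in range(20):
        await asyncio.sleep(0.01)


async def beats_during(work, interval=0.01):
    """Run `work` next to a heartbeat task; return beats recorded during it."""
    beats = []
    task = asyncio.create_task(heartbeat(beats, interval))
    await asyncio.sleep(0)  # let the heartbeat record its first beat
    before = len(beats)
    await work()
    during = len(beats) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


if __name__ == "__main__":
    print("blocking:", asyncio.run(beats_during(blocking_sync)))
    print("cooperative:", asyncio.run(beats_during(cooperative_sync)))
```

With eventlet the analogous cases would be CPU-bound loops or blocking calls that have not been monkey-patched; whether either applies to the agent is not established here.<br>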

<span class="HOEnZb"><font color="#888888"><br>

Carl<br>

</font></span><div class="HOEnZb"><div class="h5"><br>


</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div>Kevin Benton</div></div>

</div>