[Openstack-operators] agents stop responding

Brian Clark brian.clark at cloudapt.com
Sat Aug 24 04:43:18 UTC 2013


This may be completely unrelated to your case, but we have noticed a
similar situation with L2 agents on our network nodes: intermittent
"flapping" of agent status.  In our case, some agents were taking longer to
report their status back to quantum-server than the server expected, so the
server marked them as inactive.  This was especially true for OVS agents on
network nodes hosting many routers.

Our solution (for now) was to increase agent_down_time in quantum-server's
quantum.conf, giving the agent more time to respond before the server
considers it inactive.  We're currently using a value of 10 seconds, and no
longer see any flapping.
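
To illustrate why a larger agent_down_time stops the flapping, here is a
rough sketch of the kind of check involved.  This is not quantum-server's
actual code; the function name is_agent_down and the timestamps are made up
for the example, and I'm assuming the server simply compares the age of the
last report against agent_down_time.

```python
from datetime import datetime, timedelta


def is_agent_down(last_heartbeat, now, agent_down_time):
    """Return True if the agent has been silent longer than agent_down_time seconds.

    Hypothetical helper for illustration only, not the quantum-server implementation.
    """
    return (now - last_heartbeat) > timedelta(seconds=agent_down_time)


# Example timestamps (made up): the last report arrived 7 seconds ago.
now = datetime(2013, 8, 24, 4, 43, 18)
last_heartbeat = now - timedelta(seconds=7)

print(is_agent_down(last_heartbeat, now, agent_down_time=5))   # True
print(is_agent_down(last_heartbeat, now, agent_down_time=10))  # False
```

With a short threshold, a report that is only a few seconds late is enough
to flip the agent to inactive; with 10 seconds the same late report is
tolerated.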

On a related note, we were also occasionally getting "WARNING
[quantum.openstack.common.loopingcall] task run outlasted interval by xxx
sec" in various quantum agent logs.  This was related to report_interval,
which sets how often the agent reports its status.  If an agent takes longer
to poll/report its status than report_interval, the warning is logged.
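
As a minimal sketch of where that warning comes from, assuming a
fixed-interval loop that sleeps for whatever is left of the interval after
each run (next_delay is a hypothetical name for this example, not a function
in quantum):

```python
def next_delay(interval, elapsed):
    """Return (sleep_seconds, warning) for one pass of a fixed-interval loop.

    interval: configured report_interval in seconds.
    elapsed:  how long the last poll/report run actually took, in seconds.
    Hypothetical helper for illustration only.
    """
    delay = interval - elapsed
    if delay <= 0:
        # The run used up the whole interval (or more): warn, do not sleep.
        return 0.0, "task run outlasted interval by %.2f sec" % -delay
    return delay, None


print(next_delay(4, 5.5))  # (0.0, 'task run outlasted interval by 1.50 sec')
print(next_delay(8, 5.5))  # (2.5, None)
```

A run that takes 5.5 seconds overruns a 4 second interval but fits inside an
8 second one, which is why raising report_interval makes the warning go away
without making the run itself any faster.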

To solve this (again, for now), we increased report_interval to 8s in
quantum.conf on our network nodes.  The only thing to keep in mind is that
the agent's report_interval must be less than the server's agent_down_time.
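
If it helps, here is a small sanity check for that constraint.  It is only a
sketch: check_intervals is a made-up helper, and I'm assuming
agent_down_time sits under [DEFAULT] and report_interval under [AGENT], so
adjust the section names to match your own quantum.conf.

```python
import configparser

# Sample fragments using the values from this thread.  The section names
# ([DEFAULT], [AGENT]) are an assumption; check them against your own files.
SERVER_CONF = """
[DEFAULT]
agent_down_time = 10
"""

AGENT_CONF = """
[AGENT]
report_interval = 8
"""


def check_intervals(server_conf_text, agent_conf_text):
    """Return True if the agent's report_interval is below the server's agent_down_time."""
    server = configparser.ConfigParser()
    server.read_string(server_conf_text)
    agent = configparser.ConfigParser()
    agent.read_string(agent_conf_text)

    down_time = server.getint("DEFAULT", "agent_down_time")
    report = agent.getint("AGENT", "report_interval")
    return report < down_time


print(check_intervals(SERVER_CONF, AGENT_CONF))  # True
```

To run it against real files, read each quantum.conf into a string first
(or swap read_string for read with the file path).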

I've seen a couple of bugs reported in Launchpad that may be related to my
particular case (at least with respect to the OVS agent), but I'm not sure of
their status.  We're going to keep our settings as-is for a while and keep
an eye on it.

Thanks,
Brian


On Fri, Aug 23, 2013 at 9:23 AM, Samuel Winchenbach <swinchen at gmail.com> wrote:

> Hi all,
>
> Have any of you experienced agents (in my case the L3 agent) being listed
> as "xxx" even though the daemon is still running?
>
> Here is a small snippet of my l3-agent.log around the time the agent
> becomes listed as "xxx"  http://pastie.org/pastes/8262693/text
>
> You can clearly see a change at 06:10:45.
>
>
> Any idea what could be causing this or other places to look for errors?
> I have been through most of the other logs and I don't see anything out of
> place.
>
>
> Thanks,
> Sam


-- 
Brian Clark
Co-Founder & Director of Technology
Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250