<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;">
<div>
<div>
<div>We are seeing issues only on the client side as of now.</div>
<div>However, we do have the following set:</div>
<div>
<div>net.ipv4.tcp_retries2 = 3</div>
</div>
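<div>For a sense of scale, here is a rough Python sketch of how long Linux keeps retransmitting unacknowledged data on an established connection before dropping it, as a function of net.ipv4.tcp_retries2. It assumes the kernel's backoff model (retransmission timeout starting at a 200 ms minimum, doubling on each retry and capped at 120 s); real timings depend on the measured round-trip time, so treat the results as approximations. The function name is hypothetical.</div>

```python
import math

# Assumed Linux constants (seconds): TCP_RTO_MIN is 200 ms, TCP_RTO_MAX is 120 s.
TCP_RTO_MIN_S = 0.2
TCP_RTO_MAX_S = 120.0


def approx_retries2_timeout(retries2, rto_base=TCP_RTO_MIN_S, rto_max=TCP_RTO_MAX_S):
    """Approximate seconds before an established connection with unacknowledged
    data is dropped, modelled on exponential retransmission backoff."""
    # Number of doublings before the RTO reaches its cap (ilog2(120 / 0.2) == 9).
    backoff_thresh = int(math.log2(rto_max / rto_base))
    if retries2 > backoff_thresh:
        # Doubling phase up to the cap, then one capped RTO per remaining retry.
        return (2 ** (backoff_thresh + 1) - 1) * rto_base + (retries2 - backoff_thresh) * rto_max
    # Pure doubling phase: rto_base * (1 + 2 + 4 + ... + 2**retries2).
    return (2 ** (retries2 + 1) - 1) * rto_base


for value in (3, 15):  # 3 as configured above; 15 is the usual kernel default
    print(f"tcp_retries2={value}: ~{approx_retries2_timeout(value):.1f} s")
```

<div>Under these assumptions, tcp_retries2 = 3 drops a connection with stuck unacknowledged data after about 3 seconds, versus roughly 15 minutes (about 924.6 s) with the default of 15.</div>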
<div><br>
</div>
<div>Ajay</div>
<div>
<div id="MAC_OUTLOOK_SIGNATURE"></div>
</div>
</div>
</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION">
<div style="font-family:Calibri; font-size:12pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<span style="font-weight:bold">From: </span>"Edmund Rhudy (BLOOMBERG/ 731 LEX)" <<a href="mailto:erhudy@bloomberg.net">erhudy@bloomberg.net</a>><br>
<span style="font-weight:bold">Reply-To: </span>"Edmund Rhudy (BLOOMBERG/ 731 LEX)" <<a href="mailto:erhudy@bloomberg.net">erhudy@bloomberg.net</a>><br>
<span style="font-weight:bold">Date: </span>Thursday, April 21, 2016 at 12:11 PM<br>
<span style="font-weight:bold">To: </span>Ajay Kalambur <<a href="mailto:akalambu@cisco.com">akalambu@cisco.com</a>><br>
<span style="font-weight:bold">Cc: </span>"<a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a>" <<a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a>><br>
<span style="font-weight:bold">Subject: </span>Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo<br>
</div>
<div><br>
</div>
<div>
<div>
<div class="rte-style-maintainer" style="white-space: pre-wrap; font-size: small; font-family: 'Courier New', Courier; color: rgb(0, 0, 0);" bbg-color="default" data-bb-font-size="medium" bbg-font-size="medium" bbg-font-family="fixed-width">
Are you seeing issues only on the client side, or on the broker side as well? We were having issues with nodes not reconnecting successfully and ended up making a number of broker-side changes to improve resiliency: upgrading to RabbitMQ 3.5.5 or higher, reducing net.ipv4.tcp_retries2 to evict failed connections faster, and configuring heartbeats in RabbitMQ to detect failed clients more quickly.<br>
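<div>As a rough illustration of the heartbeat part: per the RabbitMQ documentation, client and broker each propose a heartbeat timeout when the connection opens; most clients use the greater value if either proposal is 0 (0 disables heartbeats) and the smaller value otherwise. Heartbeat frames go out about every timeout/2 seconds, and after two missed heartbeats the peer is treated as unreachable, so a dead peer is noticed after roughly one timeout. The Python sketch below only encodes that arithmetic; the function names are hypothetical and individual clients may differ.</div>

```python
def negotiate_heartbeat(server_s, client_s):
    """Hypothetical helper: negotiated heartbeat timeout in seconds (0 = disabled)."""
    if server_s == 0 or client_s == 0:
        # If either side proposes 0, the greater of the two values is used.
        return max(server_s, client_s)
    # Otherwise the smaller of the two proposals is used.
    return min(server_s, client_s)


def approx_detection_s(timeout_s):
    """Rough time to notice a dead peer from missed heartbeat frames."""
    if timeout_s == 0:
        return None  # heartbeats disabled: only TCP-level timeouts apply
    send_interval = timeout_s / 2  # frames are sent about twice per timeout
    missed_allowed = 2             # peer considered unreachable after two misses
    return missed_allowed * send_interval


negotiated = negotiate_heartbeat(60, 30)
print(negotiated, approx_detection_s(negotiated))  # 30 30.0
```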
<div class="rte-style-maintainer" style="font-size: small; font-family: 'Courier New', Courier; color: rgb(0, 0, 0);" data-color="global-default" bbg-color="default" data-bb-font-size="medium" bbg-font-size="medium" bbg-font-family="fixed-width">
<br>
<div class="bbg-rte-fold-content" data-header="From: akalambu@cisco.com" data-digest="From: akalambu@cisco.com" style="">
<div class="bbg-rte-fold-summary">From: <a href="mailto:akalambu@cisco.com">akalambu@cisco.com</a>
</div>
<div>Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo<br>
</div>
</div>
<div class="rte-internet-block-wrapper" style="color: black; font-family: Arial, 'BB.Proportional'; font-size: small; white-space: normal; background: white;">
<div class="rte-internet-block">
<blockquote>
<div>
<div>
<div>Do you recommend both, or can I do away with the system keepalive timers and just keep the heartbeat?</div>
<div>Ajay</div>
<div><br>
</div>
<div>
<div id=""></div>
</div>
</div>
</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION"></span>
<div style="font-family:Calibri; font-size:12pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<span id="OLK_SRC_BODY_SECTION"><span style="font-weight:bold">From: </span>"Kris G. Lindgren" <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="" href="mailto:klindgren@godaddy.com" data-destination="mailto:rte:bind">klindgren@godaddy.com</a>><br>
<span style="font-weight:bold">Date: </span>Thursday, April 21, 2016 at 11:54 AM<br>
<span style="font-weight:bold">To: </span>Ajay Kalambur <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="rte-from-internet" href="mailto:akalambu@cisco.com" data-destination="mailto:rte:bind">akalambu@cisco.com</a>>, "<a spellcheck="false" bbg-destination="mailto:rte:bind" class="" href="mailto:openstack-operators@lists.openstack.org" data-destination="mailto:rte:bind">openstack-operators@lists.openstack.org</a>"
 <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="rte-from-internet" href="mailto:openstack-operators@lists.openstack.org" data-destination="mailto:rte:bind">openstack-operators@lists.openstack.org</a>><br>
<span style="font-weight:bold">Subject: </span>Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo<br>
</span></div>
<div><br>
</div>
<div>
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;">
<div>
<div>Yeah, that only fixes part of the issue. The other part is getting the OpenStack messaging code itself to figure out that the connection it's using is no longer valid. Heartbeats by themselves solved 90%+ of our issues with RabbitMQ and nodes being disconnected and never reconnecting.</div>
<div>
<div id="">
<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri"><br>
</font></font></div>
<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri">___________________________________________________________________</font></font></div>
<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri">Kris Lindgren</font></font></div>
<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri"><span class="Apple-style-span" style="font-size: 14px;">Senior Linux Systems Engineer</span></font></font></div>
<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri"><span class="Apple-style-span" style="font-size: 14px;">GoDaddy</span></font></font></div>
</div>
</div>
</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION"></span>
<div style="font-family:Calibri; font-size:12pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<span id="OLK_SRC_BODY_SECTION"><span style="font-weight:bold">From: </span>"Ajay Kalambur (akalambu)" <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="" href="mailto:akalambu@cisco.com" data-destination="mailto:rte:bind">akalambu@cisco.com</a>><br>
<span style="font-weight:bold">Date: </span>Thursday, April 21, 2016 at 12:51 PM<br>
<span style="font-weight:bold">To: </span>"Kris G. Lindgren" <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="rte-from-internet" href="mailto:klindgren@godaddy.com" data-destination="mailto:rte:bind">klindgren@godaddy.com</a>>, "<a spellcheck="false" bbg-destination="mailto:rte:bind" class="" href="mailto:openstack-operators@lists.openstack.org" data-destination="mailto:rte:bind">openstack-operators@lists.openstack.org</a>"
 <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="rte-from-internet" href="mailto:openstack-operators@lists.openstack.org" data-destination="mailto:rte:bind">openstack-operators@lists.openstack.org</a>><br>
<span style="font-weight:bold">Subject: </span>Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo<br>
</span></div>
<div><br>
</div>
<div>
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;">
<div>
<div>
<div>Trying that now. I had aggressive system TCP keepalive timers set before:</div>
<div><br>
</div>
<div>
<div>net.ipv4.tcp_keepalive_intvl = 10</div>
<div>net.ipv4.tcp_keepalive_probes = 9</div>
<div>net.ipv4.tcp_keepalive_time = 5</div>
</div>
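<div>For reference, a Python sketch of what those three settings imply. An idle connection whose peer has vanished is declared dead after roughly tcp_keepalive_time + tcp_keepalive_intvl × tcp_keepalive_probes seconds (5 + 10 × 9 = 95 s here). Two caveats: these sysctls are only defaults for sockets that enable SO_KEEPALIVE, and probes are only sent while the connection is idle, so a connection with unacknowledged data in flight is governed by tcp_retries2 instead. The sketch also shows the Linux-specific per-socket overrides; the helper names are hypothetical.</div>

```python
import socket


def keepalive_detection_s(keepalive_time, keepalive_intvl, keepalive_probes):
    """Seconds of silence before an idle connection is declared dead."""
    # Idle period before the first probe, then one interval per unanswered probe.
    return keepalive_time + keepalive_intvl * keepalive_probes


def enable_keepalive(sock, idle_s, intvl_s, probes):
    """Hypothetical helper: opt a socket into keepalive with per-socket overrides.

    TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are Linux-specific socket options.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


# Values from the sysctl settings above.
print(keepalive_detection_s(5, 10, 9))  # 95

if hasattr(socket, "TCP_KEEPIDLE"):  # only attempt the overrides where supported
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s, 5, 10, 9)
```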
<div><br>
</div>
<div>
<div id=""></div>
</div>
</div>
</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION"></span>
<div style="font-family:Calibri; font-size:12pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<span id="OLK_SRC_BODY_SECTION"><span style="font-weight:bold">From: </span>"Kris G. Lindgren" <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="" href="mailto:klindgren@godaddy.com" data-destination="mailto:rte:bind">klindgren@godaddy.com</a>><br>
<span style="font-weight:bold">Date: </span>Thursday, April 21, 2016 at 11:50 AM<br>
<span style="font-weight:bold">To: </span>Ajay Kalambur <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="rte-from-internet" href="mailto:akalambu@cisco.com" data-destination="mailto:rte:bind">akalambu@cisco.com</a>>, "<a spellcheck="false" bbg-destination="mailto:rte:bind" class="" href="mailto:openstack-operators@lists.openstack.org" data-destination="mailto:rte:bind">openstack-operators@lists.openstack.org</a>"
 <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="rte-from-internet" href="mailto:openstack-operators@lists.openstack.org" data-destination="mailto:rte:bind">openstack-operators@lists.openstack.org</a>><br>
<span style="font-weight:bold">Subject: </span>Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo<br>
</span></div>
<div><br>
</div>
<div>
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;">
<div>
<div>Do you have RabbitMQ/oslo.messaging heartbeats enabled?</div>
<div><br>
</div>
<div>If you aren't using heartbeats, it will take a long time for the nova-compute agent to figure out that it's actually no longer attached to anything. The heartbeat performs periodic checks against RabbitMQ, so it will catch this state and reconnect.</div>
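<div>The idea behind heartbeats can be sketched in a few lines of Python: record when traffic last arrived from the broker, check periodically, and force a reconnect once the connection has been silent for longer than a threshold, instead of waiting for TCP to give up. This is a simplified, hypothetical illustration (class and method names are invented), not oslo.messaging's actual heartbeat code; the clock is injectable so the logic can be exercised without sleeping.</div>

```python
import time


class HeartbeatMonitor:
    """Hypothetical liveness tracker for a broker connection."""

    def __init__(self, timeout_s, reconnect, clock=time.monotonic):
        self.timeout_s = timeout_s  # silence tolerated before reconnecting
        self.reconnect = reconnect  # callback that re-establishes the connection
        self.clock = clock
        self.last_seen = clock()
        self.reconnects = 0

    def on_traffic(self):
        """Call whenever any frame (data or heartbeat) arrives from the broker."""
        self.last_seen = self.clock()

    def check(self):
        """Run periodically; returns True if a reconnect was triggered."""
        if self.clock() - self.last_seen > self.timeout_s:
            self.reconnect()
            self.reconnects += 1
            self.last_seen = self.clock()  # restart the window on the new connection
            return True
        return False


# Usage with a fake clock: traffic at t=10, then silence.
now = [0.0]
monitor = HeartbeatMonitor(60, reconnect=lambda: print("reconnecting"), clock=lambda: now[0])
now[0] = 10.0
monitor.on_traffic()
now[0] = 65.0
print(monitor.check())  # False: only 55 s of silence
now[0] = 75.0
print(monitor.check())  # prints "reconnecting", then True
```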
</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION"></span>
<div style="font-family:Calibri; font-size:12pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<span id="OLK_SRC_BODY_SECTION"><span style="font-weight:bold">From: </span>"Ajay Kalambur (akalambu)" <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="" href="mailto:akalambu@cisco.com" data-destination="mailto:rte:bind">akalambu@cisco.com</a>><br>
<span style="font-weight:bold">Date: </span>Thursday, April 21, 2016 at 11:43 AM<br>
<span style="font-weight:bold">To: </span>"<a spellcheck="false" bbg-destination="mailto:rte:bind" class="rte-from-internet" href="mailto:openstack-operators@lists.openstack.org" data-destination="mailto:rte:bind">openstack-operators@lists.openstack.org</a>"
 <<a spellcheck="false" bbg-destination="mailto:rte:bind" class="" href="mailto:openstack-operators@lists.openstack.org" data-destination="mailto:rte:bind">openstack-operators@lists.openstack.org</a>><br>
<span style="font-weight:bold">Subject: </span>[Openstack-operators] [oslo]nova compute reconnection Issue Kilo<br>
</span></div>
<div><br>
</div>
<div>
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;">
<div>
<div>
<div><br>
</div>
<div>
<div id=""></div>
</div>
</div>
</div>
<div>Hi</div>
<div>On Kilo, I am seeing that if I bring down one controller node, some compute nodes sometimes report as down forever.</div>
<div>I need to restart the compute service on the compute node to recover. It looks like oslo.messaging is not reconnecting in nova-compute.</div>
<div>Here is the trace from nova-compute:</div>
<div>
<pre>2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     result = self._waiter.wait(msg_id, timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     message = self.waiters.get(msg_id, timeout=timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     'to message ID %s' % msg_id)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID e064b5f6c8244818afdc5e91fff8ebf1
</pre>
</div>
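<div>The bottom of this trace is a caller blocked waiting for an RPC reply that never arrives: the reply waiter gives up after the timeout and raises MessagingTimeout. A simplified, hypothetical Python sketch of that pattern follows, using a per-message queue.Queue; it is not oslo.messaging's implementation, and the class names are invented for illustration. If the underlying connection is dead but nothing has noticed, every call ends this way until the connection is re-established.</div>

```python
import queue


class MessagingTimeout(Exception):
    """Local stand-in for the exception seen in the trace (hypothetical)."""


class ReplyWaiter:
    """Hypothetical per-message reply mailbox keyed by message ID."""

    def __init__(self):
        self._queues = {}

    def register(self, msg_id):
        self._queues[msg_id] = queue.Queue()

    def deliver(self, msg_id, reply):
        """Called by the consumer side when a reply arrives."""
        self._queues[msg_id].put(reply)

    def wait(self, msg_id, timeout):
        try:
            # Block until a reply is delivered or the timeout expires.
            return self._queues[msg_id].get(timeout=timeout)
        except queue.Empty:
            raise MessagingTimeout(
                "Timed out waiting for a reply to message ID %s" % msg_id) from None
        finally:
            del self._queues[msg_id]  # never leak the mailbox


waiter = ReplyWaiter()
waiter.register("abc")
waiter.deliver("abc", {"result": "ok"})
print(waiter.wait("abc", timeout=1))  # {'result': 'ok'}

waiter.register("lost")  # no reply will ever be delivered
try:
    waiter.wait("lost", timeout=0.1)
except MessagingTimeout as exc:
    print(exc)  # Timed out waiting for a reply to message ID lost
```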
<div><br>
</div>
<div><br>
</div>
<div>Any thoughts? I am on stable/kilo for oslo.</div>
<div><br>
</div>
<div>Ajay</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div style="width: 500px; font-style:oblique; margin: 14px; margin-left: 0px; padding-top: 4px; border-top: 1px dotted black">
</div>
<pre>_______________________________________________
OpenStack-operators mailing list
<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.openstack.org</a>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a>
</pre>
</blockquote>
<br>
</div>
</div>
</div>
</div>
</div>
</div>
</span>
</body>
</html>