<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;">

<div>

<div>We tried some of these (well, I did last night), but the issue was that eventually rabbitmq actually died. I was trying some of the eval commands to see what was in the mgmt_db, but any get_status call eventually led to a timeout error. Part of the problem is that we can go from a warning to a zomg out of memory in under 2 minutes. Last night it took only 2 hours to chew through 40GB of RAM. Messaging rates were in the 150-300/s range, which is not all that high (another cell is doing a constant 1k-2k).</div>
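<div>As a rough back-of-the-envelope check on those numbers, here is a small Python sketch. It assumes "40GB" means 40 * 10^9 bytes and attributes all of the growth to message traffic, which is a simplification; the function name is made up for illustration:</div>

<pre>
```python
def growth_per_message(total_bytes: float, seconds: float, msgs_per_sec: float) -> float:
    """Average bytes of memory growth per message over the window."""
    return total_bytes / seconds / msgs_per_sec


TOTAL_BYTES = 40 * 10**9       # assumption: "40GB" read as 40 * 10^9 bytes
WINDOW_SECONDS = 2 * 60 * 60   # the 2 hours it took to use that memory

# Average growth rate over the whole window.
bytes_per_second = TOTAL_BYTES / WINDOW_SECONDS

# Per-message growth at both ends of the observed 150-300 msg/s range.
per_msg_at_150 = growth_per_message(TOTAL_BYTES, WINDOW_SECONDS, 150)
per_msg_at_300 = growth_per_message(TOTAL_BYTES, WINDOW_SECONDS, 300)

print(f"growth rate: {bytes_per_second / 1e6:.2f} MB/s")
print(f"per message at 150/s: {per_msg_at_150 / 1e3:.1f} KB")
print(f"per message at 300/s: {per_msg_at_300 / 1e3:.1f} KB")
```
</pre>

<div>That works out to about 5.6 MB/s, or roughly 18 KB to 37 KB of growth per message, which is a lot for that message rate.</div>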

<div>

<div id="MAC_OUTLOOK_SIGNATURE">

<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri"><br>

</font></font></div>

<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri"><span class="Apple-style-span" style="font-size: 14px;">___________________________________________________________________</span></font></font></div>

<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri"><span class="Apple-style-span" style="font-size: 14px;">Kris Lindgren</span></font></font></div>

<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri"><span class="Apple-style-span" style="font-size: 14px;">Senior Linux Systems Engineer</span></font></font></div>

<div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="Calibri"><span class="Apple-style-span" style="font-size: 14px;">GoDaddy</span></font></font></div>

</div>

</div>

</div>

<div><br>

</div>

<span id="OLK_SRC_BODY_SECTION">

<div style="font-family:Calibri; font-size:12pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">

<span style="font-weight:bold">From: </span>Matt Fischer <<a href="mailto:matt@mattfischer.com">matt@mattfischer.com</a>><br>

<span style="font-weight:bold">Date: </span>Tuesday, July 5, 2016 at 11:25 AM<br>

<span style="font-weight:bold">To: </span>Joshua Harlow <<a href="mailto:harlowja@fastmail.com">harlowja@fastmail.com</a>><br>

<span style="font-weight:bold">Cc: </span>"<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>" <<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>>, OpenStack Operators <<a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a>><br>

<span style="font-weight:bold">Subject: </span>Re: [Openstack-operators] [nova] Rabbit-mq 3.4 crashing (anyone else seen this?)<br>

</div>

<div><br>

</div>

<div>

<div>

<p dir="ltr">Yes! This happens often but I'd not call it a crash, just the mgmt db gets behind then eats all the memory. We've started monitoring it and have runbooks on how to bounce just the mgmt db. Here are my notes on that:</p>

<p dir="ltr">restart rabbitmq mgmt server - this seems to clear the memory usage.

</p>

<p dir="ltr">rabbitmqctl eval 'application:stop(rabbitmq_management).'<br>

rabbitmqctl eval 'application:start(rabbitmq_management).'</p>

<p dir="ltr">run GC on rabbit_mgmt_db:<br>

rabbitmqctl eval '(erlang:garbage_collect(global:whereis_name(rabbit_mgmt_db))).'</p>

<p dir="ltr">status of rabbit_mgmt_db:<br>

rabbitmqctl eval 'sys:get_status(global:whereis_name(rabbit_mgmt_db)).'</p>

<p dir="ltr">how much memory the rabbitmq mgmt DB is using:<br>

/usr/sbin/rabbitmqctl status | grep mgmt_db</p>
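<p dir="ltr">For monitoring, here is a minimal Python sketch of pulling that number out of the status output and comparing it to a limit. It assumes the output contains an Erlang-term entry of the form {mgmt_db,N} with N in bytes; the function names and the limit are hypothetical, so check them against what your version actually prints:</p>

<pre>
```python
import re
import subprocess
from typing import Optional

# Assumed shape of the memory entry in `rabbitmqctl status` output: {mgmt_db,N}
MGMT_DB_PATTERN = re.compile(r"\{mgmt_db,\s*(\d+)\}")


def parse_mgmt_db_bytes(status_output: str) -> Optional[int]:
    """Return the mgmt_db memory figure in bytes, or None if it is not present."""
    match = MGMT_DB_PATTERN.search(status_output)
    return int(match.group(1)) if match else None


def should_bounce_mgmt(status_output: str, limit_bytes: int) -> bool:
    """True when the mgmt_db figure is present and at or above the limit."""
    used = parse_mgmt_db_bytes(status_output)
    return used is not None and used >= limit_bytes


def read_status(rabbitmqctl: str = "/usr/sbin/rabbitmqctl") -> str:
    """Run `rabbitmqctl status` and return its stdout (raises if the command fails)."""
    result = subprocess.run(
        [rabbitmqctl, "status"], capture_output=True, text=True, check=True
    )
    return result.stdout
```
</pre>

<p dir="ltr">A check would call should_bounce_mgmt(read_status(), limit) with whatever limit fits the box, and alert or run the stop/start above when it returns True. Given how fast memory can go, the call to rabbitmqctl probably wants a timeout as well.</p>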

<p dir="ltr">Unfortunately I didn't see anything saying an upgrade would fix it for sure, and any settings changes to reduce the number of monitored events also require a restart of the cluster. The other issue with an upgrade for us is the ancient version of Erlang shipped with trusty. When we upgrade to Xenial we'll upgrade Erlang and rabbit and hope it goes away. I'll also probably tweak the event retention settings then too.

</p>

<p dir="ltr">Also for the record the GC doesn't seem to help at all. <br>

</p>

<div class="gmail_quote">On Jul 5, 2016 11:05 AM, "Joshua Harlow" <<a href="mailto:harlowja@fastmail.com">harlowja@fastmail.com</a>> wrote:<br type="attribution">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi ops and dev-folks,<br>

<br>

We over at godaddy (running rabbitmq with openstack) have been hitting an issue that causes `rabbit_mgmt_db` to consume nearly all of the process's memory (after a given amount of time).<br>

<br>

We've been thinking that this bug (or bugs?) may have existed for a while and that our dual-version-path (where we upgrade the control plane and then slowly/eventually upgrade the compute nodes to the same version) has somehow triggered this memory leak, since it has happened most prominently on our cloud that was running nova-compute at kilo and the other services at liberty (thus using the versioned objects code path more frequently due to needing translations of objects).<br>

<br>

The rabbit we are running is 3.4.0 on CentOS Linux release 7.2.1511 with kernel 3.10.0-327.4.4.el7.x86_64 (do note that upgrading to 3.6.2 seems to make the issue go away),<br>

<br>

# rpm -qa | grep rabbit<br>

<br>

rabbitmq-server-3.4.0-1.noarch<br>

<br>

The logs that seem relevant:<br>

<br>

```<br>

**********************************************************<br>

*** Publishers will be blocked until this alarm clears ***<br>

**********************************************************<br>

<br>

=INFO REPORT==== 1-Jul-2016::16:37:46 ===<br>

accepting AMQP connection &lt;0.23638.342&gt; (127.0.0.1:51932 -&gt; 127.0.0.1:5671)<br>

<br>

=INFO REPORT==== 1-Jul-2016::16:37:47 ===<br>

vm_memory_high_watermark clear. Memory used:29910180640 allowed:47126781542<br>

```<br>

<br>

This happens quite often, and the crashes were affecting our cloud over the weekend (which made some dev/ops not so happy, especially given the July 4th mini-vacation).<br>

<br>

Looking to see if anyone else has seen anything similar?<br>

<br>

For those interested, this is the upstream bug/mail thread where I'm also seeking confirmation from the upstream users/devs (it also has erlang crash dumps attached/linked):<br>

<br>

<a href="https://groups.google.com/forum/#!topic/rabbitmq-users/FeBK7iXUcLg" rel="noreferrer" target="_blank">https://groups.google.com/forum/#!topic/rabbitmq-users/FeBK7iXUcLg</a><br>

<br>

Thanks,<br>

<br>

-Josh<br>

<br>

_______________________________________________<br>

OpenStack-operators mailing list<br>

<a href="mailto:OpenStack-operators@lists.openstack.org" target="_blank">OpenStack-operators@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a><br>

</blockquote>

</div>

</div>

</div>

</span>

</body>

</html>