[openstack-dev] [nova] Rabbit-mq 3.4 crashing (anyone else seen this?)

Sam Morrison sorrison at gmail.com
Tue Jul 5 23:16:03 UTC 2016


We had some issues related to this too. We ended up changing our collect_statistics_interval to 30 seconds, as opposed to the default, which is 5 seconds I think.
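
For anyone wanting to try the same change, here is a minimal sketch of what that setting could look like in a classic-format rabbitmq.config, generated from Python so the seconds-to-milliseconds conversion is explicit (RabbitMQ takes collect_statistics_interval in milliseconds). The helper name render_rabbitmq_config and its parameters are made up for illustration. The optional rates_mode entry is the management plugin setting mentioned in the quoted mail below; it only applies on versions that support it. This is not a copy of our actual config.

```python
# Sketch: render a classic-format rabbitmq.config (Erlang terms) with a
# slower statistics interval. The function name and arguments are
# hypothetical; only the config keys come from RabbitMQ itself.

def render_rabbitmq_config(stats_interval_seconds=30, rates_mode=None):
    """Return the text of a classic-format rabbitmq.config.

    stats_interval_seconds: value for collect_statistics_interval, given in
        seconds and converted to milliseconds below.
    rates_mode: optional management plugin rates_mode ("basic", "detailed"
        or "none"); the section is left out of the file when None.
    """
    interval_ms = int(stats_interval_seconds * 1000)
    sections = [
        "  {rabbit, [\n"
        "    {collect_statistics_interval, %d}\n"
        "  ]}" % interval_ms
    ]
    if rates_mode is not None:
        if rates_mode not in ("basic", "detailed", "none"):
            raise ValueError("unknown rates_mode: %r" % rates_mode)
        sections.append(
            "  {rabbitmq_management, [\n"
            "    {rates_mode, %s}\n"
            "  ]}" % rates_mode
        )
    # Erlang config files are a single list of terms terminated by a period.
    return "[\n" + ",\n".join(sections) + "\n].\n"


if __name__ == "__main__":
    print(render_rabbitmq_config(30, rates_mode="none"))
```

The broker needs a restart to pick up a changed config file, so it is worth trying on a non-production node first.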

We also upgraded to 3.6.2, but that version is very buggy and I wouldn’t recommend that anyone use it. It has a memory leak and some other nasty bugs that we ran into.

3.6.1 on the other hand is very stable for us and we’ve been using it in production for several months now. 

Sam


> On 6 Jul 2016, at 3:50 AM, Alexey Lebedev <alebedev at mirantis.com> wrote:
> 
> Hi Joshua,
> 
> Does this happen with `rates_mode` set to `none` and a tuned `collect_statistics_interval`? Like in https://bugs.launchpad.net/fuel/+bug/1510835
> 
> High connection/channel churn during upgrade can cause such issues.
> 
> BTW, the soon-to-be-released rabbitmq 3.6.3 contains several improvements related to management plugin statistics handling, and almost every version before that also contained some related fixes. I think the upstream devs' response will include some mention of upgrading =)
> 
> Best,
> Alexey
> 
> On Tue, Jul 5, 2016 at 8:02 PM, Joshua Harlow <harlowja at fastmail.com> wrote:
> Hi ops and dev-folks,
> 
> We over at godaddy (running rabbitmq with openstack) have been hitting an issue that causes `rabbit_mgmt_db` to consume nearly all of the process's memory (after a given amount of time).
> 
> We've been thinking that this bug (or bugs?) may have existed for a while, and that our dual-version path (where we upgrade the control plane and then slowly/eventually upgrade the compute nodes to the same version) has somehow triggered this memory leak. It has happened most prominently on our cloud that was running nova-compute at kilo and the other services at liberty (thus using the versioned objects code path more frequently, because objects need translating).
> 
> The rabbit we are running is 3.4.0 on CentOS Linux release 7.2.1511 with kernel 3.10.0-327.4.4.el7.x86_64 (do note that upgrading to 3.6.2 seems to make the issue go away):
> 
> # rpm -qa | grep rabbit
> 
> rabbitmq-server-3.4.0-1.noarch
> 
> The logs that seem relevant:
> 
> ```
> **********************************************************
> *** Publishers will be blocked until this alarm clears ***
> **********************************************************
> 
> =INFO REPORT==== 1-Jul-2016::16:37:46 ===
> accepting AMQP connection <0.23638.342> (127.0.0.1:51932 -> 127.0.0.1:5671)
> 
> =INFO REPORT==== 1-Jul-2016::16:37:47 ===
> vm_memory_high_watermark clear. Memory used:29910180640 allowed:47126781542
> ```
> 
> This happens quite often, and the crashes have been affecting our cloud over the weekend (which made some dev/ops folks not so happy, especially given the July 4th mini-vacation).
> 
> Looking to see if anyone else has seen anything similar?
> 
> For those interested, this is the upstream bug/mail thread where I'm also trying to get confirmation from the upstream users/devs (it also has erlang crash dumps attached/linked):
> 
> https://groups.google.com/forum/#!topic/rabbitmq-users/FeBK7iXUcLg
> 
> Thanks,
> 
> -Josh
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe <http://OpenStack-dev-request@lists.openstack.org/?subject:unsubscribe>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>
> 
