[Openstack-operators] [nova] Rabbit-mq 3.4 crashing (anyone else seen this?)

Mike Lowe jomlowe at iu.edu
Tue Jul 5 17:24:22 UTC 2016


I was having just this problem last week.  We updated to 3.6.2 from 3.5.4 on ubuntu and started seeing crashes due to excessive memory usage. I ran 'rabbitmq-plugins disable rabbitmq_management' on each node of my rabbit cluster and haven't had any problems since. From what I could gather from the rabbitmq mailing lists, the stats collection part of the management console is single threaded and can't keep up, hence the ever-growing memory usage from the ever-growing backlog of stats to be processed.
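
For anyone wanting the exact steps, this is roughly what I ran (a sketch; the plugin disable is the actual change, the status call is just a sanity check that the node came back clean):

```
# run on each cluster node, one at a time
rabbitmq-plugins disable rabbitmq_management

# optional sanity check; reported memory use should stay flat now
rabbitmqctl status
```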


> On Jul 5, 2016, at 1:02 PM, Joshua Harlow <harlowja at fastmail.com> wrote:
> 
> Hi ops and dev-folks,
> 
> We over at godaddy (running rabbitmq with openstack) have been hitting an issue that has been causing `rabbit_mgmt_db` to consume nearly all of the process's memory after a given amount of time.
> 
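> For reference, one way to watch the stats db itself (a sketch; `rabbitmqctl eval` has to run on the node actually hosting the stats db, since `rabbit_mgmt_db` is only registered there, and the memory breakdown labels in `rabbitmqctl status` differ across rabbit versions):
> 
> ```
> # overall per-category memory breakdown (look for the mgmt_db entry)
> rabbitmqctl status
> 
> # size and backlog of the stats db process itself; on any other node
> # whereis/1 returns undefined and the call fails
> rabbitmqctl eval 'erlang:process_info(erlang:whereis(rabbit_mgmt_db), [memory, message_queue_len]).'
> ```
> 
> The workaround that gets floated on the rabbitmq lists for pre-3.6.2 brokers is restarting just that process, e.g. `rabbitmqctl eval 'exit(erlang:whereis(rabbit_mgmt_db), please_terminate).'`, though that only resets the backlog rather than fixing the leak.
> 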
> We've been thinking that this bug (or bugs?) may have existed for a while, and that our dual-version path (where we upgrade the control plane and then slowly/eventually upgrade the compute nodes to the same version) has somehow triggered this memory leak. It has happened most prominently on the cloud that was running nova-compute at kilo with the other services at liberty, which exercises the versioned-objects code path more heavily since objects need to be translated between versions.
> 
> The rabbit we are running is 3.4.0 on CentOS Linux release 7.2.1511 with kernel 3.10.0-327.4.4.el7.x86_64 (do note that upgrading to 3.6.2 seems to make the issue go away):
> 
> ```
> # rpm -qa | grep rabbit
> rabbitmq-server-3.4.0-1.noarch
> ```
> 
> The logs that seem relevant:
> 
> ```
> **********************************************************
> *** Publishers will be blocked until this alarm clears ***
> **********************************************************
> 
> =INFO REPORT==== 1-Jul-2016::16:37:46 ===
> accepting AMQP connection <0.23638.342> (127.0.0.1:51932 -> 127.0.0.1:5671)
> 
> =INFO REPORT==== 1-Jul-2016::16:37:47 ===
> vm_memory_high_watermark clear. Memory used:29910180640 allowed:47126781542
> ```
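> 
> For context, allowed:47126781542 is ~44 GiB; if that reflects the default `vm_memory_high_watermark` ratio of 0.4 (an assumption, we haven't overridden it), that puts the host at roughly 110 GiB of RAM. The knob lives in rabbitmq.config if anyone wants to buy headroom while chasing this; sketch below, where 0.6 is a made-up example value, not a recommendation:
> 
> ```
> %% /etc/rabbitmq/rabbitmq.config (classic Erlang-term config format)
> [
>   {rabbit, [
>     %% fraction of system RAM at which publishers get blocked
>     {vm_memory_high_watermark, 0.6}
>   ]}
> ].
> ```
> 
> Of course, with a genuine leak this only delays the alarm rather than fixing anything.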
> 
> This happens quite often; the crashes have been affecting our cloud over the weekend (which made some dev/ops folks not so happy, especially given the July 4th mini-vacation).
> 
> Has anyone else seen anything similar?
> 
> For those interested, this is the upstream bug/mail thread where I'm also trying to get confirmation from the upstream users/devs (it has the erlang crash dumps attached/linked as well):
> 
> https://groups.google.com/forum/#!topic/rabbitmq-users/FeBK7iXUcLg
> 
> Thanks,
> 
> -Josh
> 


