[Openstack-operators] RabbitMQ 3.6.x experience?

Sam Morrison sorrison at gmail.com
Tue Jan 10 23:08:16 UTC 2017


> On 10 Jan 2017, at 11:04 pm, Tomáš Vondra <vondra at homeatcloud.cz> wrote:
> 
> The version is 3.6.2, but the issue that I believe is relevant is still not fixed:
> https://github.com/rabbitmq/rabbitmq-management/issues/41
> Tomas
> 

Yeah, we found this version unusable; 3.6.5 hasn’t had any problems for us.

Sam



> -----Original Message-----
> From: Mike Dorman [mailto:mdorman at godaddy.com] 
> Sent: Monday, January 09, 2017 6:00 PM
> To: Ricardo Rocha; Sam Morrison
> Cc: OpenStack Operators
> Subject: Re: [Openstack-operators] RabbitMQ 3.6.x experience?
> 
> Great info, thanks so much for this.  We, too, have turned off stats collection some time ago (and haven’t really missed it.)
> 
> Tomáš, what minor version of 3.6 are you using?  We would probably go to 3.6.6 if we upgrade.
> 
> Thanks again all!
> Mike
> 
> 
> On 1/9/17, 2:34 AM, "Ricardo Rocha" <rocha.porto at gmail.com> wrote:
> 
>    Same here, running 3.6.5 for (some) of the rabbit clusters.
> 
>    It's been stable over the last month (fingers crossed!), though:
>    * gave up on stats collection (set to 60000 which makes it not so useful)
>    * can still make it very sick with a couple of misconfigured clients
>    (rabbit_retry_interval=1 and rabbit_retry_backoff=60 currently
>    everywhere).
> 
>    Some data from the neutron rabbit cluster (3 vm nodes, not all infra
>    currently talks to neutron):
> 
>    * connections: ~8k
>    * memory used per node: 2.5GB, 1.7GB, 0.1GB (the last one is less used
>    due to a previous net partition, I believe)
>    * rabbit hiera configuration
>    rabbitmq::cluster_partition_handling: 'autoheal'
>    rabbitmq::config_kernel_variables:
>      inet_dist_listen_min: 41055
>      inet_dist_listen_max: 41055
>    rabbitmq::config_variables:
>      collect_statistics_interval: 60000
>      reverse_dns_lookups: true
>      vm_memory_high_watermark: 0.8
>    rabbitmq::environment_variables:
>      SERVER_ERL_ARGS: "'+K true +A 128 +P 1048576'"
>    rabbitmq::tcp_keepalive: true
>    rabbitmq::tcp_backlog: 4096
> 
>    * package versions
> 
>    erlang-kernel-18.3.4.4-1
>    rabbitmq-server-3.6.5-1
> 
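For context on the `vm_memory_high_watermark: 0.8` setting above: RabbitMQ reads it as a fraction of the node's RAM and raises a memory alarm, blocking publishers, once usage crosses that fraction. The Python sketch below only illustrates the arithmetic. The 8 GiB per-node RAM figure and the helper names are assumptions, since the thread does not say how much memory the VMs have.

```python
GIB = 1024 ** 3


def watermark_bytes(total_ram_bytes: int, fraction: float) -> int:
    """Absolute threshold implied by a relative vm_memory_high_watermark."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return int(total_ram_bytes * fraction)


def headroom_bytes(used_bytes: int, total_ram_bytes: int, fraction: float) -> int:
    """Bytes left before the memory alarm would fire (negative if already over)."""
    return watermark_bytes(total_ram_bytes, fraction) - used_bytes


if __name__ == "__main__":
    assumed_ram = 8 * GIB  # hypothetical VM size, not stated in the thread
    # Per-node usage reported in the message above: 2.5GB, 1.7GB, 0.1GB.
    for used_gb in (2.5, 1.7, 0.1):
        left = headroom_bytes(int(used_gb * GIB), assumed_ram, 0.8)
        print(f"used={used_gb} GB, headroom={left / GIB:.2f} GiB")
```

With a different RAM size only `assumed_ram` changes; the watermark itself stays a relative value.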
>    It's stable enough to keep scaling it up in the next couple months and
>    see how it goes.
> 
>    Cheers,
>      Ricardo
> 
>    On Mon, Jan 9, 2017 at 3:54 AM, Sam Morrison <sorrison at gmail.com> wrote:
>> We’ve been running 3.6.5 for some time now and it’s working well.
>> 
>> 3.6.1 - 3.6.3 are unusable; we had lots of issues with the stats DB and
>> other weirdness.
>> 
>> Our setup is a 3 physical node cluster with around 9k connections, averaging
>> around 300 messages/sec delivered. We have the stats sample rate set to the
>> default and it is working fine.
>> 
>> Yes we did have to restart the cluster to upgrade.
>> 
>> Cheers,
>> Sam
>> 
>> 
>> 
>> On 6 Jan 2017, at 5:26 am, Matt Fischer <matt at mattfischer.com> wrote:
>> 
>> Mike,
>> 
>> I did a bunch of research and experiments on this last fall. We are running
>> Rabbit 3.5.6 on our main cluster and 3.6.5 on our Trove cluster which has
>> significantly less load (and criticality). We were going to upgrade to 3.6.5
>> everywhere but in the end decided not to, mainly because there was little
>> perceived benefit at the time. Our main issue is unchecked memory growth at
>> random times. I ended up making several config changes to the stats
>> collector and then we also restart it after every deploy and that solved it
>> (so far).
>> 
>> I'd say these were my main reasons for not going to 3.6 for our control
>> nodes:
>> 
>> * In 3.6.x they re-wrote the stats processor to make it parallel. In every
>>   3.6 release since then, Pivotal has fixed bugs in this code. Then finally
>>   they threw up their hands and said "we're going to make a complete rewrite
>>   in 3.7/4.x" (you need to look through issues on Github to find this
>>   discussion).
>> * Out of the box with the same configs, 3.6.5 used more memory than 3.5.6.
>>   Since this was our main issue, I consider this a negative.
>> * Another issue is the ancient version of erlang we have with Ubuntu Trusty
>>   (which we are working on), which made upgrades more complex/impossible
>>   depending on the version.
>> 
>> Given those negatives, the main one being that I didn't think there would be
>> too many more fixes to the parallel statsdb collector in 3.6, we decided to
>> stick with 3.5.6. In the end the devil we know is better than the devil we
>> don't and I had no evidence that 3.6.5 would be an improvement.
>> 
>> I did decide to leave Trove on 3.6.5 because this would give us some bake-in
>> time: if 3.5.x became untenable, we'd at least have had 3.6.5 up and running
>> in production, with some data on it.
>> 
>> If statsdb is not a concern for you, I think this changes the math and maybe
>> you should use 3.6.x. I would, however, recommend at least going to 3.5.6;
>> it's been better than 3.3/3.4 was.
>> 
>> No matter what you do definitely read all the release notes. There are some
>> upgrades which require an entire cluster shutdown. The upgrade to 3.5.6 did
>> not require this IIRC.
>> 
>> Here's the hiera for our rabbit settings which I assume you can translate:
>> 
>> rabbitmq::cluster_partition_handling: 'autoheal'
>> rabbitmq::config_variables:
>>  'vm_memory_high_watermark': '0.6'
>>  'collect_statistics_interval': 30000
>> rabbitmq::config_management_variables:
>>  'rates_mode': 'none'
>> rabbitmq::file_limit: '65535'
>> 
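For anyone not using Puppet, the hiera above can be translated roughly as follows. This Python sketch renders the same values into the classic Erlang-term `rabbitmq.config` format. It assumes that `config_variables` and `cluster_partition_handling` end up in the `rabbit` application section and `config_management_variables` in `rabbitmq_management`; the helper names are hypothetical. `file_limit` is left out because it is an OS-level open-files limit rather than a `rabbitmq.config` entry.

```python
def erl_term(value):
    """Render a Python value as an Erlang term (booleans, numbers, bare atoms)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    return str(value)  # treated as a bare atom, e.g. autoheal or none


def render_config(apps):
    """Build a classic rabbitmq.config string from {app: {key: value}}."""
    sections = []
    for app, settings in apps.items():
        pairs = ",\n".join(
            f"    {{{key}, {erl_term(val)}}}" for key, val in settings.items()
        )
        sections.append(f"  {{{app}, [\n{pairs}\n  ]}}")
    return "[\n" + ",\n".join(sections) + "\n]."


# Values copied from the hiera above; the grouping by application is assumed.
settings = {
    "rabbit": {
        "cluster_partition_handling": "autoheal",
        "vm_memory_high_watermark": 0.6,
        "collect_statistics_interval": 30000,
    },
    "rabbitmq_management": {
        "rates_mode": "none",
    },
}

print(render_config(settings))
```

The result is a single Erlang term list terminated by a period, which is the shape the classic config file expects.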
>> Finally, if you do upgrade to 3.6.x please report back here with your
>> results at scale!
>> 
>> 
>> On Thu, Jan 5, 2017 at 8:49 AM, Mike Dorman <mdorman at godaddy.com> wrote:
>>> 
>>> We are looking at upgrading to the latest RabbitMQ in an effort to ease
>>> some cluster failover issues we’ve been seeing.  (Currently on 3.4.0)
>>> 
>>> 
>>> 
>>> Anyone been running 3.6.x?  And what has been your experience?  Any
>>> gotchas to watch out for?
>>> 
>>> 
>>> 
>>> Thanks,
>>> 
>>> Mike
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>> 



