<div dir="ltr">Has anyone had any luck improving the statsdb issue by upgrading rabbit to 3.6.3 or newer? We're at 3.5.6 now; 3.6.2 has parallelized stats processing, and 3.6.3 has additional memory leak fixes for it. What we've been seeing is that we occasionally get slow & steady climbs of rabbit memory usage until the cluster falls over when it hits the memory limit. When we went back and looked at the charts, the last one had built up over 12 hours.<div><br></div><div>I'm hoping to try 3.6.5, but we have no way to repro this outside of production, and even there, short of bouncing neutron and all the agents over and over, I'm not sure I could recreate it.</div><div><br></div><div>Note - we already have the collect interval set to 30k, per recommendation from the Rabbit Ops talk in Tokyo, but no other optimizations for the statsdb. Some folks here are considering a cron job to bounce it every few hours.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jul 28, 2016 at 9:10 AM, Kris G. Lindgren <span dir="ltr"><<a href="mailto:klindgren@godaddy.com" target="_blank">klindgren@godaddy.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="white" lang="EN-US" link="blue" vlink="purple">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri">We also believe the change from auto-delete queues to 10min expiration queues was the cause of our rabbit woes a month or so ago, when we had rabbitmq servers filling their stats DB
and consuming 20+ GB of RAM before hitting the rabbitmq memory high watermark. We were running for 6+ months without issue under Kilo, and when we moved to Liberty rabbit consistently started falling on its face. We eventually turned down the stats collection
interval, but I would imagine keeping stats around for 10 minutes for queues that were used for a single RPC message, when we are passing 1500+ messages per second, wasn’t helping anything. We haven’t tried lowering the timeout values to see if
that makes things better, but we did identify this change as something that could contribute to our rabbitmq issues.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri"><u></u> <u></u></span></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Calibri;color:black"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Calibri;color:black">______________________________<wbr>______________________________<wbr>_______<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Calibri;color:black">Kris Lindgren<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Calibri;color:black">Senior Linux Systems Engineer<u></u><u></u></span></p>
</div>
</div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Calibri;color:black">GoDaddy</span><span style="font-size:11.0pt;font-family:Calibri"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri"><u></u> <u></u></span></p>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-family:Calibri;color:black">From: </span>
</b><span style="font-family:Calibri;color:black">Dmitry Mescheryakov <<a href="mailto:dmescheryakov@mirantis.com" target="_blank">dmescheryakov@mirantis.com</a>><br>
<b>Date: </b>Thursday, July 28, 2016 at 6:17 AM<br>
<b>To: </b>Sam Morrison <<a href="mailto:sorrison@gmail.com" target="_blank">sorrison@gmail.com</a>><br>
<b>Cc: </b>OpenStack Operators <<a href="mailto:openstack-operators@lists.openstack.org" target="_blank">openstack-operators@lists.<wbr>openstack.org</a>><span class=""><br>
<b>Subject: </b>Re: [Openstack-operators] [oslo] RabbitMQ queue TTL issues moving to Liberty<u></u><u></u></span></span></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">2016-07-27 2:20 GMT+03:00 Sam Morrison <<a href="mailto:sorrison@gmail.com" target="_blank">sorrison@gmail.com</a>>:<u></u><u></u></p><div><div class="h5">
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">On 27 Jul 2016, at 4:05 AM, Dmitry Mescheryakov <<a href="mailto:dmescheryakov@mirantis.com" target="_blank">dmescheryakov@mirantis.com</a>> wrote:<u></u><u></u></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">2016-07-26 2:15 GMT+03:00 Sam Morrison <<a href="mailto:sorrison@gmail.com" target="_blank">sorrison@gmail.com</a>>:<u></u><u></u></p>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p class="MsoNormal">The queue TTL is applied to reply queues and fanout queues. I don’t think it should apply to fanout queues; they should auto-delete. I can understand the reason for having it on reply queues though, so maybe that would be a way forward?
<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Or am I missing something and it is needed on fanout queues too?<u></u><u></u></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I would say we do need fanout queues to expire, for the very same reason we want reply queues to expire instead of auto-delete. In case of a broken connection, the expiration gives the client time to reconnect and continue consuming from the
queue. With auto-delete queues, it frequently happened that RabbitMQ deleted the queue before the client reconnected ... along with all non-consumed messages in it.<u></u><u></u></p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<p class="MsoNormal">But in the case of fanout queues, if there is a broken connection, can’t the service just recreate the queue if it doesn’t exist? I guess that means it needs to store the state of what the queue name is, though?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Yes, they could lose messages directed at them, but all the services I know of that consume on fanout queues have resync functionality for this very case.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">If the connection is broken, will oslo.messaging know how to connect to the same queue again anyway? I would’ve thought it would handle the disconnect and then reconnect, either with the same queue name or a new queue altogether?<u></u><u></u></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">oslo.messaging handles reconnect perfectly - on connect it just unconditionally declares the queue and starts consuming from it. If the queue already exists, the declaration will simply be ignored by RabbitMQ.<u></u><u></u></p>
</div>
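<div>
<p class="MsoNormal">To make the difference concrete, here is a rough sketch of the reconnect behaviour described above. It is a toy in-memory model, not RabbitMQ or oslo.messaging code: the <code>ToyBroker</code> class, its methods and the queue name are all made up for illustration, and it publishes straight to a queue instead of going through an exchange. It only shows that an unconditional declare is harmless when the queue survived the outage, and that an auto-delete queue takes its pending messages with it.</p>
<pre>
```python
class ToyBroker:
    """Hypothetical in-memory stand-in for a broker (not RabbitMQ itself)."""

    def __init__(self):
        self.queues = {}  # queue name -> state dict

    def declare(self, name, now, auto_delete=False, ttl=None):
        # Idempotent: declaring an existing queue returns it untouched.
        if name not in self.queues:
            self.queues[name] = {
                "messages": [],
                "auto_delete": auto_delete,
                "ttl": ttl,            # seconds the queue may sit unused
                "consumers": 0,
                "idle_since": now,     # unused until someone consumes
            }
        return self.queues[name]

    def consume(self, name):
        queue = self.queues[name]
        queue["consumers"] += 1
        queue["idle_since"] = None

    def disconnect(self, name, now):
        queue = self.queues[name]
        queue["consumers"] -= 1
        if queue["consumers"] == 0:
            if queue["auto_delete"]:
                # Auto-delete: the queue and its messages go immediately.
                del self.queues[name]
            else:
                queue["idle_since"] = now

    def publish(self, name, message):
        # Messages for a queue that no longer exists are dropped.
        if name in self.queues:
            self.queues[name]["messages"].append(message)

    def expire(self, now):
        # Drop queues that have been unused for at least their TTL.
        for name in list(self.queues):
            queue = self.queues[name]
            idle = queue["idle_since"]
            if queue["ttl"] is not None and idle is not None:
                if now - idle >= queue["ttl"]:
                    del self.queues[name]


def messages_after_reconnect(auto_delete, ttl, outage):
    """Return what a consumer finds after an outage of `outage` seconds."""
    broker = ToyBroker()
    broker.declare("fanout_q", now=0.0, auto_delete=auto_delete, ttl=ttl)
    broker.consume("fanout_q")
    broker.disconnect("fanout_q", now=0.0)   # connection breaks
    broker.publish("fanout_q", "cast-1")     # sent while the client is away
    broker.expire(now=outage)
    # Reconnect: declare unconditionally, then start consuming again.
    queue = broker.declare("fanout_q", now=outage,
                           auto_delete=auto_delete, ttl=ttl)
    broker.consume("fanout_q")
    return list(queue["messages"])
```
</pre>
<p class="MsoNormal">In this model, <code>messages_after_reconnect(True, None, 5.0)</code> comes back empty, while <code>messages_after_reconnect(False, 20.0, 5.0)</code> still has the message because the client returned within the 20 second window.</p>
</div>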
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">As for your earlier point that services resync and hence messages lost in fanouts are not that important, I can't comment on that. But after some thinking I do agree that a long expiration time for fanouts is inadequate for big deployments
anyway. How about we split <span style="font-size:9.5pt">rabbit_transient_queues_<wbr>ttl into two parameters - one for reply queues and one for fanout ones? In that case people concerned with messages piling up in fanouts could set the fanout one to 1 (second), which will virtually
make these queues behave like auto-delete ones (though I strongly recommend leaving it at 20 seconds or more, to give the service a chance to reconnect).</span><u></u><u></u></p>
</div>
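<div>
<p class="MsoNormal">A minimal sketch of what the split proposed above might look like, assuming the TTL ends up as RabbitMQ's <code>x-expires</code> queue argument (which is expressed in milliseconds). The function name and the <code>reply_ttl</code> / <code>fanout_ttl</code> parameters are hypothetical, not existing oslo.messaging options; the defaults just mirror the 10 minutes and 20 seconds mentioned in this thread.</p>
<pre>
```python
def queue_arguments(kind, reply_ttl=600, fanout_ttl=20):
    """Build declare-time arguments for a transient queue.

    kind is "reply" or "fanout"; both TTLs are in seconds.
    The two TTL parameters are hypothetical names for the proposed split.
    """
    ttls = {"reply": reply_ttl, "fanout": fanout_ttl}
    if kind not in ttls:
        raise ValueError("unknown queue kind: %r" % (kind,))
    # x-expires is given in milliseconds, so convert from seconds.
    return {"x-expires": ttls[kind] * 1000}
```
</pre>
<p class="MsoNormal">With these defaults a fanout queue would be declared with <code>x-expires</code> of 20000 and a reply queue with 600000; passing <code>fanout_ttl=1</code> gives the near-auto-delete behaviour.</p>
</div>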
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Thanks,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Dmitry<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal"><span style="color:#888888"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:#888888">Sam<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:#888888"><u></u> <u></u></span></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</blockquote>
</div></div></div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
</div>
</div>
</div>
<br>______________________________<wbr>_________________<br>
OpenStack-operators mailing list<br>
<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.<wbr>openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack-operators</a><br>
<br></blockquote></div><br></div>