<div dir="ltr">I'm also interested in how we catch future instances of this. Is there something we can do in CI or in a runtime warning to let people know? I am sure there are plenty of ironic deployments out there consuming heaps more RAM than is required for this queue.<div><br></div><div>Michael</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 6, 2019 at 8:41 AM Doug Hellmann <<a href="mailto:doug@doughellmann.com">doug@doughellmann.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Ken Giusti <<a href="mailto:kgiusti@gmail.com" target="_blank">kgiusti@gmail.com</a>> writes:<br>

<br>

> On 2/4/19, Harald Jensås <<a href="mailto:hjensas@redhat.com" target="_blank">hjensas@redhat.com</a>> wrote:<br>

>> On Tue, 2019-02-05 at 09:54 +1100, Michael Still wrote:<br>

>>> Hi,<br>

>>><br>

>>> I’ve been chasing a bug in ironic’s neutron agent for the last few<br>

>>> days and I think it's time to ask for some advice.<br>

>>><br>

>><br>

>> I'm working on the same issue. (In fact there are two issues.)<br>

>><br>

>>> Specifically, I was asked to debug why a set of controllers was using<br>

>>> so much RAM, and the answer was that rabbitmq had a queue called<br>

>>> ironic-neutron-agent-heartbeat.info with 800,000 messages enqueued.<br>

>>> This notification queue is used by ironic’s neutron agent to<br>

>>> calculate the hash ring. I have been able to duplicate this issue in<br>

>>> a stock kolla-ansible install with ironic turned on but no bare metal<br>

>>> nodes enrolled in ironic. About 0.6 messages are queued per second.<br>

>>><br>

>>> I added some debugging code (hence the thread yesterday about<br>

>>> mangling the code kolla deploys), and I can see that the messages in<br>

>>> the queue are being read by the ironic neutron agent and acked<br>

>>> correctly. However, they are not removed from the queue.<br>

>>><br>

>>> You can see your queue size while using kolla with this command:<br>

>>><br>

>>> docker exec rabbitmq rabbitmqctl list_queues messages name messages_ready consumers | sort -n | tail -1<br>

>>><br>

>>> My stock install that’s been running for about 12 hours currently has<br>

>>> 8,244 messages in that queue.<br>

>>><br>

>>> Where I’m a bit stumped is I had assumed that the messages weren’t<br>

>>> being acked correctly, which is not the case. Is there something<br>

>>> obvious about notification queues, such as them being persistent, that<br>

>>> I’ve missed in my general ignorance of the underlying implementation<br>

>>> of notifications?<br>

>>><br>

>><br>

>> I opened an oslo.messaging bug[1] yesterday. It occurs when notifications<br>

>> are used and all consumers use one or more pools. The ironic-neutron-agent<br>

>> uses pools for all listeners in its hash-ring member manager. As a<br>

>> result, notifications are published to the<br>

>> 'ironic-neutron-agent-heartbeat.info' queue and are never consumed.<br>

>><br>

><br>

> This is an issue with the design of the notification pool feature.<br>

><br>

> The Notification service is designed so notification events can be<br>

> sent even though there may currently be no consumers.  It supports the<br>

> ability for events to be queued until one or more consumers are ready to<br>

> process them.  So when a notifier issues an event and there are no<br>

> consumers subscribed, a queue must be provisioned to hold that event<br>

> until consumers appear.<br>
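<br>
The behavior described above can be illustrated with a toy model. This is only a sketch in plain Python, not oslo.messaging or RabbitMQ code: the ToyExchange class and the "pool-a" queue name are hypothetical, and it assumes that every bound queue receives its own copy of each event and that the publisher always declares the default queue.<br>

```python
from collections import defaultdict, deque


class ToyExchange:
    """Toy fan-out: every declared queue receives its own copy of each message."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def declare_queue(self, name):
        # Touching the defaultdict creates the queue if it is missing.
        self.queues[name]

    def publish(self, default_queue, message):
        # Assumption: the publisher declares the default queue before sending,
        # so events are held even when no consumer exists yet.
        self.declare_queue(default_queue)
        for queue in self.queues.values():
            queue.append(message)

    def consume_all(self, name):
        # Drain one queue and report how many messages were removed.
        drained = len(self.queues[name])
        self.queues[name].clear()
        return drained


exchange = ToyExchange()
topic_queue = "ironic-neutron-agent-heartbeat.info"
pool_queue = "pool-a"  # hypothetical pool queue name
exchange.declare_queue(pool_queue)

for n in range(10):
    exchange.publish(topic_queue, {"heartbeat": n})
    # The only consumer listens on its pool queue, never on the default one.
    exchange.consume_all(pool_queue)

depths = {name: len(queue) for name, queue in exchange.queues.items()}
print(depths)
```

In this model the pool queue stays empty while the default queue keeps every event, which matches the symptom of messages being read and acked while the queue still grows.<br>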

<br>

This has come up several times over the last few years, and it's always<br>

a surprise to whoever it has bitten. I wonder if we should change the<br>

default behavior to not create the consumer queue in the publisher?<br>

<br>

-- <br>

Doug<br>

<br>

</blockquote></div>
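<br>
On catching this in CI or at runtime, one option is to parse the output of the rabbitmqctl command quoted above and flag queues whose depth exceeds a limit. The following is a minimal sketch under stated assumptions: the helper names, the sample rows, the "notifications.info" queue, and the 1000-message threshold are all hypothetical, and it assumes data rows are whitespace-separated in the order messages, name, messages_ready, consumers.<br>

```python
def parse_list_queues(output):
    """Parse rows of `messages name messages_ready consumers` into dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip banners and header rows: data rows have exactly four fields,
        # and the three counters are non-negative integers.
        if len(fields) != 4:
            continue
        messages, name, ready, consumers = fields
        if not (messages.isdigit() and ready.isdigit() and consumers.isdigit()):
            continue
        rows.append({
            "name": name,
            "messages": int(messages),
            "messages_ready": int(ready),
            "consumers": int(consumers),
        })
    return rows


def backlogged_queues(rows, threshold=1000):
    """Return names of queues holding at least `threshold` messages."""
    return [row["name"] for row in rows if row["messages"] >= threshold]


# Hypothetical sample output, for illustration only.
sample = (
    "Listing queues for vhost / ...\n"
    "messages\tname\tmessages_ready\tconsumers\n"
    "0\tnotifications.info\t0\t1\n"
    "8244\tironic-neutron-agent-heartbeat.info\t8244\t0\n"
)

rows = parse_list_queues(sample)
flagged = backlogged_queues(rows)
print(flagged)
```

A CI job or periodic check could feed the real command output into parse_list_queues and fail, or log a warning, when backlogged_queues returns anything. A suitable threshold depends on the deployment.<br>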