[ironic] [oslo] ironic overloading notifications for internal messaging

Doug Hellmann doug at doughellmann.com
Tue Feb 5 21:35:04 UTC 2019


Ken Giusti <kgiusti at gmail.com> writes:

> On 2/4/19, Harald Jensås <hjensas at redhat.com> wrote:
>> On Tue, 2019-02-05 at 09:54 +1100, Michael Still wrote:
>>> Hi,
>>>
>>> I’ve been chasing a bug in ironic’s neutron agent for the last few
>>> days and I think it's time to ask for some advice.
>>>
>>
>> I'm working on the same issue. (In fact there are two issues.)
>>
>>> Specifically, I was asked to debug why a set of controllers was using
>>> so much RAM, and the answer was that rabbitmq had a queue called
>>> ironic-neutron-agent-heartbeat.info with 800,000 messages enqueued.
>>> This notification queue is used by ironic’s neutron agent to
>>> calculate the hash ring. I have been able to duplicate this issue in
>>> a stock kolla-ansible install with ironic turned on but no bare metal
>>> nodes enrolled in ironic. About 0.6 messages are queued per second.
>>>
>>> I added some debugging code (hence the thread yesterday about
>>> mangling the code kolla deploys), and I can see that the messages in
>>> the queue are being read by the ironic neutron agent and acked
>>> correctly. However, they are not removed from the queue.
>>>
>>> You can see your queue size while using kolla with this command:
>>>
>>> docker exec rabbitmq rabbitmqctl list_queues messages name \
>>>     messages_ready consumers | sort -n | tail -1
>>>
>>> My stock install that’s been running for about 12 hours currently has
>>> 8,244 messages in that queue.
>>>
>>> Where I’m a bit stumped is that I had assumed the messages weren’t
>>> being acked correctly, which is not the case. Is there something
>>> obvious about notification queues, such as them being persistent,
>>> that I’ve missed in my general ignorance of the underlying
>>> implementation of notifications?
>>>
>>
>> I opened an oslo.messaging bug[1] yesterday. The problem occurs when
>> using notifications and all consumers use one or more pools. The
>> ironic-neutron-agent does use pools for all listeners in its
>> hash-ring member manager. The result is that notifications are
>> published to the 'ironic-neutron-agent-heartbeat.info' queue and
>> they are never consumed.
>>
>
> This is an issue with the design of the notification pool feature.
>
> The Notification service is designed so notification events can be
> sent even though there may currently be no consumers.  It supports the
> ability for events to be queued until one or more consumers are ready
> to process them.  So when a notifier issues an event and there are no
> consumers subscribed, a queue must be provisioned to hold that event
> until consumers appear.

This has come up several times over the last few years, and it's always
a surprise to whoever it has bitten. I wonder if we should change the
default behavior to not create the consumer queue in the publisher?
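
To make the mechanism concrete, here is a toy model in Python of the
behavior Ken describes. It is a sketch under stated assumptions, not
oslo.messaging or RabbitMQ code: ToyBroker, notify, pool_listener_queue,
the declare_default_queue flag, and the pool name "agent-host-1" are all
hypothetical names invented for illustration. It assumes the notifier
declares a queue named after the topic on every send, while a pooled
listener binds its own queue, named after the pool, to the same topic.

```python
from collections import defaultdict, deque


class ToyBroker:
    """Minimal stand-in for a topic exchange (hypothetical, not RabbitMQ)."""

    def __init__(self):
        self.queues = {}                  # queue name -> deque of messages
        self.bindings = defaultdict(set)  # routing key -> bound queue names

    def declare_queue(self, name, routing_key):
        # Idempotent: declaring an existing queue keeps its contents.
        self.queues.setdefault(name, deque())
        self.bindings[routing_key].add(name)

    def publish(self, routing_key, message):
        # Every queue bound to the routing key receives its own copy.
        for name in self.bindings[routing_key]:
            self.queues[name].append(message)

    def consume(self, name):
        # Drain the queue, modelling a consumer that reads and acks.
        queue = self.queues[name]
        drained = list(queue)
        queue.clear()
        return drained


def notify(broker, topic, message, declare_default_queue=True):
    """Model a notifier; by assumption it declares a topic-named queue."""
    if declare_default_queue:
        broker.declare_queue(topic, routing_key=topic)
    broker.publish(topic, message)


def pool_listener_queue(broker, topic, pool):
    """Model a pooled listener: its queue is named after the pool."""
    broker.declare_queue(pool, routing_key=topic)
    return pool


broker = ToyBroker()
topic = "ironic-neutron-agent-heartbeat.info"
pool_queue = pool_listener_queue(broker, topic, pool="agent-host-1")

for i in range(5):
    notify(broker, topic, {"heartbeat": i})

received = broker.consume(pool_queue)  # the pooled listener sees everything
backlog = len(broker.queues[topic])    # the topic-named queue is never read
print(f"received={len(received)} backlog={backlog}")
```

In this model the pooled listener reads every heartbeat, yet the
topic-named queue keeps growing because nothing consumes from it, which
matches the symptom Michael reports. Passing declare_default_queue=False
models the change in default suggested above; the trade-off in the model
is that events sent before any consumer has bound a queue are dropped.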

-- 
Doug


