[ironic] [oslo] ironic overloading notifications for internal messaging

Ken Giusti kgiusti at gmail.com
Wed Feb 6 15:00:08 UTC 2019


On 2/5/19, Doug Hellmann <doug at doughellmann.com> wrote:
> Ken Giusti <kgiusti at gmail.com> writes:
>
>> On 2/4/19, Harald Jensås <hjensas at redhat.com> wrote:
>>> On Tue, 2019-02-05 at 09:54 +1100, Michael Still wrote:
>>>> Hi,
>>>>
>>>> I’ve been chasing a bug in ironic’s neutron agent for the last few
>>>> days and I think it's time to ask for some advice.
>>>>
>>>
>>> I'm working on the same issue. (In fact there are two issues.)
>>>
>>>> Specifically, I was asked to debug why a set of controllers was using
>>>> so much RAM, and the answer was that rabbitmq had a queue called
>>>> ironic-neutron-agent-heartbeat.info with 800,000 messages enqueued.
>>>> This notification queue is used by ironic’s neutron agent to
>>>> calculate the hash ring. I have been able to duplicate this issue in
>>>> a stock kolla-ansible install with ironic turned on but no bare metal
>>>> nodes enrolled in ironic. About 0.6 messages are queued per second.
>>>>
>>>> I added some debugging code (hence the thread yesterday about
>>>> mangling the code kolla deploys), and I can see that the messages in
>>>> the queue are being read by the ironic neutron agent and acked
>>>> correctly. However, they are not removed from the queue.
>>>>
>>>> You can see your queue size while using kolla with this command:
>>>>
>>>> docker exec rabbitmq rabbitmqctl list_queues messages name messages_ready consumers | sort -n | tail -1
>>>>
>>>> My stock install that’s been running for about 12 hours currently has
>>>> 8,244 messages in that queue.
>>>>
>>>> Where I’m a bit stumped is I had assumed that the messages weren’t
>>>> being acked correctly, which is not the case. Is there something
>>>> obvious about notification queues, such as their being persistent,
>>>> that I’ve missed in my general ignorance of the underlying
>>>> implementation of notifications?
>>>>
>>>
>>> I opened an oslo.messaging bug[1] yesterday. It occurs when
>>> notifications are used and all consumers use one or more pools. The
>>> ironic-neutron-agent does use pools for all listeners in its
>>> hash-ring member manager. The result is that notifications are
>>> published to the 'ironic-neutron-agent-heartbeat.info' queue and
>>> are never consumed.
>>>
>>
>> This is an issue with the design of the notification pool feature.
>>
>> The Notification service is designed so notification events can be
>> sent even though there may currently be no consumers.  It supports the
>> ability for events to be queued until a consumer(s) is ready to
>> process them.  So when a notifier issues an event and there are no
>> consumers subscribed, a queue must be provisioned to hold that event
>> until consumers appear.
>
> This has come up several times over the last few years, and it's always
> a surprise to whoever it has bitten. I wonder if we should change the
> default behavior to not create the consumer queue in the publisher?
>

+1

One possibility is to provide options on the Notifier constructor
allowing the app to control the queue creation behavior.  Something
like "create_queue=True/False".

We can document this as a 'dead letter' queue feature for events
published w/o active listeners.
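
To make the idea concrete, here is a rough sketch using a toy in-memory
broker. None of this is oslo.messaging code: ToyBroker, ToyNotifier,
ToyPoolListener, the "<topic>.<pool>" queue naming and the "agent-1"
pool name are all made up for illustration, and create_queue is only
the option proposed above, not an existing parameter.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for a broker (hypothetical, not RabbitMQ)."""

    def __init__(self):
        self.queues = {}    # queue name -> deque of messages
        self.bindings = {}  # topic -> set of queue names bound to it

    def declare_queue(self, topic, queue_name):
        # Idempotent: declaring an existing queue changes nothing.
        self.queues.setdefault(queue_name, deque())
        self.bindings.setdefault(topic, set()).add(queue_name)

    def publish(self, topic, message):
        # Copy the message to every queue bound to the topic.
        # If nothing is bound, the message is simply dropped.
        for name in self.bindings.get(topic, ()):
            self.queues[name].append(message)


class ToyNotifier:
    """Publisher with the proposed (hypothetical) create_queue switch."""

    def __init__(self, broker, topic, create_queue=True):
        self.broker = broker
        self.topic = topic
        self.create_queue = create_queue

    def info(self, payload):
        if self.create_queue:
            # Behaviour described in this thread: the publisher provisions
            # a default queue named after the topic so events are held
            # until a consumer appears.
            self.broker.declare_queue(self.topic, self.topic)
        self.broker.publish(self.topic, payload)


class ToyPoolListener:
    """Consumer that reads only from its own pool queue (assumed naming)."""

    def __init__(self, broker, topic, pool):
        self.broker = broker
        self.queue_name = f"{topic}.{pool}"
        broker.declare_queue(topic, self.queue_name)

    def drain(self):
        queue = self.broker.queues[self.queue_name]
        messages = list(queue)
        queue.clear()
        return messages


def run(create_queue, count=5):
    """Return (messages consumed by the pool, messages left in default queue)."""
    topic = "ironic-neutron-agent-heartbeat.info"
    broker = ToyBroker()
    listener = ToyPoolListener(broker, topic, pool="agent-1")
    notifier = ToyNotifier(broker, topic, create_queue=create_queue)
    for seq in range(count):
        notifier.info({"seq": seq})
    consumed = len(listener.drain())
    leftover = len(broker.queues.get(topic, ()))
    return consumed, leftover


if __name__ == "__main__":
    print("create_queue=True :", run(True))
    print("create_queue=False:", run(False))
```

In this model the pool listener receives every event either way. With
create_queue=True a second copy of each event also piles up in the
default queue that nobody reads, which matches the symptom reported
above. With create_queue=False that queue is never created, so events
sent while no listener exists would be lost.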

> --
> Doug
>


-- 
Ken Giusti  (kgiusti at gmail.com)


