[ironic] [oslo] ironic overloading notifications for internal messaging

Michael Still mikal at stillhq.com
Tue Feb 5 22:07:29 UTC 2019


I'm also interested in how we catch future instances of this. Is there
something we can do in CI or in a runtime warning to let people know? I am
sure there are plenty of ironic deployments out there consuming heaps more
RAM than is required for this queue.
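
For example, even a simple periodic check against the RabbitMQ management
API would have flagged this queue long before it reached 800,000 messages.
A rough sketch in Python (the endpoint, credentials and threshold here are
placeholders, not what kolla deploys):

import sys

import requests

# Sketch only: endpoint, credentials and threshold are placeholders.
RABBIT_API = "http://localhost:15672/api/queues"
CREDENTIALS = ("guest", "guest")
THRESHOLD = 1000


def check_queue_depth():
    queues = requests.get(RABBIT_API, auth=CREDENTIALS).json()
    offenders = [(q["name"], q.get("messages", 0)) for q in queues
                 if q.get("messages", 0) > THRESHOLD]
    for name, depth in offenders:
        print("WARNING: queue %s has %d messages" % (name, depth))
    return 1 if offenders else 0


if __name__ == "__main__":
    sys.exit(check_queue_depth())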

Michael

On Wed, Feb 6, 2019 at 8:41 AM Doug Hellmann <doug at doughellmann.com> wrote:

> Ken Giusti <kgiusti at gmail.com> writes:
>
> > On 2/4/19, Harald Jensås <hjensas at redhat.com> wrote:
> >> On Tue, 2019-02-05 at 09:54 +1100, Michael Still wrote:
> >>> Hi,
> >>>
> >>> I’ve been chasing a bug in ironic’s neutron agent for the last few
> >>> days and I think it’s time to ask for some advice.
> >>>
> >>
> >> I'm working on the same issue. (In fact there are two issues.)
> >>
> >>> Specifically, I was asked to debug why a set of controllers was using
> >>> so much RAM, and the answer was that rabbitmq had a queue called
> >>> ironic-neutron-agent-heartbeat.info with 800,000 messages enqueued.
> >>> This notification queue is used by ironic’s neutron agent to
> >>> calculate the hash ring. I have been able to duplicate this issue in
> >>> a stock kolla-ansible install with ironic turned on but no bare metal
> >>> nodes enrolled in ironic. About 0.6 messages are queued per second.
> >>>
> >>> I added some debugging code (hence the thread yesterday about
> >>> mangling the code kolla deploys), and I can see that the messages in
> >>> the queue are being read by the ironic neutron agent and acked
> >>> correctly. However, they are not removed from the queue.
> >>>
> >>> You can see your queue size while using kolla with this command:
> >>>
> >>> docker exec rabbitmq rabbitmqctl list_queues messages name
> >>> messages_ready consumers | sort -n | tail -1
> >>>
> >>> My stock install that’s been running for about 12 hours currently has
> >>> 8,244 messages in that queue.
> >>>
> >>> Where I’m a bit stumped is that I had assumed the messages weren’t
> >>> being acked correctly, which is not the case. Is there something
> >>> obvious about notification queues, like them being persistent, that
> >>> I’ve missed in my general ignorance of the underlying implementation
> >>> of notifications?
> >>>
> >>
> >> I opened an oslo.messaging bug[1] yesterday. The problem occurs when
> >> notifications are used and all consumers use one or more pools. The
> >> ironic-neutron-agent uses pools for all listeners in its hash-ring
> >> member manager, and the result is that notifications are published to
> >> the 'ironic-neutron-agent-heartbeat.info' queue but never consumed.
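> >>
> >> For illustration, a pooled listener is set up roughly like this with
> >> oslo.messaging (a sketch only, not the actual agent code; the
> >> transport URL, topic and pool name are placeholders):
> >>
> >> import oslo_messaging
> >> from oslo_config import cfg
> >>
> >> # Sketch only: transport URL, topic and pool name are placeholders.
> >> transport = oslo_messaging.get_notification_transport(
> >>     cfg.CONF, url='rabbit://guest:guest@localhost:5672/')
> >> targets = [oslo_messaging.Target(
> >>     topic='ironic-neutron-agent-heartbeat')]
> >>
> >> class HeartbeatEndpoint(object):
> >>     def info(self, ctxt, publisher_id, event_type, payload, metadata):
> >>         # a real endpoint would update hash-ring membership here
> >>         return oslo_messaging.NotificationResult.HANDLED
> >>
> >> # Listeners sharing a pool name split the notifications between them;
> >> # listeners with different pool names each receive their own copy.
> >> listener = oslo_messaging.get_notification_listener(
> >>     transport, targets, [HeartbeatEndpoint()],
> >>     executor='threading', pool='ironic-neutron-agent-hash-ring')
> >> listener.start()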
> >>
> >
> > This is an issue with the design of the notification pool feature.
> >
> > The Notification service is designed so notification events can be
> > sent even though there may currently be no consumers. It supports
> > queuing events until one or more consumers are ready to process them.
> > So when a notifier issues an event and there are no consumers
> > subscribed, a queue must be provisioned to hold that event until
> > consumers appear.
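> >
> > Roughly, the publishing side looks like this (a sketch; the transport
> > URL, publisher_id and payload are placeholders). The event goes to the
> > '<topic>.info' queue, here 'ironic-neutron-agent-heartbeat.info',
> > whether or not anyone is subscribed:
> >
> > import oslo_messaging
> > from oslo_config import cfg
> >
> > # Sketch only: transport URL, publisher_id and payload are placeholders.
> > transport = oslo_messaging.get_notification_transport(
> >     cfg.CONF, url='rabbit://guest:guest@localhost:5672/')
> > notifier = oslo_messaging.Notifier(
> >     transport, publisher_id='ironic-neutron-agent.host1',
> >     driver='messagingv2', topics=['ironic-neutron-agent-heartbeat'])
> >
> > # info() publishes to '<topic>.info', i.e.
> > # 'ironic-neutron-agent-heartbeat.info', even with no consumer attached.
> > notifier.info({}, 'ironic_neutron_agent.heartbeat', {'host': 'host1'})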
>
> This has come up several times over the last few years, and it's always
> a surprise to whoever it has bitten. I wonder if we should change the
> default behavior to not create the consumer queue in the publisher?
>
> --
> Doug
>
>