On 2/5/19, Harald Jensås <hjensas@redhat.com> wrote:
On Tue, 2019-02-05 at 11:43 -0500, Ken Giusti wrote:
On 2/4/19, Harald Jensås <hjensas@redhat.com> wrote:
I opened an oslo.messaging bug[1] yesterday. The problem occurs when using notifications and all consumers use one or more pools. The ironic-neutron-agent uses pools for all listeners in its hash-ring member manager. The result is that notifications are published to the 'ironic-neutron-agent-heartbeat.info' queue and are never consumed.
This is an issue with the design of the notification pool feature.
The Notification service is designed so notification events can be sent even though there may currently be no consumers. Events can be queued until one or more consumers are ready to process them. So when a notifier issues an event and there are no consumers subscribed, a queue must be provisioned to hold that event until consumers appear.
For notification pools the pool identifier is supplied by the notification listener when it subscribes. The value of any pool id is not known beforehand by the notifier, which is important because pool ids can be dynamically created by the listeners. And in many cases pool ids are not even used.
So notifications are always published to a non-pooled queue. If there are pooled subscriptions we rely on the broker to do the fanout. This means that the application should always have at least one non-pooled listener for the topic, since any events that may be published _before_ the listeners are established will be stored on a non-pooled queue.
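As a rough sketch (untested; the topic, endpoint and pool names here are just placeholders), keeping one non-pooled listener next to the pooled ones could look like:

    import uuid

    import oslo_messaging
    from oslo_config import cfg

    class HeartbeatEndpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # handle the heartbeat notification
            return oslo_messaging.NotificationResult.HANDLED

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='ironic-neutron-agent-heartbeat')]
    endpoints = [HeartbeatEndpoint()]

    # Non-pooled listener: drains the 'ironic-neutron-agent-heartbeat.info'
    # queue that the notifier provisions unconditionally.
    plain = oslo_messaging.get_notification_listener(
        transport, targets, endpoints, executor='threading')

    # Pooled listener: the broker fans a copy of each notification out to
    # every pool, so each agent instance still sees every event.
    pooled = oslo_messaging.get_notification_listener(
        transport, targets, endpoints, executor='threading',
        pool='ironic-neutron-agent-' + str(uuid.uuid4()))

    plain.start()
    pooled.start()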
From what I observe, any message published _before_ or _after_ pool listeners are established is stored on the non-pooled queue.
True that. Even if listeners are established before a notification is issued, the notifier still doesn't know that and blindly creates a non-pooled queue just in case there aren't any listeners. Not intuitive, I agree.
The documentation doesn't make that clear AFAICT - that needs to be fixed.
I agree with your conclusion here. This is not clear in the documentation, and it should be updated to reflect the requirement of at least one non-pooled listener to consume the non-pooled queue.
+1 I can do that.
The second issue: each instance of the agent uses its own pool to ensure all agents are notified about the existence of peer agents. The pools use a UUID that is generated at startup (and re-generated on restart, stop/start etc.). In the case where `[oslo_messaging_rabbit]/amqp_auto_delete = false` in the neutron config, these UUID queues are not automatically removed. So after a restart of the ironic-neutron-agent, the queue with the old UUID is left in the message broker with no consumers, growing ...
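Roughly (simplified, not the actual agent code), the pool id comes from a UUID created at startup:

    import uuid

    import oslo_messaging
    from oslo_config import cfg

    class NoopEndpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            pass

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='ironic-neutron-agent-heartbeat')]

    # Regenerated on every start/restart, so with amqp_auto_delete=false the
    # pooled queue from the previous run is left behind with no consumers.
    pool_id = str(uuid.uuid4())
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [NoopEndpoint()], executor='threading',
        pool=pool_id)
    listener.start()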
I intend to push patches to fix both issues. As a workaround (or the permanent solution) I will create another listener that consumes the notifications without a pool. This should fix the first issue.
The second change will set amqp_auto_delete for these specific queues to 'true' no matter what. What I'm currently stuck on is that I need to change the control_exchange for the transport. According to the oslo.messaging documentation it should be possible to override the control_exchange in the transport_url[3]. The idea is to set amqp_auto_delete and an ironic-neutron-agent-specific exchange on the URL when setting up the transport for notifications, but so far I believe the doc string on the control_exchange option is wrong.
Yes, the doc string is wrong - you can override the default control_exchange via the Target's exchange field:
https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/...
At least that's the intent...
... however the Notifier API does not take a Target, it takes a list of topic _strings_:
https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/...
Which seems wrong, especially since the notification Listener subscribes to a list of Targets:
https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/...
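To make the asymmetry concrete (rough, untested sketch; the topic and exchange names are just examples):

    import oslo_messaging
    from oslo_config import cfg

    transport = oslo_messaging.get_notification_transport(cfg.CONF)

    # Listener side: subscriptions are Targets, so an exchange can be given
    # (these get passed to get_notification_listener() with the endpoints).
    targets = [oslo_messaging.Target(exchange='ironic-neutron-agent',
                                     topic='heartbeat')]

    # Notifier side: only topic strings are accepted, so there is no
    # per-notifier way to name an exchange; it falls back to the configured
    # control_exchange.
    notifier = oslo_messaging.Notifier(transport,
                                       publisher_id='ironic-neutron-agent',
                                       driver='messagingv2',
                                       topics=['heartbeat'])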
I've opened a bug for this and will provide a patch for review shortly:
Thanks, this makes sense.
I've hacked in the ability to override the default exchange for notifiers, but I don't think it would help in your case. In rabbitmq, exchange and queue names are scoped independently. This means that if you have an exchange named 'openstack' and another named 'my-exchange' but use the same topic (say 'foo'), you end up with a single instance of queue 'foo' bound to both exchanges. IOW, if you declare one listener on exchange=openstack and topic=foo, and another listener on exchange=my-exchange and topic=foo, they will compete for messages because they are consuming from the same queue (foo). So if your intent is to partition notification traffic you'd still need unique topics as well.
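A raw-AMQP illustration of that scoping, using pika directly rather than the oslo.messaging driver (just to show the name spaces, not how the driver declares things):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = conn.channel()

    # Exchange names and queue names live in separate namespaces.
    ch.exchange_declare(exchange='openstack', exchange_type='topic')
    ch.exchange_declare(exchange='my-exchange', exchange_type='topic')

    # A single queue 'foo' bound to both exchanges: consumers of 'foo'
    # compete for messages published to either exchange with routing
    # key 'foo'.
    ch.queue_declare(queue='foo')
    ch.queue_bind(queue='foo', exchange='openstack', routing_key='foo')
    ch.queue_bind(queue='foo', exchange='my-exchange', routing_key='foo')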
One question: in Target I can see that there is a 'fanout' parameter.
https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/...
""" Clients may request that a copy of the message be delivered to all servers listening on a topic by setting fanout to ``True``, rather than just one of them. """
In my use case I actually want exactly that. So once your patch lands, can I drop the use of pools and just set fanout=True on the Target instead?
The 'fanout' attribute is only used with RPC messaging, not Notifications. Can you use RPC fanout instead of Notifications? RPC fanout ('cast', as the API calls it) differs from 'normal' RPC in that no reply is returned to the caller, so it's a lot like Notifications in that regard. However, RPC fanout differs from Notifications in two important ways: 1) RPC fanout messages are sent 'best effort', meaning they can be silently discarded, and 2) RPC fanout messages are not stored - they are only delivered to active subscribers (listeners).

I've always felt that notification pools are an attempt to implement a Publish/Subscribe messaging pattern on top of an event queuing service. That's hard to do, since event queuing has strict delivery guarantees (avoid dropping) while Pub/Sub doesn't (drop if no consumers).
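For comparison, an RPC fanout cast would look roughly like this (untested sketch; the topic, method and agent names are just examples):

    import oslo_messaging
    from oslo_config import cfg

    transport = oslo_messaging.get_rpc_transport(cfg.CONF)

    # Client side: a fanout cast goes to every server currently listening
    # on the topic; there is no reply and no queuing for absent listeners.
    client = oslo_messaging.RPCClient(
        transport, oslo_messaging.Target(topic='ironic-neutron-agent'))
    client.prepare(fanout=True).cast({}, 'heartbeat', member='agent-1')

    # Server side: every running agent gets its own copy of the cast.
    class AgentEndpoint(object):
        def heartbeat(self, ctxt, member):
            pass  # update local hash-ring membership

    server = oslo_messaging.get_rpc_server(
        transport,
        oslo_messaging.Target(topic='ironic-neutron-agent', server='agent-1'),
        [AgentEndpoint()], executor='threading')
    server.start()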
NOTE: The second issue can be worked around by stopping and starting rabbitmq as a dependency of the ironic-neutron-agent service. This ensures that only queues for active agent UUIDs are present, and those queues will be consumed.
-- Harald Jensås
[1] https://bugs.launchpad.net/oslo.messaging/+bug/1814544
[2] https://storyboard.openstack.org/#!/story/2004933
[3] https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/trans...
-- Ken Giusti (kgiusti@gmail.com)