On Tue, 2019-02-05 at 11:43 -0500, Ken Giusti wrote:
On 2/4/19, Harald Jensås <hjensas@redhat.com> wrote:
I opened an oslo.messaging bug[1] yesterday about what happens when notifications are used and all consumers subscribe with one or more pools. The ironic-neutron-agent uses pools for all listeners in its hash-ring member manager, and the result is that notifications are published to the 'ironic-neutron-agent-heartbeat.info' queue but are never consumed.
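Roughly, each agent instance subscribes like this (topic and endpoint names are illustrative, not the actual agent code):

    import uuid

    import oslo_messaging
    from oslo_config import cfg

    class HeartbeatEndpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # Update the local hash-ring membership from the peer heartbeat.
            return oslo_messaging.NotificationResult.HANDLED

    conf = cfg.CONF
    transport = oslo_messaging.get_notification_transport(conf)
    targets = [oslo_messaging.Target(topic='ironic-neutron-agent-heartbeat')]

    # Every agent instance subscribes with its own pool id so that each
    # instance receives its own copy of every heartbeat notification.
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [HeartbeatEndpoint()],
        pool='ironic-neutron-agent-%s' % uuid.uuid4())
    listener.start()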
This is an issue with the design of the notification pool feature.
The Notification service is designed so notification events can be sent even though there may currently be no consumers. It supports the ability for events to be queued until one or more consumers are ready to process them. So when a notifier issues an event and there are no consumers subscribed, a queue must be provisioned to hold that event until consumers appear.
For notification pools the pool identifier is supplied by the notification listener when it subscribes. The value of any pool id is not known beforehand by the notifier, which is important because pool ids can be dynamically created by the listeners. And in many cases pool ids are not even used.
So notifications are always published to a non-pooled queue. If there are pooled subscriptions we rely on the broker to do the fanout. This means that the application should always have at least one non-pooled listener for the topic, since any events that may be published _before_ the listeners are established will be stored on a non-pooled queue.
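A rough sketch of that pattern (topic, endpoint and publisher names are made up, not anything the agent actually uses):

    import oslo_messaging
    from oslo_config import cfg

    class Endpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            return oslo_messaging.NotificationResult.HANDLED

    conf = cfg.CONF
    transport = oslo_messaging.get_notification_transport(conf)

    # Publisher side: the notifier addresses a plain topic string and
    # knows nothing about any pools the listeners may have created.
    notifier = oslo_messaging.Notifier(
        transport, publisher_id='my-service',
        driver='messaging', topics=['my-topic'])
    notifier.info({}, 'my_object.update', {'key': 'value'})

    # Listener side: pooled subscriptions each get a broker-side copy,
    # but at least one non-pooled listener is needed to drain the
    # default queue ('my-topic.info' with the rabbit driver).
    targets = [oslo_messaging.Target(topic='my-topic')]
    pooled = oslo_messaging.get_notification_listener(
        transport, targets, [Endpoint()], pool='pool-a')
    unpooled = oslo_messaging.get_notification_listener(
        transport, targets, [Endpoint()])
    pooled.start()
    unpooled.start()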
From what I observe, any message published _before_ or _after_ pool
listeners are established is stored on the non-pooled queue.
The documentation doesn't make that clear AFAICT - that needs to be fixed.
I agree with your conclusion here. This is not clear in the documentation, and it should be updated to reflect the requirement for at least one non-pooled listener to consume the non-pooled queue.
The second issue: each instance of the agent uses its own pool to ensure all agents are notified about the existence of peer agents. The pools use a uuid that is generated at startup (and re-generated on restart, stop/start etc.). In the case where `[oslo_messaging_rabbit]/amqp_auto_delete = false` in the neutron config, these uuid queues are not automatically removed. So after a restart of the ironic-neutron-agent, the queue with the old UUID is left in the message broker with no consumers, growing ...
I intend to push patches to fix both issues. As a workaround (or the permanent solution) I will create another listener consuming the notifications without a pool. This should fix the first issue.
The second change will set amqp_auto_delete for these specific queues to 'true' regardless of the global setting. What I'm currently stuck on here is that I need to change the control_exchange for the transport. According to the oslo.messaging documentation it should be possible to override the control_exchange in the transport_url[3]. The idea is to set amqp_auto_delete and an ironic-neutron-agent specific exchange on the url when setting up the transport for notifications, but so far I believe the doc string on the control_exchange option is wrong.
Yes the doc string is wrong - you can override the default control_exchange via the Target's exchange field:
https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/...
At least that's the intent...
... however the Notifier API does not take a Target, it takes a list of topic _strings_:
https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/...
Which seems wrong, especially since the notification Listener subscribes to a list of Targets:
https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/...
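Side by side the mismatch looks roughly like this (hypothetical topic/exchange names):

    import oslo_messaging
    from oslo_config import cfg

    class Endpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            return oslo_messaging.NotificationResult.HANDLED

    conf = cfg.CONF
    transport = oslo_messaging.get_notification_transport(conf)

    # Notifier: topic strings only, so there is no per-notifier way to
    # point at a different exchange.
    notifier = oslo_messaging.Notifier(
        transport, publisher_id='my-service',
        driver='messaging', topics=['my-topic'])

    # Listener: full Targets, where an exchange can be supplied.
    targets = [oslo_messaging.Target(topic='my-topic',
                                     exchange='my-exchange')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [Endpoint()])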
I've opened a bug for this and will provide a patch for review shortly:
Thanks, this makes sense. One question: in Target I can see that there is the 'fanout' parameter: https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/...

"""Clients may request that a copy of the message be delivered to all servers listening on a topic by setting fanout to ``True``, rather than just one of them."""

In my use case I actually want exactly that. So once your patch lands, can I drop the use of pools and just set fanout=true on the target instead?
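For reference, my reading of what fanout does today on the RPC side (illustrative names) - whether the same flag would carry over to notification Targets once your patch lands is exactly what I'm asking:

    import oslo_messaging
    from oslo_config import cfg

    conf = cfg.CONF
    transport = oslo_messaging.get_rpc_transport(conf)

    # An RPC cast on a fanout Target is delivered to every server
    # listening on the topic instead of just one of them.
    target = oslo_messaging.Target(topic='my-topic', fanout=True)
    client = oslo_messaging.RPCClient(transport, target)
    client.cast({}, 'heartbeat', host='agent-1')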
NOTE: The second issue can be worked around by stopping and starting rabbitmq as a dependency of the ironic-neutron-agent service. This ensures that only queues for active agent uuids are present, and those queues will be consumed.
-- Harald Jensås
[1] https://bugs.launchpad.net/oslo.messaging/+bug/1814544
[2] https://storyboard.openstack.org/#!/story/2004933
[3] https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/trans...