[openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
Gordon Sim
gsim at redhat.com
Thu Jan 2 18:36:51 UTC 2014
On 12/20/2013 09:26 PM, Herndon, John Luke wrote:
>
> On Dec 20, 2013, at 12:13 PM, Gordon Sim <gsim at redhat.com> wrote:
>
>> On 12/20/2013 05:27 PM, Herndon, John Luke wrote:
>>>
>>> Other protocols may support bulk consumption. My one concern with
>>> this approach is error handling. Currently the executors treat
>>> each notification individually. So let’s say the broker hands
>>> 100 messages at a time. When the client is done processing the
>>> messages, the broker needs to know if message 25 had an error or
>>> not. We would somehow need to communicate back to the broker
>>> which messages failed. I think this may take some refactoring of
>>> executors/dispatchers. What do you think?
[...]
>> (2) What would you want the broker to do with the failed messages?
>> What sort of things might fail? Is it related to the message
>> content itself? Or is it failures suspected to be of a temporal
>> nature?
>
> There will be situations where the message can’t be parsed, and those
> messages can’t just be thrown away. My current thought is that
> ceilometer could provide some sort of mechanism for sending messages
> that are invalid to an external data store (like a file, or a
> different topic on the amqp server) where a living, breathing human
> can look at them and try to parse out any meaningful information.
Right, in those cases simply requeueing is probably not the right thing,
and you really want the message dead-lettered in some way. I guess the
first question is whether that is part of the notification system's
function, or whether it is done by the application itself (e.g. by
storing or republishing it). If it is the latter, you may not need any
explicit negative acknowledgement.
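To illustrate the application-side option, here is a minimal sketch, not
tied to any real oslo.messaging or ceilometer API. It assumes messages
arrive as JSON strings; process_batch, handle and dead_letter are
hypothetical names. Unparseable messages are parked by the application
itself, so the whole batch can be acknowledged afterwards without a
per-message negative acknowledgement.

```python
import json


def process_batch(raw_messages, handle, dead_letter):
    """Process a batch, parking unparseable messages instead of requeueing.

    `handle` and `dead_letter` are hypothetical callables supplied by the
    application. Returns the number of messages handled successfully.
    Every message is either handled or parked, so the caller can
    acknowledge the whole batch once this returns.
    """
    handled = 0
    for raw in raw_messages:
        try:
            payload = json.loads(raw)
        except ValueError as exc:
            # Bad content will never parse on retry, so park it where a
            # human can inspect it later (a file, another topic, ...).
            dead_letter(raw, str(exc))
            continue
        handle(payload)
        handled += 1
    return handled
```

Here dead_letter could append to a file or republish to a separate
topic; which of the two is better is exactly the open question above.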
> Other errors might be “database not available”, in which case
> re-queuing the message is probably the right way to go.
That does mean, however, that the backlog of messages starts to grow on
the broker, so some scheme for dealing with this if the database outage
goes on for a while is probably important. It also means that the
messages will keep being retried without any 'backoff' while waiting for
the database to be restored, which could increase the load.
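One way to add such a backoff on the consumer side is sketched below,
purely as an illustration. StoreUnavailable, store_with_backoff and the
default delays are all assumptions, not part of any existing library.
The delay doubles after each failed attempt, up to a cap, and the caller
decides what to do (e.g. requeue) once the attempts are exhausted.

```python
import time


class StoreUnavailable(Exception):
    """Hypothetical error raised when the database cannot be reached."""


def store_with_backoff(store, batch, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Try to store a batch, waiting longer after each failure.

    Returns True once `store(batch)` succeeds, or False if every attempt
    raised StoreUnavailable, in which case the caller can requeue.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            store(batch)
            return True
        except StoreUnavailable:
            if attempt == max_attempts:
                return False
            # Wait before retrying, doubling the delay up to max_delay.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

While the consumer is waiting it is not fetching new messages, so the
backlog still builds up on the broker; the backoff only avoids
hammering the database with retries.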