[openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

Herndon, John Luke john.herndon at hp.com
Thu Jan 2 22:46:43 UTC 2014



On 1/2/14, 11:36 AM, "Gordon Sim" <gsim at redhat.com> wrote:

>On 12/20/2013 09:26 PM, Herndon, John Luke wrote:
>>
>> On Dec 20, 2013, at 12:13 PM, Gordon Sim <gsim at redhat.com> wrote:
>>
>>> On 12/20/2013 05:27 PM, Herndon, John Luke wrote:
>>>>
>>>> Other protocols may support bulk consumption. My one concern with
>>>> this approach is error handling. Currently the executors treat
>>>> each notification individually. So let's say the broker hands the
>>>> client 100 messages at a time. When the client is done processing
>>>> the messages, the broker needs to know if message 25 had an error
>>>> or not. We would somehow need to communicate back to the broker
>>>> which messages failed. I think this may take some refactoring of
>>>> executors/dispatchers. What do you think?
>[...]
>>> (2) What would you want the broker to do with the failed messages?
>>> What sort of things might fail? Is it related to the message
>>> content itself? Or is it failures suspected to be of a temporal
>>> nature?
>>
>> There will be situations where the message can't be parsed, and those
>> messages can't just be thrown away. My current thought is that
>> ceilometer could provide some sort of mechanism for sending messages
>> that are invalid to an external data store (like a file, or a
>> different topic on the amqp server) where a living, breathing human
>> can look at them and try to parse out any meaningful information.
>
>Right, in those cases simply requeueing probably is not the right thing
>and you really want it dead-lettered in some way. I guess the first
>question is whether that is part of the notification systems function,
>or if it is done by the application itself (e.g. by storing it or
>republishing it). If it is the latter you may not need any explicit
>negative acknowledgement.

Exactly, I'm thinking this is something we'd build into ceilometer and not
oslo, since ceilometer is where the event parsing knowledge lives. From an
oslo point of view, the message would be 'acked'.
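
Rough sketch of what I have in mind (purely illustrative -- dispatch_batch,
store_events and error_store are made-up names, not the real ceilometer or
oslo.messaging API):

    import json

    def dispatch_batch(messages, store_events, error_store):
        """Process a batch of notifications, dead-lettering the bad ones.

        store_events and error_store are stand-ins: store_events is
        whatever writes the parsed events to the database, error_store is
        wherever the unparseable messages go (a file, a separate topic on
        the amqp server, ...).
        """
        events = []
        for msg in messages:
            try:
                events.append(json.loads(msg))
            except ValueError:
                # Can't be parsed: hand it off for a human to look at
                # instead of throwing it away or requeueing it forever.
                error_store.append(msg)
        store_events(events)
        # As far as oslo is concerned, every message in the batch is now
        # acked; the "negative acknowledgement" happened inside ceilometer.
        return 'ack'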

>
>> Other errors might be "database not available", in which case
>> requeueing the message is probably the right way to go.
>
>That does mean however that the backlog of messages starts to grow on
>the broker, so some scheme for dealing with this if the database outage
>goes on for a bit is probably important. It also means that the messages
>will keep being retried without any 'backoff' waiting for the database
>to be restored which could increase the load.

This is a problem we already have :(
https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L156-L158
Since notifications cannot be lost, overflow needs to be detected and the
messages need to be saved. I'm thinking the database being down is a rare
occurrence, and one worth waking someone up in the middle of the night for.
One possible solution: flip the collector into an emergency mode and save
notifications to disk until the issue is resolved. Once the db is back up
and running, the collector inserts all of the saved messages (as one big
batch!). Thoughts?
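
To make that a bit more concrete, something along these lines (sketch only --
the spool directory, DatabaseDownError and the db object are stand-ins, not
existing ceilometer code):

    import json
    import os
    import time

    SPOOL_DIR = '/var/lib/ceilometer/spool'  # made-up location

    class DatabaseDownError(Exception):
        """Stand-in for whatever the storage driver raises on an outage."""

    def record_or_spool(db, notifications):
        """Try the db first; if it's down, flip into emergency mode."""
        try:
            db.record_events(notifications)
        except DatabaseDownError:
            # Emergency mode: nothing is lost, the messages just wait on
            # disk (and somebody gets paged).
            if not os.path.isdir(SPOOL_DIR):
                os.makedirs(SPOOL_DIR)
            path = os.path.join(SPOOL_DIR, '%.6f.json' % time.time())
            with open(path, 'w') as f:
                json.dump(notifications, f)

    def drain_spool(db):
        """Once the db is back, insert everything saved as one big batch."""
        paths = [os.path.join(SPOOL_DIR, name)
                 for name in sorted(os.listdir(SPOOL_DIR))]
        batch = []
        for path in paths:
            with open(path) as f:
                batch.extend(json.load(f))
        if batch:
            db.record_events(batch)  # the one big insert
        for path in paths:
            os.remove(path)  # only delete once the insert has succeeded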

I'm not sure I understand what you are saying about retrying without a
backoff. Can you explain?

-john

>
>
>
>_______________________________________________
>OpenStack-dev mailing list
>OpenStack-dev at lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

