[openstack-dev] [AODH] event-alarm timeout discussion

Zhai, Edwin edwin.zhai at intel.com
Mon Sep 26 01:48:55 UTC 2016


On Fri, 23 Sep 2016, gordon chung wrote:

>
>
> On 23/09/2016 2:18 AM, Zhai, Edwin wrote:
>
>>
>> There are many targets (topics)/endpoints in the above ceilometer code. But
>> in AODH, we just have one topic, 'alarm.all', and one endpoint. If it is
>> still multi-threaded, there is already a potential race condition here,
>> but the event-alarm timeout makes it worse.
>>
>> https://github.com/openstack/aodh/blob/master/aodh/event.py#L61-L63
>
> see my reply to the other message, but yes, it is multithreaded. there's no
> race currently because we don't do anything that needs to honour ordering.

Currently, we still need ordering. For example, two events with different
traits could trigger the same alarm. If they arrive far enough apart, the alarm
is triggered once (the second event sees the state as 'ALARM' and gives up). If
they arrive together and are handled concurrently, the alarm may be triggered
twice (both events see the state as 'UNKNOWN'). This is wrong, as an event
alarm is one-shot (if repeat_actions=False).

Do you have any ideas on how to resolve this race condition?
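
To make the race concrete, here is a minimal sketch of one possible direction:
do the state check and the state update as a single atomic step, so that only
the first event to see a non-'ALARM' state fires the action. The Alarm class,
fire_once() and the fired counter are hypothetical stand-ins, not AODH code,
and an in-process lock like this only covers threads within one evaluator
process. With several evaluator processes, the same check-and-set would
presumably have to happen in the storage layer instead.

```python
import threading


class Alarm:
    """Hypothetical stand-in for an alarm record (not the AODH model)."""

    def __init__(self):
        self.state = 'UNKNOWN'
        self.fired = 0  # how many times the alarm action ran
        self._lock = threading.Lock()

    def fire_once(self):
        # Check and update the state under one lock, so two concurrent
        # events cannot both observe 'UNKNOWN'.
        with self._lock:
            if self.state == 'ALARM':
                return False  # a previous event already fired the alarm
            self.state = 'ALARM'
        self.fired += 1  # only the single winning thread reaches this line
        return True


# Eight "events" handled concurrently against the same alarm.
alarm = Alarm()
threads = [threading.Thread(target=alarm.fire_once) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could both pass the state check before either
one writes 'ALARM', which is the double trigger described above.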

>
>>
>> The event evaluator is triggered by events only, that is, it's not called
>> at all until the next event comes. If no event comes, the evaluator just
>> sleeps, so it can't check the timeout and update_alarm. In other words,
>> 'timeout.end' is just for waking up the evaluator.
>>
>
> what's the purpose of the thread being created? i thought the idea was
> to receive alarm.timeout.start event -> creates a thread? can we not:
> 1. receive alarm.timeout.start -> create an alarm with timeout thread
> 2a. if event received, kill timeout thread, update alarm.
> 2b. if timeout reached, send alarm notification, update alarm.
>
> ^ that is just a random thought, i didn't think about exactly how to
> implement. right now i'm not clear who is generating this
> alarm.timeout.end event and why it needs to do that at all.


That's a good idea! We need some way to calculate the timeout: a new thread or
an alarm signal. If the alarm signal is more stable, let's switch to it.

We need a list to keep all alarms that are waiting for a timeout, and to update
the list when a timeout signal arrives.

The alarm.timeout.end event is just for locking, and is generated by the new
thread or the alarm signal handler (your suggestion). If it is useless for
locking, we can drop it and just update the alarm directly, as you said.

>
> cheers,
> -- 
> gord
>

Best Rgds,
Edwin


