[openstack-dev] [Synaps] a more forgiving interpretation of INSUFFICIENT_DATA?
Deok-June Yi
june.yi at samsung.com
Thu Oct 25 13:11:56 UTC 2012
Hi Eoghan,
I could be wrong, but I guess that AWS CW evaluates alarms in batches,
which could introduce the unexpected leeway you observed.
I prefer a strict interpretation of INSUFFICIENT_DATA so that we can
predict the behavior of the system. Large production deployments
should use a long enough alarm period.
And I agree that the current Synaps concept of evaluation period is
useful but confusing, so I'll start aligning it with AWS CW's concept.
Switching to out-of-stream evaluation would also make Synaps evaluate
less often, which should help reduce flapping.
Thank you,
June Yi
------- Original Message -------
Sender : Eoghan Glynn<eglynn at redhat.com>
Date : 2012-10-24 00:39 (GMT+09:00)
Title : [Synaps] a more forgiving interpretation of INSUFFICIENT_DATA?
Hi Synaps Folks,
IIUC the code, it seems Synaps does a pretty strict interpretation of
INSUFFICIENT_DATA, i.e. one of the evaluation periods without samples
is enough for this state transition to occur.
Whereas a little experimentation shows that CW is much more forgiving,
in the sense that an alarm flips into this state only when there are no
samples whatsoever across all of the evaluation periods, plus some
leeway.
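
To make the two interpretations concrete, here is a minimal sketch in Python. It is not taken from the Synaps or CW code. The function names, the use of None to mark a period with no samples, and the modelling of the leeway as extra trailing periods are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical representation: one entry per evaluation period,
# None meaning "no samples arrived in that period".
Periods = Sequence[Optional[float]]


def strict_insufficient(periods: Periods) -> bool:
    """Strict reading: a single empty period is enough."""
    return any(p is None for p in periods)


def lenient_insufficient(periods: Periods, leeway: Periods = ()) -> bool:
    """Forgiving reading: every period, plus the leeway, must be empty.

    The leeway is modelled as extra periods only as an assumption;
    how CW actually implements it is not known here.
    """
    return all(p is None for p in list(periods) + list(leeway))


if __name__ == "__main__":
    short_gap = [1.0, None, 2.0]
    print(strict_insufficient(short_gap))   # one gap trips the strict check
    print(lenient_insufficient(short_gap))  # the forgiving check ignores it
```

With a short gap in the metric stream, only the strict check reports INSUFFICIENT_DATA; both agree once every period is empty.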
So before investing time in preparing a patch, I was wondering if you
folks would be amenable to the idea of loosening this constraint?
The motivation would be to avoid causing lots of spurious state
transitions (and the associated actions firing) when there's a short
gap in the metric stream (due, say, to back-pressure in the AMQP
queues feeding the storm cluster). This I suspect might be a problem
in large production deployments.
The situation is complicated a bit by the Synaps concept of evaluation
period as overlapping windows offset by 60s (as discussed here[1]), so
a subsidiary question would be how wedded to that concept are you guys?
(BTW I can see how the moving-average style of interpretation might be
useful, by smoothing out transient spikes and tending to reveal the
underlying trend. So I'm not questioning its utility, say as an
optional/additional set of statistics. However it seems to me that the
default behavior should follow the CW semantics.)
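
The difference between overlapping windows offset by 60s and back-to-back windows can be sketched as follows. This is an illustration only: the helper name window_means, the one-sample-per-minute input, and the use of a plain average are assumptions, not the Synaps or CW implementation.

```python
from typing import List, Sequence


def window_means(samples: Sequence[float], period: int, step: int) -> List[float]:
    """Average per-minute samples over windows of `period` seconds.

    `step` is the offset in seconds between successive window starts:
    step=60 gives overlapping (moving-average style) windows, while
    step=period gives back-to-back windows. Both must be multiples of 60.
    """
    size = period // 60    # samples per window
    stride = step // 60    # samples between window starts
    return [
        sum(samples[i:i + size]) / size
        for i in range(0, len(samples) - size + 1, stride)
    ]


if __name__ == "__main__":
    # Eight minutes of data with a single transient spike.
    samples = [0, 0, 0, 0, 10, 0, 0, 0]
    print(window_means(samples, period=240, step=60))   # overlapping windows
    print(window_means(samples, period=240, step=240))  # back-to-back windows
```

In this example the single spike contributes to four of the five overlapping windows but to only one of the two back-to-back windows, so the overlapping scheme smooths the spike but also evaluates more often.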
Cheers,
Eoghan
[1] https://answers.launchpad.net/synaps/+question/211970