[openstack-dev] [Horizon] [UX] Design for Alarming and Alarm Management

Eoghan Glynn eglynn at redhat.com
Mon Jun 16 14:56:20 UTC 2014


Apologies for the top-posting, but just wanted to call out some
potential confusion that arose on the #os-ceilometer channel earlier
today.

TL;DR: the UI shouldn't assume a 1:1 mapping between alarms and
       resources, since this mapping does not exist in general

Background: See ML post[1]

Discussion: See IRC log [2]
            Ctrl+F: "Let's see what the UI guys think about it"
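
To make the TL;DR concrete, here is a minimal sketch of why an alarm
cannot be assumed to belong to exactly one resource. Everything in it
(the Alarm class, the resource dicts, the "group" and "image" keys, and
matching_resources) is hypothetical and invented for illustration; it is
not ceilometer's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    """Hypothetical alarm: a meter plus a query that selects resources."""
    name: str
    meter: str
    # Key/value constraints on resource attributes. An empty query means
    # "every resource visible to the alarm owner".
    query: dict = field(default_factory=dict)


def matching_resources(alarm, resources):
    """Return the IDs of all resources whose attributes satisfy the query."""
    return [
        r["id"]
        for r in resources
        if all(r.get(key) == value for key, value in alarm.query.items())
    ]


# Illustrative inventory; the attribute names are assumptions.
RESOURCES = [
    {"id": "vm-1", "image": "fedora", "group": "web"},
    {"id": "vm-2", "image": "fedora", "group": "web"},
    {"id": "vm-3", "image": "cirros", "group": "batch"},
]

narrow = Alarm("one-vm", "cpu_util", {"id": "vm-1"})
wide = Alarm("web-tier", "cpu_util", {"group": "web"})
unscoped = Alarm("everything", "cpu_util")
empty = Alarm("no-match", "cpu_util", {"group": "db"})

for alarm in (narrow, wide, unscoped, empty):
    print(alarm.name, matching_resources(alarm, RESOURCES))
```

The same alarm definition can match one, several, all, or no resources,
so a table keyed on "the resource of this alarm" has nothing reliable to
show in the general case.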

Cheers,
Eoghan

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/037788.html
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-ceilometer/%23openstack-ceilometer.2014-06-16.log


----- Original Message -----
> Hi all,
> 
> Thanks again for the great comments on the initial cut of wireframes. I’ve
> updated them a fair amount based on feedback in this e-mail thread along
> with the feedback written up here:
> https://etherpad.openstack.org/p/alarm-management-page-design-discussion
> 
> Here is a link to the new version:
> http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-06-05.pdf
> 
> And a quick explanation of the updates that I made from the last version:
> 
> 1) Removed severity.
> 
> 2) Added Status column. I also added details around the fact that users can
> enable/disable alerts.
> 
> 3) Updated Alarm creation workflow to include choosing the project and user
> (optionally for filtering the resource list), choosing the resource, and
> choosing the amount of time to monitor for alarming.
>      -Perhaps we could be even more sophisticated for how we let users filter
>      down to find the right resources that they want to monitor for alarms?
> 
> 4) As for notifying users…I’ve updated the “Alarms” section to be “Alarms
> History”. The point here is to show any Alarms that have occurred to notify
> the user. Other notification ideas could be to allow users to get notified
> of alerts via e-mail (perhaps a user setting?). I’ve added a wireframe for
> this update in User Settings. Then the Alarms Management section would just
> be where the user creates, deletes, enables, and disables alarms. Do you
> still think we don’t need the “alarms” tab? Perhaps this just becomes
> iteration 2 and is left out for now as you mention in your etherpad.
> 
> 5) Question about combined alarms…currently I’ve designed it so that a user
> could create multiple levels in the “Alarm When…” section. They could
> combine these with AND/ORs. Is this going far enough? Or do we actually need
> to allow users to combine Alarms that might watch different resources?
> 
> 6) I updated the Actions column to have the “More” drop down which is
> consistent with other tables in Horizon.
> 
> 7) Added in a section in the “Add Alarm” workflow for “Actions after Alarm”.
> I’m thinking we could have some sort of If State is X, do X type selections,
> but I’m looking to understand more details about how the backend works for
> this feature. Eoghan gave examples of logging and potentially scaling out
> via Heat. Would simple drop downs support these events?
> 
> 8) I can definitely add in a “scheduling” feature with respect to Alarms. I
> haven’t added it in yet, but I could see this being very useful in future
> revisions of this feature.
> 
> 9) Another thought is that we could add in some padding for outlier data as
> Eoghan mentioned. Perhaps a setting for “This has happened 3 times over the
> last minute, so now send an alarm.”?
> 
> A new round of feedback is of course welcome :)
> 
> Best,
> Liz
> 
> On Jun 4, 2014, at 1:27 PM, Liz Blanchard <lsurette at redhat.com> wrote:
> 
> > Thanks for the excellent feedback on these, guys! I’ll be working on making
> > updates over the next week and will send a fresh link out when done.
> > Anyone else with feedback, please feel free to fire away.
> > 
> > Best,
> > Liz
> > On Jun 4, 2014, at 12:33 PM, Eoghan Glynn <eglynn at redhat.com> wrote:
> > 
> >> 
> >> Hi Liz,
> >> 
> >> Two further thoughts occurred to me after hitting send on
> >> my previous mail.
> >> 
> >> First, is the concept of alarm dimensioning; see my RDO Ceilometer
> >> getting started guide[1] for an explanation of that notion.
> >> 
> >> "A key associated concept is the notion of dimensioning which defines the
> >> set of matching meters that feed into an alarm evaluation. Recall that
> >> meters are per-resource-instance, so in the simplest case an alarm might
> >> be defined over a particular meter applied to all resources visible to a
> >> particular user. More useful however would be the option to explicitly
> >> select which specific resources we're interested in alarming on. On one
> >> extreme we would have narrowly dimensioned alarms where this selection
> >> would have only a single target (identified by resource ID). On the other
> >> extreme, we'd have widely dimensioned alarms where this selection
> >> identifies many resources over which the statistic is aggregated, for
> >> example all instances booted from a particular image or all instances
> >> with matching user metadata (the latter is how Heat identifies
> >> autoscaling groups)."
> >> 
> >> We'd have to think about how that concept is captured in the
> >> UX for alarm creation/update.
> >> 
> >> Second, there are a couple of more advanced alarming features
> >> that were added in Icehouse:
> >> 
> >> 1. The ability to constrain alarms on time ranges, such that they
> >>  would only fire say during 9-to-5 on a weekday. This would
> >>  allow for example different autoscaling policies to be applied
> >>  out-of-hours, when resource usage is likely to be cheaper and
> >>  manual remediation less straight-forward.
> >> 
> >> 2. The ability to exclude low-quality datapoints with anomalously
> >>  low sample counts. This allows the leading edge of the trend of
> >>  widely dimensioned alarms not to be skewed by eagerly-reporting
> >>  outliers.
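
A rough sketch of the two features listed above, for anyone trying to
picture what the UI would need to expose. Both functions are assumptions
made for illustration: the 9-to-5 weekday window is hard-coded rather
than read from an alarm definition, and the "two standard deviations
below the mean sample count" cut-off is just one plausible rule, not a
statement of what ceilometer actually implements.

```python
from datetime import datetime
from statistics import mean, pstdev


def in_business_hours(when):
    """True if `when` falls on Monday to Friday between 09:00 and 17:00."""
    return when.weekday() < 5 and 9 <= when.hour < 17


def drop_low_count_datapoints(datapoints, num_stddevs=2.0):
    """Drop datapoints whose sample count is anomalously low.

    Each datapoint is a dict with a "count" key (number of samples that
    were aggregated into it). Datapoints whose count lies more than
    `num_stddevs` standard deviations below the mean count are excluded.
    """
    counts = [d["count"] for d in datapoints]
    if len(counts) < 2:
        return list(datapoints)
    lower_bound = mean(counts) - num_stddevs * pstdev(counts)
    return [d for d in datapoints if d["count"] >= lower_bound]


# Nine well-populated periods and one leading-edge period that only a
# few eagerly-reporting instances have contributed to so far.
history = [{"avg": 50.0, "count": 100} for _ in range(9)]
history.append({"avg": 95.0, "count": 5})

kept = drop_low_count_datapoints(history)
print(len(history), "->", len(kept))
print(in_business_hours(datetime(2014, 6, 16, 14, 56)))
```

With the sample data the final, sparsely populated datapoint is
discarded, so its high average does not skew the most recent period.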
> >> 
> >> Perhaps not in a first iteration, but at some point it may make sense
> >> to expose these more advanced features in the UI.
> >> 
> >> Cheers,
> >> Eoghan
> >> 
> >> [1] http://openstack.redhat.com/CeilometerQuickStart
> >> 
> >> 
> >> 
> >> ----- Original Message -----
> >>> 
> >>> Hi Liz,
> >>> 
> >>> Looks great!
> >>> 
> >>> Some thoughts on the wireframe doc:
> >>> 
> >>> * The description of the form:
> >>> 
> >>>   "If CPU Utilization exceeds 80%, send alarm."
> >>> 
> >>> misses the time-window aspect of the alarm definition.
> >>> 
> >>> Whereas the boilerplate default descriptions generated by
> >>> ceilometer itself:
> >>> 
> >>>   "cpu_util > 70.0 during 3 x 600s"
> >>> 
> >>> captures this important info.
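
For illustration, a description in that style could be generated along
the following lines. This is only a sketch: describe_threshold, its
parameter names and the OPERATORS table are hypothetical and are not
taken from the ceilometer code base.

```python
# Map short comparison names to the symbols shown in the description.
OPERATORS = {
    "gt": ">", "ge": ">=",
    "lt": "<", "le": "<=",
    "eq": "==", "ne": "!=",
}


def describe_threshold(meter, comparison, threshold, periods, period_seconds):
    """Build a one-line description that includes the time window.

    `periods` is the number of consecutive evaluation periods and
    `period_seconds` is the length of each one.
    """
    return "%s %s %s during %d x %ds" % (
        meter,
        OPERATORS[comparison],
        float(threshold),
        periods,
        period_seconds,
    )


print(describe_threshold("cpu_util", "gt", 70, 3, 600))
```

The point for the wireframes is that the window (how many periods, and
how long each one is) appears in the description alongside the
threshold itself.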
> >>> 
> >>> * The metric names, e.g. "CPU Utilization", are not an exact
> >>> match for the meter names used by ceilometer, e.g. "cpu_util".
> >>> 
> >>> * Non-admin users can create alarms in ceilometer:
> >>> 
> >>> "This is where admins can come in and
> >>>  define and edit any alarms they want
> >>>  the environment to use."
> >>> 
> >>> (though these alarms will only have visibility onto the stats
> >>>  that would be accessible to the user on behalf of whom the
> >>>  alarm is being evaluated)
> >>> 
> >>> * There's no concept currently of alarm severity.
> >>> 
> >>> * "Should users be able to enable/dis-able alarms."
> >>> 
> >>> Yes, the API allows for disabled (i.e. non-evaluated) alarms.
> >>> 
> >>> * "Should users be able to own/assign alarms?"
> >>> 
> >>> Only admin users can create an alarm on behalf of another
> >>> user/tenant.
> >>> 
> >>> * "Should users be able to acknowledge, close alarms?"
> >>> 
> >>> No, we have no concept of ACKing an alarm.
> >>> 
> >>> * "Admins can also see a full list of all Alarms that have
> >>>  taken place in the past."
> >>> 
> >>> In ceilometer terminology, we refer to this as alarm history
> >>> or alarm change events.
> >>> 
> >>> * "CPU Utilization exceeded 80%."
> >>> 
> >>> Again good to capture the duration in that description of the
> >>> event.
> >>> 
> >>> * "Within the Overview section, there should be a new tab that allows the
> >>>  user to click and view all Alarms that have occurred in their
> >>>  environment."
> >>> 
> >>> Not sure really what "environment" means here. Non-admin tenants only
> >>> have visibility to their own alarms, whereas admins have visibility to
> >>> all alarms.
> >>> 
> >>> * "This list would keep the latest  alarms."
> >>> 
> >>> Presumably this would be based on querying the alarm-history API,
> >>> as opposed to an assumption that Horizon is consuming the actual
> >>> alarm notifications?
> >>> 
> >>> Cheers,
> >>> Eoghan
> >>> 
> >>> ----- Original Message -----
> >>>> Hi All,
> >>>> 
> >>>> I’ve recently put together a set of wireframes[1] around Alarm Management
> >>>> that would support the following blueprint:
> >>>> https://blueprints.launchpad.net/horizon/+spec/ceilometer-alarm-management-page
> >>>> 
> >>>> If you have a chance it would be great to hear any feedback that folks have
> >>>> on this direction moving forward with Alarms.
> >>>> 
> >>>> Best,
> >>>> Liz
> >>>> 
> >>>> [1]
> >>>> http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-05-30.pdf
> >>>> 
> >>>> _______________________________________________
> >>>> OpenStack-dev mailing list
> >>>> OpenStack-dev at lists.openstack.org
> >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>>> 
> >>> 
> > 
> 
>


