[openstack-dev] [Horizon] [UX] Design for Alarming and Alarm Management

Liz Blanchard lsurette at redhat.com
Mon Jun 16 18:31:11 UTC 2014


On Jun 10, 2014, at 4:07 PM, Martinez, Christian <christian.martinez at intel.com> wrote:

> Here is my feedback regarding the designs:

Thanks again for the excellent feedback. I’ve created a new version of wireframes based on these comments (along with the discussion on mapping of Alarms to Resources.) They can be viewed here:
http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-06-13.pdf

I’ve also included some responses/questions to your comments below inline…

Best,
Liz

>  
> Page 2:
> * I think the admin would probably want to filter alarms by user, project, name, meter_name, and current_alarm_state (“ok” = “alarm ready”; “insufficient data” = “alarm not ready”; “alarm” = “alarm triggered”), but we don’t have all of those columns in the table. Maybe it would be better to just add columns for those fields, or to have other tables or tabs that would let the admin see the alarms based on those parameters.

I added in User and Project to the table.

I’m still trying to work it out in my head if it would make sense to users to have a field in the Alarm Management table for “Current State”. Wouldn’t they just look at the Alarm History? What if an Alarm was triggered a while ago and the user doesn’t care about this anymore, but it is still shown in the table as “Triggered”. The user doesn’t get information like how many times in the past day has this alarm been triggered unless they look at the alarm history. Also, I’m not sure what the user would get from being able to see other states like Alarm Ready. Would we even allow an Alarm with “insufficient data” to be saved? Do you have an example of this? Sorry for all of the questions here, just trying to make sure it will be understandable for all users :) For now, I’ve added an icon denoting whether or not the alarm has been triggered in the last 24 hours (so in the wireframe this would be Alarms 1 and 2).
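
To make the "triggered in the last 24 hours" icon a bit more concrete, here is a minimal sketch in Python. It assumes each alarm history entry is a dict with a `state` and a `timestamp`; that shape, the `STATE_LABELS` names, and the `triggered_recently` helper are hypothetical and only for illustration, not what the alarm-history API actually returns. The state-to-label mapping is the one suggested in the feedback above.

```python
from datetime import datetime, timedelta

# UI labels for the three alarm states, as suggested in the feedback above.
STATE_LABELS = {
    "ok": "Alarm ready",
    "insufficient data": "Alarm not ready",
    "alarm": "Alarm triggered",
}


def triggered_recently(history, now, window=timedelta(hours=24)):
    """Return True if any entry put the alarm into the "alarm" state within the window.

    `history` is a hypothetical list of {"state": ..., "timestamp": ...} dicts.
    """
    cutoff = now - window
    return any(
        entry["state"] == "alarm" and cutoff <= entry["timestamp"] <= now
        for entry in history
    )


# Example: one old "ok" transition and one trigger about 15 hours ago.
now = datetime(2014, 6, 16, 18, 0)
history = [
    {"state": "ok", "timestamp": datetime(2014, 6, 14, 9, 0)},
    {"state": "alarm", "timestamp": datetime(2014, 6, 16, 2, 30)},
]
show_icon = triggered_recently(history, now)
```

With this approach the table never has to show a stale "Triggered" state; the icon simply disappears once the last trigger is older than the window.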


> * I would add a “delete alarm” button as a table action

I’d expect this to be in the drop down of bulk actions. I’ve added a page to the new design showing this list of actions.


> * Nice to have: if we are thinking about “combining alarms”, maybe have a “combine alarm” button as a table action that gets activated when the admin selects two or more alarms.
>   - When the button is clicked, it should show something like the “Add Alarm” dialog, allowing the user to create a new combined alarm based on their previous alarm selection.

I like this idea although it sounds pretty advanced for this first round. I think I’d want to be sure that people would be looking to actually do this and combine alarms this way. Wouldn’t they just create a new alarm and add multiple "Alarm When" criteria? Would they think to combine two alarms they already have? I’ve added this to the latest wireframes too just to get a feel for what it might look like. One question: Should combining Alarms default the criteria to be AND or OR?

>  
> Page 3-5:
> * Love the workflow!
> * A couple of things related to the “Alarm When” setup:
>   - Depending on the resource that is “selected” (from page 2), you would have a list of the possible meters to be considered. For example, if your resource is an instance, you would have the following list of meters: number of instances, CPU time used, average CPU utilization, memory, etc. This will also affect the “threshold” unit to be used. In the design, there is a textbox that has a percentage label (“%”) right next to it. The thing is that this “threshold” could be a percentage (for example, CPU utilization), but it could be a flat number as well (for example, number of instances in the project).

Absolutely. I’ve updated the description text to the right of this wireframe to try to make it clear that this is just an example and depending on what resource is chosen, the list of Meters will be populated and then depending on which meter is chosen, the format of the textbox/units would change.
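
As a rough illustration of the "meter drives the textbox format" idea, here is a small Python sketch. The `METERS_BY_RESOURCE` table, its labels and units, and the `threshold_field` helper are all assumptions made up for this example; a real implementation would presumably populate the meter list from Ceilometer rather than hard-code it.

```python
# Illustrative meter metadata per resource type. The entries are assumptions
# for this sketch and are not queried from Ceilometer.
METERS_BY_RESOURCE = {
    "instance": {
        "cpu_util": {"label": "Average CPU utilization", "unit": "%"},
        "instance": {"label": "Number of instances", "unit": "instances"},
    },
}


def threshold_field(resource_type, meter_name):
    """Describe how the threshold textbox should render for the chosen meter."""
    meter = METERS_BY_RESOURCE[resource_type][meter_name]
    is_percentage = meter["unit"] == "%"
    return {
        "label": meter["label"],
        "suffix": meter["unit"],  # shown to the right of the textbox
        "min": 0,
        "max": 100 if is_percentage else None,  # flat counts have no upper bound
    }


cpu_field = threshold_field("instance", "cpu_util")
count_field = threshold_field("instance", "instance")
```

Here `cpu_field` gets a "%" suffix and a 0 to 100 range, while `count_field` is an unbounded flat number.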

>   - (Related to your point 5) There are two things related to combined alarms that we need to consider.
>     1) The combination can be between any type of alarm: you could combine alarms associated with different resources, meters, users? (A Ceilometer expert will know.) You could even combine combined alarms with other alarms as well. The AND and OR operations between the alarms can be used for combined alarms. For instance, combine two alarms with an OR operator.
>     2) Adding two rules to match to a single alarm is not supported by Ceilometer. For that, you use combined alarms :). The idea of adding triggering rules to the alarm creation dialog is great for me, but I’m not sure if Ceilometer supports that.

Hmm yeah, I’m hoping no matter how it’s done in Ceilometer, we might be able to represent it this way in the UI. I fear that the concept of combining two alarms is just a bit too close to the implementation model and really the mental model of the user is “I want to have an alarm that potentially matches one or more rules.” I don’t think the user would think to create two separate alarms and then combine them...
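
To show what "one alarm with one or more rules" could mean on the UI side regardless of how it is stored, here is a minimal Python sketch that folds per-rule states into a single state with AND/OR. The `combine_states` helper and its treatment of "insufficient data" are my own simplifying assumptions, not a description of how Ceilometer evaluates combined alarms.

```python
def combine_states(states, operator):
    """Fold per-rule states ("ok", "alarm", "insufficient data") into one state.

    A simplified three-valued AND/OR, assumed for illustration only.
    """
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    if not states:
        raise ValueError("at least one rule state is required")

    if operator == "or":
        if any(s == "alarm" for s in states):
            return "alarm"  # one firing rule is enough
        if all(s == "ok" for s in states):
            return "ok"
        return "insufficient data"

    # operator == "and"
    if all(s == "alarm" for s in states):
        return "alarm"  # every rule must fire
    if any(s == "ok" for s in states):
        return "ok"  # one healthy rule rules out the alarm
    return "insufficient data"


either = combine_states(["alarm", "ok"], "or")
both = combine_states(["alarm", "ok"], "and")
```

In this sketch the user only ever sees one alarm with several "Alarm When" rows; whether that maps to separate alarms plus a combined alarm underneath would be an implementation detail.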

>  
> Page 6:
> * Really liked the way that actions and state could be set, but we should see how the notifications will be handled. Maybe these actions could be set “by default” in our first version and after that, start thinking about setting custom actions for alarm states in the future (same for the email add-on in the user settings).

Sounds like a great plan to me. I’ve added in one more example for “Alarm Triggered” when an e-mail would be sent to the user. I think it’s nice to have this, but allowing these notification settings to be set at a global level (user settings) would be MUCH nicer so that the user doesn’t have to set this for each Alarm they create. 

>  
> Page 7:  “Viewing Alarm History” A.K.A: the alarms that have occurred.
> * Same as page 2: I think the admin would probably want to filter alarms by user, project, name, meter_name, etc. (for instance, to see which alarms have been triggered on project “X”), but we don’t have those columns in the table. Maybe it would be better to just add columns for those fields, or to have other tables or tabs that would let the admin see the alarms based on those parameters.

Agreed. Added project and user to these tables.

> * Is the alarm date column referring to the date on which the alarm was created or the date on which the alarm was triggered?

This would be when the alarm was triggered. So this table could include multiple of the same alarm, just triggered at different times.

> * Is the alarm name content a link or simple text? What would happen when the admin selects an alarm? Is it going to show the “update alarm dialog”? Are there any actions associated with the rows?

I have this as a link so that the user could jump over to the actual Alarm Definition in the management side of things and perhaps change some of the details. We could change this to be just plain text if this doesn’t seem like something the user would want to do. I thought it could be a nice way to view the Alarm Definition, though.

> * Maybe change the name of the tab to “Activated alarms” or something that is actually interpreted as “in here you can see the alarms that have occurred”.

My one concern with this would be…since users can enable/disable alarms on the alarm management page, I would be afraid they’d confuse this with “Enabled Alarms”. Maybe “Triggered Alarms”? That’s fairly technical though in my opinion.

>  
> Hope it helps
>  
> Cheers,
> H
>  
> From: Liz Blanchard [mailto:lsurette at redhat.com] 
> Sent: Monday, June 9, 2014 2:36 PM
> To: Eoghan Glynn
> Cc: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Horizon] [UX] Design for Alarming and Alarm Management
>  
> Hi all,
>  
> Thanks again for the great comments on the initial cut of wireframes. I’ve updated them a fair amount based on feedback in this e-mail thread along with the feedback written up here:
> https://etherpad.openstack.org/p/alarm-management-page-design-discussion
>  
> Here is a link to the new version:
> http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-06-05.pdf
>  
> And a quick explanation of the updates that I made from the last version:
>  
> 1) Removed severity.
>  
> 2) Added Status column. I also added details around the fact that users can enable/disable alerts.
>  
> 3) Updated Alarm creation workflow to include choosing the project and user (optionally for filtering the resource list), choosing the resource, and choosing the amount of time to monitor for alarming.
>      -Perhaps we could be even more sophisticated for how we let users filter down to find the right resources that they want to monitor for alarms?
>  
> 4) As for notifying users…I’ve updated the “Alarms” section to be “Alarms History”. The point here is to show any Alarms that have occurred to notify the user. Other notification ideas could be to allow users to get notified of alerts via e-mail (perhaps a user setting?). I’ve added a wireframe for this update in User Settings. Then the Alarms Management section would just be where the user creates, deletes, enables, and disables alarms. Do you still think we don’t need the “alarms” tab? Perhaps this just becomes iteration 2 and is left out for now as you mention in your etherpad.
>  
> 5) Question about combined alarms…currently I’ve designed it so that a user could create multiple levels in the “Alarm When…” section. They could combine these with AND/ORs. Is this going far enough? Or do we actually need to allow users to combine Alarms that might watch different resources?
>  
> 6) I updated the Actions column to have the “More” drop down which is consistent with other tables in Horizon.
>  
> 7) Added in a section in the “Add Alarm” workflow for “Actions after Alarm”. I’m thinking we could have some sort of If State is X, do X type selections, but I’m looking to understand more details about how the backend works for this feature. Eoghan gave examples of logging and potentially scaling out via Heat. Would simple drop downs support these events?
>  
> 8) I can definitely add in a “scheduling” feature with respect to Alarms. I haven’t added it in yet, but I could see this being very useful in future revisions of this feature.
>  
> 9) Another thought is that we could add in some padding for outlier data as Eoghan mentioned. Perhaps a setting for “This has happened 3 times over the last minute, so now send an alarm.”?
>  
> A new round of feedback is of course welcome :)
>  
> Best,
> Liz
>  
> On Jun 4, 2014, at 1:27 PM, Liz Blanchard <lsurette at redhat.com> wrote:
> 
> 
> Thanks for the excellent feedback on these, guys! I’ll be working on making updates over the next week and will send a fresh link out when done. Anyone else with feedback, please feel free to fire away.
> 
> Best,
> Liz
> On Jun 4, 2014, at 12:33 PM, Eoghan Glynn <eglynn at redhat.com> wrote:
> 
> 
> 
> Hi Liz,
> 
> Two further thoughts occurred to me after hitting send on
> my previous mail.
> 
> First, is the concept of alarm dimensioning; see my RDO Ceilometer
> getting started guide[1] for an explanation of that notion.
> 
> "A key associated concept is the notion of dimensioning which defines the set of matching meters that feed into an alarm evaluation. Recall that meters are per-resource-instance, so in the simplest case an alarm might be defined over a particular meter applied to all resources visible to a particular user. More useful however would be the option to explicitly select which specific resources we're interested in alarming on. On one extreme we would have narrowly dimensioned alarms where this selection would have only a single target (identified by resource ID). On the other extreme, we'd have widely dimensioned alarms where this selection identifies many resources over which the statistic is aggregated, for example all instances booted from a particular image or all instances with matching user metadata (the latter is how Heat identifies autoscaling groups)."
> 
> We'd have to think about how that concept is captured in the
> UX for alarm creation/update.
> 
> Second, there are a couple of more advanced alarming features 
> that were added in Icehouse:
> 
> 1. The ability to constrain alarms on time ranges, such that they
>  would only fire say during 9-to-5 on a weekday. This would
>  allow for example different autoscaling policies to be applied
>  out-of-hours, when resource usage is likely to be cheaper and
>  manual remediation less straight-forward.
> 
> 2. The ability to exclude low-quality datapoints with anomalously
>  low sample counts. This allows the leading edge of the trend of
>  widely dimensioned alarms not to be skewed by eagerly-reporting
>  outliers.
> 
> Perhaps not in a first iteration, but at some point it may make sense
> to expose these more advanced features in the UI.
> 
> Cheers,
> Eoghan
> 
> [1] http://openstack.redhat.com/CeilometerQuickStart
> 
> 
> 
> ----- Original Message -----
> 
> 
> Hi Liz,
> 
> Looks great!
> 
> Some thoughts on the wireframe doc:
> 
> * The description of form:
> 
>   "If CPU Utilization exceeds 80%, send alarm."
> 
> misses the time-window aspect of the alarm definition.
> 
> Whereas the boilerplate default descriptions generated by
> ceilometer itself:
> 
>   "cpu_util > 70.0 during 3 x 600s"
> 
> captures this important info.
> 
> * The metric names, e.g. "CPU Utilization", are not an exact
> match for the meter names used by ceilometer, e.g. "cpu_util".
> 
> * Non-admin users can create alarms in ceilometer:
> 
> "This is where admins can come in and
>  define and edit any alarms they want
>  the environment to use."
> 
> (though these alarms will only have visibility onto the stats
>  that would be accessible to the user on behalf of whom the
>  alarm is being evaluated)
> 
> * There's no concept currently of alarm severity.
> 
> * "Should users be able to enable/dis-able alarms."
> 
> Yes, the API allows for disabled (i.e. non-evaluated) alarms.
> 
> * "Should users be able to own/assign alarms?"
> 
> Only admin users can create an alarm on behalf of another
> user/tenant.
> 
> * "Should users be able to acknowledge, close alarms?"
> 
> No, we have no concept of ACKing an alarm.
> 
> * "Admins can also see a full list of all Alarms that have
>  taken place in the past."
> 
> In ceilometer terminology, we refer to this as alarm history
> or alarm change events.
> 
> * "CPU Utilization exceeded 80%."
> 
> Again good to capture the duration in that description of the
> event.
> 
> * "Within the Overview section, there should be a new tab that allows the
>  user to click and view all Alarms that have occurred in their
>  environment."
> 
> Not sure really what "environment" means here. Non-admin tenants only
> have visibility to their own alarms, whereas admins have visibility to
> all alarms.
> 
> * "This list would keep the latest  alarms."
> 
> Presumably this would be based on querying the alarm-history API,
> as opposed to an assumption that Horizon is consuming the actual
> alarm notifications?
> 
> Cheers,
> Eoghan
> 
> ----- Original Message -----
> 
> Hi All,
> 
> I’ve recently put together a set of wireframes[1] around Alarm Management
> that would support the following blueprint:
> https://blueprints.launchpad.net/horizon/+spec/ceilometer-alarm-management-page
> 
> If you have a chance it would be great to hear any feedback that folks have
> on this direction moving forward with Alarms.
> 
> Best,
> Liz
> 
> [1]
> http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-05-30.pdf
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
