Open Stack

Wed Apr 8 13:53:14 UTC 2015

>From: Ryan Brown <rybrown at redhat.com>
>Sent: Wednesday, April 8, 2015 9:42 AM
>
>> The trend in the monitoring space seems to be:
>>
>> 1. Alarms are issued from Metrics as Events.
>>     (events can issue alarms too, but conventional alarming is metric based)
>> 2. Multiple events are analyzed to produce Metrics (stream processing)
>> 3. Go to Step 1
>>
>
>Indeed. I sort of envisioned heat sending out events that are then
>consumed both as metrics and by the user (where appropriate). In
>StackTach I can see that being implemented as
>
>                    /--> resource events ----> other tools
>Heat --> Winchester ---> notifications stream    ------> user
>                    \--> metrics stream --> alerts --/
>

Yep, you can get a lot of great info from a notification. And a lot of
metrics can be produced from them. We use them for debugging, usage/billing
and performance monitoring/tuning. Contextual data ftw! :)

>> Events start as structured data. More so, we're looking at establishing
>> AVRO-based schema definitions on top of these events (slow progress).
>
>Yeah, I'd really like to have a schema for Heat events so we can have a
>single event stream and repackage events for different consumption goals
>(metrics, notifications, programmatic interaction, etc).

Yep, that's the right approach. There are some people at Rax looking at getting
this nailed down soon. 
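
As a rough idea of what a schema for a Heat event might look like, here is a
sketch using an Avro-style record definition written as a Python dict. The
record name, the field names and the `conforms` helper are all hypothetical,
and the check is hand-rolled rather than a call into a real Avro library.

```python
# Hypothetical Avro-style record schema for a Heat stack event.
# Avro field names cannot contain hyphens, so underscores are used here.
HEAT_EVENT_SCHEMA = {
    "type": "record",
    "name": "HeatStackEvent",
    "fields": [
        {"name": "message_id", "type": "string"},
        {"name": "event_type", "type": "string"},
        {"name": "stack_id", "type": "string"},
        {"name": "num_replaced_resources", "type": "long"},
    ],
}

# Only the primitive types used above are mapped to Python types.
PRIMITIVES = {"string": str, "long": int, "double": float, "boolean": bool}


def conforms(event, schema):
    """Return True if every schema field is present with the expected type."""
    for field in schema["fields"]:
        expected = PRIMITIVES[field["type"]]
        value = event.get(field["name"])
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            return False
        if not isinstance(value, expected):
            return False
    return True


good = {
    "message_id": "abc-123",
    "event_type": "orchestration.stack.update.end",
    "stack_id": "stack-1",
    "num_replaced_resources": 2,
}
bad = {"message_id": "abc-124", "event_type": "orchestration.stack.update.end"}
```

With one agreed schema, each consumer (metrics, notifications, programmatic
clients) can repackage the same validated event instead of parsing free text.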

>> Having to build filters is a relatively error-prone approach compared to the
>> methods described above.
>
>I wasn't saying *we* should do the unstructured message + regex filters
>strategy, I was just pointing out the CW solution for folks who hadn't
>used it.

Gotcha ... agreed.

>>>> [snip]
>>
>> The Fujitsu team have already added logging support to Monasca (with an
>> elasticsearch backend) and HP is currently adding StackTach.v3 support for
>> notification->event conversion as well as our Winchester event stream
>> processing engine. Also, this is based on Kafka vs. RabbitMQ, which has better
>> scaling characteristics for this kind of data.
>
>Oooh, I'll have a look into that, Kafka as an event bus sounds like a
>good fit. I have the same concern Angus voiced earlier about Zaqar
>though. What's the deployment of StackTach.v3 across OpenStack
>installations? Is it mostly deployed for Helion/Rackspace, or are
>smaller deployers using it as well?

We're in the short strokes of rolling STv3 into production at Rax now. No issues
with the libraries; the hiccups are all in downstream system integration. HP have
some good requirements they want added around hosted monitoring. People are
still installing and playing around with STv2. It's battle-proven and solves the
immediate OpenStack concerns, but it's more rigid than STv3. If you want to
get going today, I'd recommend STv2, but all new efforts and partner work are
going into STv3.

>>> This could be extended to richer JSON events that include the stack,
>>> resources affected in the update, stats like "num-deleted-resources" or
>>> "num-replaced-resources", autoscaling actions, and info about stack errors.
>>
>> Some of these sound more like metrics than notifications. We should be
>> careful not to misuse the two.
>
>I think they're events, and have facets that are quantifiable as metrics
>(num-replaced-resources on an update action) and that should be
>user-visible (update is complete, or autoscaling actions taken).

Yep, tricky to discern sometimes. Perhaps a better way to decide if something
is an event or a metric is to consider how frequently it's generated or how
much context it contains?

>>> Is there a way for users as-is to view those raw notifications, not just
>>> the indexed k/v pairs?
>>
>> In StackTach.v3 we ship the raw notifications to HDFS for archiving, but
>> expose the reduced event via the API. The message-id links the two.
>>
>> Lots more here: http://www.stacktach.com
>
>Thanks! I'll have to read up.
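
For anyone who wants a picture of the raw-vs-reduced split, here is a minimal
sketch. The in-memory `archive` dict stands in for HDFS, and the `ingest`
helper and the chosen traits are assumptions for illustration only, not the
actual StackTach.v3 code or API.

```python
import json

# Stand-in for the HDFS archive: raw notification JSON keyed by message_id.
archive = {}


def ingest(raw):
    """Archive the full notification and return a reduced event.

    The reduced event keeps the message_id so the raw notification can be
    looked up later. Which payload keys are kept is a hypothetical choice.
    """
    archive[raw["message_id"]] = json.dumps(raw)
    payload = raw.get("payload", {})
    return {
        "message_id": raw["message_id"],
        "event_type": raw["event_type"],
        "timestamp": raw["timestamp"],
        "stack_id": payload.get("stack_id"),
    }


raw_notification = {
    "message_id": "abc-123",
    "event_type": "orchestration.stack.update.end",
    "timestamp": "2015-04-08T13:53:14Z",
    "payload": {"stack_id": "stack-1", "details": {"replaced": ["server-1"]}},
}

event = ingest(raw_notification)
# Follow the message_id back from the reduced event to the raw notification.
original = json.loads(archive[event["message_id"]])
```

The reduced event stays small enough to index and query, while nothing from
the original notification is lost.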

By all means, reach out if you have questions. The more people we have that see
the value in events, the better. Looking at the rise of packages like Storm,
Spark, riemann.io, etc., it's clear there's a big change under way in the
distributed computing monitoring space.

-S


[openstack-dev] [all] how to send messages (and events) to our users
