[Openstack-operators] Uptime and SLA's

Kingshott, Daniel Daniel.Kingshott at bestbuy.com
Fri Jun 3 00:42:04 UTC 2016


Thank you for your responses, I appreciate the insight.

Dan

Daniel Kingshott
Cloud Dude
(425) 623 4359 - Cell
 
Best Buy Co. Inc.
Technology Development Center
1000 Denny Way | 8th Floor | Seattle, WA | 98109 | USA





From:  "Bak, Ryan M" <Ryan.Bak at charter.com>
Date:  Thursday, June 2, 2016 at 4:51 PM
To:  Melvin Hillsman <mrhillsman at gmail.com>, Matt Fischer
<matt at mattfischer.com>, "Kingshott, Daniel" <Daniel.Kingshott at bestbuy.com>
Cc:  OpenStack Operators <openstack-operators at lists.openstack.org>
Subject:  Re: [Openstack-operators] Uptime and SLA's


Melvin,

The Monasca wiki (https://wiki.openstack.org/wiki/Monasca) has a lot of
information on architecture, as well as links to several talks given at
summits over the past couple years.  That’s probably your best bet for
general information and understanding.  There isn’t any single install
guide that I’m aware of, but you can bring up a working Monasca stack in
Devstack (https://github.com/openstack/monasca-api/tree/master/devstack)
and take a look at that, and there is also a puppet module for Monasca
setup available here: https://github.com/openstack/puppet-monasca.  Even
if you’re not using puppet that will give you a sense of what you’ll need
and how to set everything up.

Monasca can definitely run in an HA configuration.  In production we run
three load balanced nodes in each region with the api stack (monasca-api,
monasca-persister, monasca-thresh, kafka and storm).  We also have a
separate cluster of nodes in each region running Vertica for our backend,
and we use Kafka to replicate the data across regions so that the data is
global.  If you want a diagram of our setup, you can find one in our talk
from the Austin summit
(https://www.youtube.com/watch?v=uBapdsOpND4&feature=youtu.be&t=17m53s).
Feel free to reach out here or on the Monasca IRC channel if you run
into any difficulties setting up Monasca, or have any other questions.

-Ryan Bak

From: Melvin Hillsman <mrhillsman at gmail.com>
Date: Thursday, June 2, 2016 at 4:58 PM
To: Matt Fischer <matt at mattfischer.com>, "Kingshott, Daniel"
<Daniel.Kingshott at bestbuy.com>
Cc: OpenStack Operators <openstack-operators at lists.openstack.org>
Subject: Re: [Openstack-operators] Uptime and SLA's
Resent-From: Ryan Bak <ryan.bak at twcable.com>


Hey Matt,

I am looking into Monasca and would like to know your recommendation for
resources regarding a) understanding and b) installing the project,
especially since there is no install guide on the project wiki.
Additionally, can you shed some light on whether this setup would run
behind a load balancer in an HA configuration?  I am looking at using
three servers which will house a “stack”/“toolchain” for such activities.

Kind regards,
--
Melvin Hillsman
Ops Technical Lead
OpenStack Innovation Center
mrhillsman at gmail.com
phone: (210) 312-1267
mobile: (210) 413-1659
Learner | Ideation | Belief | Responsibility | Command
http://osic.org






From: Matt Fischer
Date: Thursday, June 2, 2016 at 5:29 PM
To: "Kingshott, Daniel"
Cc: OpenStack Operators
Subject: Re: [Openstack-operators] Uptime and SLA's


We do this a few different ways, some of which may meet your needs.

For API calls we measure a simple, quick, and impactless call for each
service (like heat stack-list) and we monitor East from West and vice
versa. The goal here is nothing added to the DBs, so nothing like neutron
net-create. The downside here is that some of these calls work even when
the service isn't 100% healthy, so keep that in mind.
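
As a rough illustration of the read-only probe idea above, here is a minimal Python sketch. It is not the tooling described in this thread: `probe`, `ProbeResult`, and the two `fake_*` functions are hypothetical names, and the fake functions only stand in for a real read-only client call such as "heat stack-list".

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    name: str
    ok: bool
    seconds: float
    error: Optional[str] = None


def probe(name: str, call: Callable[[], object]) -> ProbeResult:
    """Time one read-only call and record whether it raised."""
    start = time.monotonic()
    try:
        call()
        return ProbeResult(name, True, time.monotonic() - start)
    except Exception as exc:  # any exception is counted as a failed probe
        return ProbeResult(name, False, time.monotonic() - start, str(exc))


# Hypothetical stand-ins for real, read-only client calls.
def fake_stack_list():
    return []


def fake_broken_call():
    raise ConnectionError("endpoint unreachable")


if __name__ == "__main__":
    for result in (probe("heat.stack_list", fake_stack_list),
                   probe("broken", fake_broken_call)):
        print(result)
```

Each result carries both a pass/fail flag and a latency, so the same record can feed an availability count and a time series. As noted above, a passing probe only shows that the call answered, not that the service is fully healthy.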

Then we also have a set of "what would a user do" calls like "spin up a VM
and attach a FIP and ssh in" or "create and delete a volume". These run
less often. 
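
The "what would a user do" checks could be structured roughly as below. This is a sketch under assumptions: `run_scenario` is a hypothetical helper, and each step is a plain callable that would wrap whatever client calls you actually use (boot a VM, attach a FIP, ssh in). The cleanup callback is an addition not mentioned above, included so a failed run does not leave resources behind.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_scenario(steps: List[Step],
                 cleanup: Optional[Callable[[], None]] = None) -> dict:
    """Run steps in order, stop at the first failure, always run cleanup."""
    timings: Dict[str, float] = {}
    failed_step = None
    error = None
    try:
        for name, action in steps:
            start = time.monotonic()
            try:
                action()
            except Exception as exc:
                failed_step, error = name, str(exc)
                break
            finally:
                # Record the duration whether the step passed or failed.
                timings[name] = time.monotonic() - start
    finally:
        if cleanup is not None:
            cleanup()
    return {
        "ok": failed_step is None,
        "failed_step": failed_step,
        "error": error,
        "timings": timings,
    }
```

Recording which step failed, rather than only that the scenario failed, makes it easier to attribute an outage to a specific service.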

Finally we have a reference cloud application that uses our LBaaS, GSLB,
HA routers, and multiple front-end/back-end nodes. This has the highest
expectation of uptime and is used as an example for our customers of how
you can run an app with "more nines" than the underlying infra.
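
The "more nines" point can be illustrated with the standard formula for redundant components. This is an idealized model, not a calculation taken from this thread: it assumes replicas fail independently and that the application is up whenever at least one replica is up, which real deployments only approximate. The function names are hypothetical.

```python
import math


def parallel_availability(single: float, replicas: int) -> float:
    """Availability of N independent replicas where any one suffices."""
    return 1.0 - (1.0 - single) ** replicas


def nines(availability: float) -> float:
    """Express availability as a count of nines, e.g. 0.999 is about 3."""
    return -math.log10(1.0 - availability)


if __name__ == "__main__":
    single = 0.99  # illustrative figure for one region or node
    for n in (1, 2, 3):
        combined = parallel_availability(single, n)
        print(n, round(combined, 6), round(nines(combined), 2))
```

Under these assumptions, two replicas that are each 99% available give roughly 99.99% combined. Shared dependencies (a common load balancer, DNS, or control plane) break the independence assumption and lower the real figure.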

On any of these, especially the first two I mentioned, time series data is
super useful. It's good to know that your create volume times (for
example) are 40% slower after your deploy. We use Monasca and Grafana for
that.
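
To make the "40% slower after your deploy" comparison concrete, here is a small sketch that compares median timings before and after a change. The sample values are made up for illustration, and `percent_change` is a hypothetical helper. In practice the samples would come from whatever time-series store you use.

```python
from statistics import median
from typing import Sequence


def percent_change(before: Sequence[float], after: Sequence[float]) -> float:
    """Percent change in the median from `before` to `after`."""
    base = median(before)
    if base == 0:
        raise ValueError("baseline median is zero; percent change undefined")
    return (median(after) - base) / base * 100.0


if __name__ == "__main__":
    # Made-up create-volume durations in seconds.
    before_deploy = [10.0, 11.0, 9.0, 10.0]
    after_deploy = [14.0, 15.0, 13.0, 14.0]
    print(percent_change(before_deploy, after_deploy))
```

The median is used here rather than the mean so that a single slow outlier does not dominate the comparison. That is a design choice for the sketch, not something stated in this thread.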



On Thu, Jun 2, 2016 at 2:37 PM, Kingshott, Daniel
<Daniel.Kingshott at bestbuy.com> wrote:

We're currently in the process of writing up an internal SLA for our
OpenStack cloud, I'd be interested to hear what others have done and what
metrics folks are capturing.

My initial thoughts are success / fail spawning instances, creating and
attaching volumes, API availability and so on.

Can anyone on the list share their insights?

Thanks,

Dan


Daniel Kingshott
Cloud Dude
(425) 623 4359 - Cell

Best Buy Co. Inc.
Technology Development Center
1000 Denny Way | 8th Floor | Seattle, WA | 98109 | USA


_______________________________________________
OpenStack-operators mailing list
OpenStack-operators at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators







