[openstack-dev] [kolla] Monitoring tooling
Mathias Ewald
mewald at evoila.de
Sun Jul 24 19:32:14 UTC 2016
Phew, that is going to be endless :D
I want to see:
Operating System / Hardware: system load, cpu user, cpu system, memory
usage, swap, swap activity, used/free disk space, NIC bw, NIC packets, disk
bw, disk iops, zombie procs, HDD SMART, RAID status
Docker: cpu / memory per container, volume usage
OpenStack: # free FIPs, # instances, # volumes, # networks, # routers
HAProxy: # sessions per listener, http status codes per service
RabbitMQ: message rates, # messages in queue
Ceph: placement group states, available capacity, pool bw / iops, disk
latency, journal latency
I want alerts for:
- node goes down
- ceph: mon goes down, osd goes down, pgs stuck for more than X seconds
- service goes down
- container up/down
- mariadb/galera: cluster health
- rabbitmq: cluster health
That's just off the top of my head :) The reason why I don't want alerts
on everything is that most solutions work with static thresholds, which are
mostly useless. I prefer walking through my dashboards every morning and
checking stuff myself.
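
To illustrate the "stuck for more than X seconds" kind of alert from the list above, as opposed to firing on a single bad sample, here is a minimal Python sketch. The class name, the sample format and the 300 second limit are made up for this example; it is not tied to Sensu, Kapacitor or any other tool.

```python
class StuckDetector:
    """Fire only when a condition has been continuously bad for too long."""

    def __init__(self, max_bad_seconds):
        self.max_bad_seconds = max_bad_seconds
        # Timestamp of the first bad sample in the current streak, if any.
        self.bad_since = None

    def observe(self, timestamp, is_bad):
        """Record one sample and return True if an alert should fire."""
        if not is_bad:
            # Any good sample ends the streak.
            self.bad_since = None
            return False
        if self.bad_since is None:
            self.bad_since = timestamp
        return timestamp - self.bad_since > self.max_bad_seconds


if __name__ == "__main__":
    detector = StuckDetector(max_bad_seconds=300)
    # (timestamp in seconds, is_bad) pairs, hypothetical data
    samples = [(0, True), (200, True), (250, False), (300, True), (700, True)]
    for ts, bad in samples:
        print(ts, detector.observe(ts, bad))
```

Only the last sample fires here: the first bad streak is interrupted at 250 seconds, and the second one has lasted 400 seconds by the time the sample at 700 arrives.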
cheers,
Mathias
On 24.07.2016 17:16, "Michał Jastrzębski" <inc007 at gmail.com> wrote:
> Guys, thanks for all that!
>
> Can we for a second abstract this discussion from technology and start
> by lining up the scenarios we want to achieve? Then pick software that
> will allow us to achieve all/most of the scenarios with the least amount
> of work/maintenance?
>
> So my scenarios:
>
> I want to see the health of the Docker service
> I want to see when the message queue becomes saturated
> I want to see when RAM exceeds 70%
> I want to see when my network causes tons of retransmissions
> I want to see when one of the nodes is down
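
A scenario like "RAM exceeds 70%" could be sketched as a small check script along these lines. This is only an illustration: it assumes Linux /proc/meminfo style input with MemTotal and MemAvailable fields, and it uses the Nagios-style exit codes (0 = OK, 1 = WARNING) that Sensu checks also follow. The function names and the sample data are made up.

```python
def memory_used_percent(meminfo_text):
    """Return used memory in percent from /proc/meminfo style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # first field is the number
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return 100.0 * (total - available) / total


def check_memory(meminfo_text, warn_percent=70.0):
    """Return (exit_code, message) using the 0=OK / 1=WARNING convention."""
    used = memory_used_percent(meminfo_text)
    if used > warn_percent:
        return 1, "WARNING: memory usage %.1f%%" % used
    return 0, "OK: memory usage %.1f%%" % used


if __name__ == "__main__":
    # Hypothetical sample; a real check would read /proc/meminfo instead.
    sample = "MemTotal: 1000 kB\nMemFree: 100 kB\nMemAvailable: 250 kB\n"
    code, message = check_memory(sample)
    print(code, message)
```

Note that this is exactly the kind of static threshold discussed above, so it is probably more useful as a dashboard hint than as a pager alert.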
>
> Did I miss anything? Which software stack would allow me to see these
> things?
>
> Cheers,
> Michal
>
> On 24 July 2016 at 09:09, Mathias Ewald <mewald at evoila.de> wrote:
> > I think Sensu is the best monitoring approach out there atm. Nagios / Icinga
> > are way too static and scale badly imho. The kinds of checks you proposed are
> > quite interesting. I would suggest wrapping a Sensu check around Tempest, but
> > that's going too far for the first cycle.
> >
> > The two stacks (Sensu + Uchiwa and TICK) only really overlap in metrics
> > collection, which can be done via Sensu and Telegraf. I don't know if it
> > makes sense to have both ... I definitely think we need Sensu though, simply
> > to monitor service availability and other thresholds and events which aren't
> > covered in TICK, as not everything is time series data, and to have the
> > alerting. With Sensu only, we don't have insight into performance and trends;
> > with TICK only, we lack alerting on events and non-performance metric data
> > (Is Keystone up? etc.)
> >
> > I think it won't hurt to develop these two stacks in parallel and maybe
> > we'll join them together in a chain as I described earlier.
> >
> > 2016-07-24 14:25 GMT+02:00 Dave Walker <email at daviey.com>:
> >>
> >> Thanks Mathias,
> >>
> >> I'm not tied to Sensu.. anything can really fill that gap in my mind.
> >> You've done a good job at outlining the steps involved. I created a
> >> blueprint with the steps I had in mind[0]
> >>
> >> For this cycle, I wanted to keep it simple so it was easily achievable. I
> >> only planned to have some basic up/down for each node and throw the
> >> performance data on the floor.
> >>
> >> I wanted to include the option to add local configs, as JSON blobs.
> >> Some of the things I was thinking of as local config:
> >> - daily checkouts, can instances be built with networking
> >> - remaining resources count (ie, does each subnet have X remaining ip
> >> addresses available)
> >> - Is Ceph healthy?
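
The "remaining resources" idea above could look roughly like this in Python. It is a sketch under assumptions: IPv4 only, the network and broadcast addresses are treated as unusable, and the number of addresses already in use is passed in by the caller (how to get it from Neutron is left out). The function names are made up.

```python
import ipaddress


def remaining_addresses(cidr, used_count):
    """Return how many usable IPv4 addresses are left in a subnet."""
    network = ipaddress.ip_network(cidr)
    # Subtract the network and broadcast addresses; never go below zero.
    usable = max(network.num_addresses - 2, 0)
    return max(usable - used_count, 0)


def subnet_has_headroom(cidr, used_count, minimum_free):
    """Return True if at least minimum_free addresses are still available."""
    return remaining_addresses(cidr, used_count) >= minimum_free


if __name__ == "__main__":
    # Hypothetical subnet with 250 addresses already allocated.
    print(remaining_addresses("10.0.0.0/24", 250))
    print(subnet_has_headroom("10.0.0.0/24", 250, minimum_free=10))
```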
> >>
> >> So, these things aren't really interesting as performance over time, which
> >> means the intention does differ. However, I do agree that both stacks could
> >> achieve both objectives.
> >>
> >> I've essentially got much of this working locally, but would require about
> >> a day of cleaning up for submission... but if your work can achieve the
> >> objectives above, I'm happy to discontinue... or help make your stack
> >> pluggable.
> >>
> >> [0] https://blueprints.launchpad.net/kolla/+spec/sensu
> >>
> >> --
> >> Kind Regards,
> >> Dave Walker
> >>
> >> On 24 July 2016 at 11:56, Mathias Ewald <mewald at evoila.de> wrote:
> >>>
> >>> Monitoring is a difficult topic as the number of options regarding the
> >>> toolset and mechanisms is very high. We had some chats about it in IRC that
> >>> uncovered even more options than I thought existed :D I believe Dave's view
> >>> on Sensu is generally correct in that Sensu is more directed at monitoring
> >>> in the form of "is X running/working", but of course it has the ability to
> >>> transport metrics, too. It lacks good dashboarding capabilities for
> >>> performance data, though. One setup I could imagine is
> >>>
> >>> 1. Sensu Client to collect checks and metrics
> >>> 2. RabbitMQ for transport
> >>> 3. Sensu Server to receive, evaluate, alarm and write metrics to InfluxDB
> >>> 4. Uchiwa as a Dashboard to Sensu
> >>> 5. InfluxDB to store metrics
> >>> 6. Grafana to dashboard metrics
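
To make the "write metrics to InfluxDB" step in the list above a bit more concrete, here is a small Python sketch of the conversion such a handler would need. It assumes the metric check emits Graphite plaintext lines ("path value timestamp", timestamp in seconds) and that the first path component is the host name; both are assumptions for this example, as is the function name. The output follows the InfluxDB line protocol with a nanosecond timestamp. No escaping is done, so paths are expected to contain only letters, digits, dots, dashes and underscores.

```python
def graphite_to_influx(line):
    """Convert one Graphite plaintext line to an InfluxDB line protocol line."""
    path, value, timestamp = line.split()
    # Assumption: the path looks like "<host>.<metric name>".
    host, _, measurement = path.partition(".")
    if not measurement:
        raise ValueError("expected '<host>.<metric>' in %r" % path)
    # Graphite uses seconds, line protocol defaults to nanoseconds.
    nanoseconds = int(timestamp) * 10**9
    return "%s,host=%s value=%s %d" % (
        measurement, host, float(value), nanoseconds)


if __name__ == "__main__":
    # Hypothetical metric as a check might print it.
    print(graphite_to_influx("node1.cpu.user 12.5 1469388734"))
```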
> >>>
> >>> So Sensu could be used as a replacement for (or in addition to) a metrics
> >>> collection daemon like Collectd or what I decided to use: Telegraf. For my
> >>> implementation, this means I will add a parameter to make Telegraf optional.
> >>> This way, someone else may implement the rest of the stack and the user can
> >>> decide which one to use.
> >>>
> >>> What do you think?
> >>>
> >>> Mathias
> >>>
> >>>
> >>>
> >>> 2016-07-23 21:51 GMT+02:00 Stephen Hindle <shindle at llnw.com>:
> >>>>
> >>>> My understanding was Sensu could produce metrics?
> >>>> And Kapacitor can do alerting for the TICK stack stuff mewald is
> >>>> doing...
> >>>> I really don't see them as that different?
> >>>>
> >>>>
> >>>> On Fri, Jul 22, 2016 at 5:19 PM, Dave Walker <email at daviey.com> wrote:
> >>>> > Yes, this is my thought.
> >>>> >
> >>>> > The scope of the Sensu work is: "Is this thing working?" (with the
> >>>> > reference being up/down)
> >>>> > But the scope of Grafana and friends is: "How hard is this working?"
> >>>> > (but no alerting)
> >>>> >
> >>>> > They are certainly complementary... However, Sensu can throw data at a
> >>>> > Grafana stack (aiui), but I fear that is too much to achieve this
> >>>> > cycle.
> >>>> >
> >>>> > --
> >>>> > Kind Regards,
> >>>> > Dave Walker
> >>>> >
> >>>> > On 23 July 2016 at 00:11, Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:
> >>>> >>
> >>>> >> I think those are two different, complementary things.
> >>>> >>
> >>>> >> One's metrics and the other is monitoring. You probably want both at
> >>>> >> the same time.
> >>>> >>
> >>>> >> Thanks,
> >>>> >> Kevin
> >>>> >> ________________________________________
> >>>> >> From: Steven Dake (stdake) [stdake at cisco.com]
> >>>> >> Sent: Friday, July 22, 2016 3:52 PM
> >>>> >> To: OpenStack Development Mailing List (not for usage questions)
> >>>> >> Subject: Re: [openstack-dev] [kolla] Monitoring tooling
> >>>> >>
> >>>> >> Thanks for pointing that out. Brain out to lunch today it appears :(
> >>>> >>
> >>>> >> I think choices are a good thing even though they increase our
> >>>> >> implementation footprint. Anyone opposed to implementing both with
> >>>> >> something in globals.yml like
> >>>> >> monitoring: grafana or
> >>>> >> monitoring: sensu
> >>>> >>
> >>>> >> Comments, questions or concerns welcome.
> >>>> >>
> >>>> >> Regards
> >>>> >> -steve
> >>>> >>
> >>>> >> On 7/22/16, 3:42 PM, "Stephen Hindle" <shindle at llnw.com> wrote:
> >>>> >>
> >>>> >> >Don't forget mewald's implementation as well - we now have 2
> >>>> >> >monitoring options for kolla :-)
> >>>> >> >
> >>>> >> >On Fri, Jul 22, 2016 at 3:15 PM, Steven Dake (stdake)
> >>>> >> ><stdake at cisco.com> wrote:
> >>>> >> >> Hi folks,
> >>>> >> >>
> >>>> >> >> At the midcycle we decided to push off implementing Monitoring until
> >>>> >> >> post Newton. The rationale for this decision was that the core review
> >>>> >> >> team has enough on their plates and nobody was super keen to implement
> >>>> >> >> any monitoring solution given our other priorities.
> >>>> >> >>
> >>>> >> >> Like all good things, communities produce new folks that want to
> >>>> >> >> do new things, and Sensu was proposed as Kolla's monitoring solution
> >>>> >> >> (at least the first one). A developer that has done some good work has
> >>>> >> >> shown up to do the job as well :) I have heard good things about
> >>>> >> >> Sensu, minus the fact that it is implemented in Ruby and I fear it may
> >>>> >> >> end up causing our gate a lot of hassle.
> >>>> >> >>
> >>>> >> >> https://review.openstack.org/#/c/341861/
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> Anyway I think we can work through the gate problem.
> >>>> >> >>
> >>>> >> >> Does anyone have any better suggestion? I'd like to unblock Dave's
> >>>> >> >> work, which is blocked on a -2 pending a complete discussion of our
> >>>> >> >> monitoring solution. Note we may end up implementing more than one
> >>>> >> >> down the road.
> >>>> >> >>
> >>>> >> >> Sensu is just where the original interest was.
> >>>> >> >>
> >>>> >> >> Please provide feedback, even if you don't have a preference,
> >>>> >> >> whether you're a core reviewer or not.
> >>>> >> >>
> >>>> >> >> My take is we can merge this work in non-priority order, and if it
> >>>> >> >> makes the end of the cycle, fantastic; if not, we can release it in
> >>>> >> >> Ocata.
> >>>> >> >>
> >>>> >> >> Regards
> >>>> >> >> -steve
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> __________________________________________________________________________
> >>>> >> >> OpenStack Development Mailing List (not for usage questions)
> >>>> >> >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >>>> >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>>> >> >>
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> >--
> >>>> >> >Stephen Hindle - Senior Systems Engineer
> >>>> >> >480.807.8189 480.807.8189
> >>>> >> >www.limelight.com Delivering Faster Better
> >>>> >> >
> >>>> >> >Join the conversation
> >>>> >> >
> >>>> >> >at Limelight Connect
> >>>> >> >
> >>>> >> >--
> >>>> >> >The information in this message may be confidential. It is intended
> >>>> >> >solely for the addressee(s). If you are not the intended recipient, any
> >>>> >> >disclosure, copying or distribution of the message, or any action or
> >>>> >> >omission taken by you in reliance on it, is prohibited and may be
> >>>> >> >unlawful. Please immediately contact the sender if you have received
> >>>> >> >this message in error.
> >>>> >> >
> >>>> >> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Mobile: +49 176 10567592
> >>> E-Mail: mewald at evoila.de
> >>>
> >>> evoila GmbH
> >>> Wilhelm-Theodor-Römheld-Str. 34
> >>> 55130 Mainz
> >>> Germany
> >>>
> >>> Managing Director: Johannes Hiemer
> >>>
> >>> Amtsgericht Mainz HRB 42719
> >>>
> >>> This e-mail may contain confidential and/or privileged information. If
> >>> you are not the intended recipient (or have received this e-mail in error)
> >>> please notify the sender immediately and destroy this e-mail. Any
> >>> unauthorised copying, disclosure or distribution of the material in this
> >>> e-mail is strictly forbidden.
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> >
> >
> >
>