[openstack-dev] [vitrage] [nova] [HA] [masakari] VM Heartbeat / Healthcheck Monitoring

Vikash Kumar Vikash.Kumar at oneconvergence.com
Tue May 30 13:46:24 UTC 2017


Thanks Sam, I will surely review it.

On Tue, 30 May 2017, 17:59 Sam P, <sam47priya at gmail.com> wrote:

> Hi Vikash,
>
>   Greg submitted the spec [1] for intrusive instance monitoring.
>   Your review will be highly appreciated.
>  [1] https://review.openstack.org/#/c/469070/
> --- Regards,
> Sampath
>
>
>
> On Sat, May 20, 2017 at 4:49 PM, Vikash Kumar
> <Vikash.Kumar at oneconvergence.com> wrote:
> > Thanks Sam
> >
> >
> > On Sat, 20 May 2017, 06:51 Sam P, <sam47priya at gmail.com> wrote:
> >>
> >> Hi Vikash,
> >>  Great... I will add you as a reviewer to this spec.
> >>  Thank you..
> >> --- Regards,
> >> Sampath
> >>
> >>
> >>
> >> On Fri, May 19, 2017 at 1:06 PM, Vikash Kumar
> >> <vikash.kumar at oneconvergence.com> wrote:
> >> > Hi Greg,
> >> >
> >> >     Please include my email in this spec also. We are also dealing with
> >> > HA of Virtual Instances (especially for Vendors) and will participate.
> >> >
> >> > On Thu, May 18, 2017 at 11:33 PM, Waines, Greg
> >> > <Greg.Waines at windriver.com>
> >> > wrote:
> >> >>
> >> >> Yes, I am good with writing a spec for this in masakari-specs.
> >> >>
> >> >>
> >> >>
> >> >> Do you use Gerrit for this git repo?
> >> >>
> >> >> Do you have a template for your specs?
> >> >>
> >> >>
> >> >>
> >> >> Greg.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> From: Sam P <sam47priya at gmail.com>
> >> >> Reply-To: "openstack-dev at lists.openstack.org"
> >> >> <openstack-dev at lists.openstack.org>
> >> >> Date: Thursday, May 18, 2017 at 1:51 PM
> >> >> To: "openstack-dev at lists.openstack.org"
> >> >> <openstack-dev at lists.openstack.org>
> >> >> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] [masakari] VM
> >> >> Heartbeat
> >> >> / Healthcheck Monitoring
> >> >>
> >> >>
> >> >>
> >> >> Hi Greg,
> >> >>
> >> >> Thank you, Adam, for the follow-up.
> >> >>
> >> >> This is a new feature for masakari-monitors, and I think Masakari can
> >> >> accommodate this feature in masakari-monitors.
> >> >>
> >> >> From the implementation perspective, it is not that hard to do.
> >> >>
> >> >> However, as you can see in our Boston presentation, Masakari will
> >> >> replace its monitoring parts (i.e. masakari-monitors) with
> >> >> nova-host-alerter, **-process-alerter, and **-instance-alerter. (The **
> >> >> part is not defined yet.. :p)
> >> >>
> >> >> Therefore, I would like to save these specifications and make sure we
> >> >> do not miss anything in the transformation.
> >> >>
> >> >> Does it make sense to write a simple spec for this in masakari-specs [1]?
> >> >>
> >> >> That way we can discuss the requirements and how to implement it.
> >> >>
> >> >>
> >> >>
> >> >> [1] https://github.com/openstack/masakari-specs
> >> >>
> >> >>
> >> >>
> >> >> --- Regards,
> >> >>
> >> >> Sampath
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Thu, May 18, 2017 at 2:29 AM, Adam Spiers <aspiers at suse.com> wrote:
> >> >>
> >> >> I don't see any reason why masakari couldn't handle that, but you'd
> >> >> have to ask Sampath and the masakari team whether they would consider
> >> >> that in scope for their roadmap.
> >> >>
> >> >>
> >> >>
> >> >> Waines, Greg <Greg.Waines at windriver.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >> Sure.  I can propose a new user story.
> >> >>
> >> >>
> >> >>
> >> >> And then, are you thinking of including this user story in the scope of
> >> >> what Masakari would be looking at?
> >> >>
> >> >>
> >> >>
> >> >> Greg.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> From: Adam Spiers <aspiers at suse.com>
> >> >>
> >> >> Reply-To: "openstack-dev at lists.openstack.org"
> >> >>
> >> >> <openstack-dev at lists.openstack.org>
> >> >>
> >> >> Date: Wednesday, May 17, 2017 at 10:08 AM
> >> >>
> >> >> To: "openstack-dev at lists.openstack.org"
> >> >>
> >> >> <openstack-dev at lists.openstack.org>
> >> >>
> >> >> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat /
> >> >>
> >> >> Healthcheck Monitoring
> >> >>
> >> >>
> >> >>
> >> >> Thanks for the clarification, Greg. This sounds like it has the
> >> >> potential to be a very useful capability. May I suggest that you
> >> >> propose a new user story for it, along similar lines to this existing
> >> >> one?
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html
> >> >>
> >> >>
> >> >>
> >> >> Waines, Greg <Greg.Waines at windriver.com> wrote:
> >> >>
> >> >> Yes, that’s correct.
> >> >>
> >> >> VM Heartbeating / Health-check Monitoring would introduce intrusive /
> >> >> white-box type monitoring of VMs / Instances.
> >> >>
> >> >> I realize this is somewhat in the gray zone of what a cloud should or
> >> >> should not be monitoring, but I believe it provides an alternative for
> >> >> Applications deployed in VMs that do not have an external
> >> >> monitoring/management entity, like a VNF Manager in the MANO
> >> >> architecture.
> >> >> And even for VMs with VNF Managers, it provides a highly reliable
> >> >> alternate monitoring path that does not rely on Tenant Networking.
> >> >>
> >> >>
> >> >>
> >> >> You’re correct that VM HB/HC Monitoring would leverage
> >> >> https://wiki.libvirt.org/page/Qemu_guest_agent, which would require the
> >> >> agent to be installed in the images in order to talk back to the compute
> >> >> host. (There are other examples of similar approaches in OpenStack: the
> >> >> murano-agent for installation, the swift-agent for object store
> >> >> management.)
> >> >>
> >> >> Here, though, in the case of VM HB/HC Monitoring via the QEMU Guest
> >> >> Agent, the messaging path is internal, thru a QEMU virtual serial
> >> >> device, i.e. a very simple interface with very few dependencies. It is
> >> >> up and available very early in the VM lifecycle and virtually always up.
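A minimal sketch of the host side of that messaging path, assuming the QEMU Guest Agent's newline-delimited JSON protocol (commands such as "guest-sync" and "guest-ping"; replies carrying either a "return" or an "error" member) and assuming the caller already holds a connected socket to the agent's virtio-serial channel. The helper names qga_command and qga_exchange are hypothetical. In a libvirt-managed deployment libvirt normally owns that channel, so a real monitor would more likely go through libvirt (e.g. virsh qemu-agent-command <domain> '{"execute":"guest-ping"}') than open the socket itself.

```python
import json
import socket
from typing import Any, Optional


def qga_command(execute: str, arguments: Optional[dict] = None) -> bytes:
    """Serialise one QEMU Guest Agent command as a newline-terminated JSON line."""
    msg: dict = {"execute": execute}
    if arguments is not None:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode("utf-8")


def qga_exchange(sock: socket.socket, execute: str,
                 arguments: Optional[dict] = None, timeout: float = 5.0) -> Any:
    """Send one command on a connected agent channel and return its "return" value.

    Raises socket.timeout if the guest does not answer in time, ConnectionError
    if the channel closes, and RuntimeError if the agent reports an error.
    """
    sock.settimeout(timeout)
    sock.sendall(qga_command(execute, arguments))
    buf = b""
    # Requests and replies are strictly lock-step, so read until the newline
    # that terminates this command's reply.
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("guest agent channel closed")
        buf += chunk
    reply = json.loads(buf)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("desc", "guest agent error"))
    return reply["return"]
```

Because each exchange is one request and one reply, a reply that does not arrive before the timeout is itself the heartbeat-failure signal; "guest-sync" with a fresh id is a cheap way to confirm that a reply belongs to the current request.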
> >> >>
> >> >>
> >> >>
> >> >> Wrt failure modes / use-cases
> >> >>
> >> >>
> >> >>
> >> >> * A VM’s response to a Heartbeat Challenge Request can be as simple as
> >> >>   just ACK-ing; this alone allows for detection of:
> >> >>     - a failed or hung QEMU/KVM instance, or
> >> >>     - a failed or hung VM’s OS, or
> >> >>     - a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
> >> >>     - a failure of the VM to route basic IO via Linux sockets.
> >> >>
> >> >> * I have had feedback that this is similar to the virtual hardware
> >> >>   watchdog of QEMU/KVM
> >> >>   ( https://libvirt.org/formatdomain.html#elementsWatchdog ).
> >> >>
> >> >> * However, VM Heartbeat / Health-check Monitoring:
> >> >>     - provides higher-level (i.e. application-level) heartbeating,
> >> >>         + i.e. if the Heartbeat requests are being answered by the
> >> >>           Application running within the VM;
> >> >>     - provides more than just heartbeating, as the Application can use
> >> >>       it to trigger a variety of audits;
> >> >>     - provides a mechanism for the Application within the VM to report a
> >> >>       Health Status / Info back to the Host / Cloud;
> >> >>     - provides notification of the Heartbeat / Health-check status to
> >> >>       higher-level cloud entities thru Vitrage,
> >> >>         + e.g. VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm) - Aodh - ... - VNF-Manager
> >> >>                                              - (StateChange) - Nova - ... - VNF Manager
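The ACK-only heartbeat and the application-reported health status described above could be tracked on the host roughly as follows. This is a hypothetical sketch, not Masakari or Vitrage code: the HeartbeatMonitor class, the default miss threshold of 3, and the convention that the application answers "ok" when healthy are all assumptions. A "miss" is any challenge whose reply does not arrive before the timeout.

```python
import enum
from dataclasses import dataclass
from typing import Optional, Tuple


class VmHealth(enum.Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"  # application answered but reported a problem
    FAILED = "failed"      # too many consecutive missed heartbeats


@dataclass
class HeartbeatMonitor:
    """Hypothetical host-side heartbeat tracker for a single instance."""

    miss_threshold: int = 3  # assumed: consecutive misses before FAILED
    consecutive_misses: int = 0
    state: VmHealth = VmHealth.HEALTHY

    def record(self, reply: Optional[str]) -> Optional[Tuple[VmHealth, VmHealth]]:
        """Feed the outcome of one Heartbeat Challenge Request.

        reply is None when no answer arrived before the timeout; otherwise it
        is the health status string reported by the application ("ok" is
        assumed to mean healthy). Returns (old_state, new_state) when the
        state changes, else None.
        """
        old = self.state
        if reply is None:
            self.consecutive_misses += 1
            if self.consecutive_misses >= self.miss_threshold:
                self.state = VmHealth.FAILED
        else:
            # Any answer shows the QEMU instance, guest OS and agent are alive.
            self.consecutive_misses = 0
            self.state = VmHealth.HEALTHY if reply == "ok" else VmHealth.DEGRADED
        return (old, self.state) if self.state is not old else None
```

The (old, new) transition returned by record() is the kind of event a monitor would forward upward (e.g. toward Vitrage) instead of reporting every individual heartbeat; requiring several consecutive misses avoids declaring a failure because of a single late reply.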
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Greg.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> From: Adam Spiers <aspiers at suse.com>
> >> >> Reply-To: "openstack-dev at lists.openstack.org"
> >> >> <openstack-dev at lists.openstack.org>
> >> >> Date: Tuesday, May 16, 2017 at 7:29 PM
> >> >> To: "openstack-dev at lists.openstack.org"
> >> >> <openstack-dev at lists.openstack.org>
> >> >> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat /
> >> >> Healthcheck Monitoring
> >> >>
> >> >> Waines, Greg <Greg.Waines at windriver.com> wrote:
> >> >>
> >> >> Thanks for the pointers, Sam.
> >> >>
> >> >> I took a quick look.
> >> >> I agree that VM Heartbeat / Health-check looks like a good fit for
> >> >> Masakari.
> >> >>
> >> >> Currently your instance monitoring looks like it is strictly black-box
> >> >> type monitoring thru libvirt events.
> >> >> Is that correct?
> >> >> i.e. you do not do any intrusive type monitoring of the instance thru the
> >> >> QEMU Guest Agent facility, correct?
> >> >>
> >> >>
> >> >>
> >> >> That is correct:
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py
> >> >>
> >> >>
> >> >>
> >> >> I think this is what VM Heartbeat / Health-check would add to Masakari.
> >> >> Let me know if you agree.
> >> >>
> >> >>
> >> >>
> >> >> OK, so you are looking for something slightly different, I guess, based
> >> >> on this QEMU guest agent?
> >> >>
> >> >>     https://wiki.libvirt.org/page/Qemu_guest_agent
> >> >>
> >> >> That would require the agent to be installed in the images, which is
> >> >> extra work but I imagine quite easily justifiable in some scenarios.
> >> >> What failure modes do you have in mind to cover with this
> >> >> approach - things like the guest kernel freezing, for instance?
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> __________________________________________________________________________
> >> >>
> >> >> OpenStack Development Mailing List (not for usage questions)
> >> >>
> >> >> Unsubscribe:
> >> >> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >> >>
> >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Regards,
> >> > Vikash
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >
> >
> >
> >
>
>

