[openstack-dev] [vitrage] [nova] [HA] [masakari] VM Heartbeat / Healthcheck Monitoring

Sam P sam47priya at gmail.com
Tue May 30 12:19:48 UTC 2017


Hi Vikash,

  Greg submitted the spec [1] for intrusive instance monitoring.
  Your review would be highly appreciated.
 [1] https://review.openstack.org/#/c/469070/
--- Regards,
Sampath



On Sat, May 20, 2017 at 4:49 PM, Vikash Kumar
<Vikash.Kumar at oneconvergence.com> wrote:
> Thanks Sam
>
>
> On Sat, 20 May 2017, 06:51 Sam P, <sam47priya at gmail.com> wrote:
>>
>> Hi Vikash,
>>  Great. I will add you as a reviewer to this spec.
>>  Thank you.
>> --- Regards,
>> Sampath
>>
>>
>>
>> On Fri, May 19, 2017 at 1:06 PM, Vikash Kumar
>> <vikash.kumar at oneconvergence.com> wrote:
>> > Hi Greg,
>> >
>> >     Please include my email in this spec also. We are also dealing with
>> > HA of Virtual Instances (especially for Vendors) and will participate.
>> >
>> > On Thu, May 18, 2017 at 11:33 PM, Waines, Greg
>> > <Greg.Waines at windriver.com>
>> > wrote:
>> >>
>> >> Yes, I am good with writing a spec for this in masakari-specs.
>> >>
>> >> Do you use Gerrit for this git repository?
>> >>
>> >> Do you have a template for your specs?
>> >>
>> >>
>> >>
>> >> Greg.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> From: Sam P <sam47priya at gmail.com>
>> >> Reply-To: "openstack-dev at lists.openstack.org"
>> >> <openstack-dev at lists.openstack.org>
>> >> Date: Thursday, May 18, 2017 at 1:51 PM
>> >> To: "openstack-dev at lists.openstack.org"
>> >> <openstack-dev at lists.openstack.org>
>> >> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] [masakari] VM
>> >> Heartbeat
>> >> / Healthcheck Monitoring
>> >>
>> >>
>> >>
>> >> Hi Greg,
>> >>
>> >> Thank you, Adam, for the follow-up.
>> >>
>> >> This is a new feature for masakari-monitors, and I think Masakari can
>> >> accommodate it there.
>> >>
>> >> From the implementation perspective, it is not that hard to do.
>> >>
>> >> However, as you can see in our Boston presentation, Masakari will
>> >> replace its monitoring parts (which is masakari-monitors) with
>> >> nova-host-alerter, **-process-alerter, and **-instance-alerter (the **
>> >> part is not defined yet :p).
>> >>
>> >> Therefore, I would like to save this specification and make sure we
>> >> do not miss anything in the transformation.
>> >>
>> >> Does it make sense to write a simple spec for this in masakari-specs [1],
>> >> so we can discuss the requirements and how to implement them?
>> >>
>> >>
>> >>
>> >> [1] https://github.com/openstack/masakari-specs
>> >>
>> >>
>> >>
>> >> --- Regards,
>> >>
>> >> Sampath
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, May 18, 2017 at 2:29 AM, Adam Spiers <aspiers at suse.com> wrote:
>> >>
>> >> I don't see any reason why masakari couldn't handle that, but you'd
>> >>
>> >> have to ask Sampath and the masakari team whether they would consider
>> >>
>> >> that in scope for their roadmap.
>> >>
>> >>
>> >>
>> >> Waines, Greg <Greg.Waines at windriver.com> wrote:
>> >>
>> >>
>> >>
>> >> Sure.  I can propose a new user story.
>> >>
>> >>
>> >>
>> >> And then are you thinking of including this user story in the scope of
>> >>
>> >> what masakari would be looking at ?
>> >>
>> >>
>> >>
>> >> Greg.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> From: Adam Spiers <aspiers at suse.com>
>> >>
>> >> Reply-To: "openstack-dev at lists.openstack.org"
>> >>
>> >> <openstack-dev at lists.openstack.org>
>> >>
>> >> Date: Wednesday, May 17, 2017 at 10:08 AM
>> >>
>> >> To: "openstack-dev at lists.openstack.org"
>> >>
>> >> <openstack-dev at lists.openstack.org>
>> >>
>> >> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat /
>> >>
>> >> Healthcheck Monitoring
>> >>
>> >>
>> >>
>> >> Thanks for the clarification Greg.  This sounds like it has the
>> >>
>> >> potential to be a very useful capability.  May I suggest that you
>> >>
>> >> propose a new user story for it, along similar lines to this existing
>> >>
>> >> one?
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html
>> >>
>> >>
>> >>
>> >> Waines, Greg <Greg.Waines at windriver.com> wrote:
>> >>
>> >> Yes that’s correct.
>> >>
>> >> VM Heartbeating / Health-check Monitoring would introduce intrusive /
>> >>
>> >> white-box type monitoring of VMs / Instances.
>> >>
>> >>
>> >>
>> >> I realize this is somewhat in the gray zone of what a cloud should be
>> >> monitoring or not, but I believe it provides an alternative for
>> >> Applications deployed in VMs that do not have an external
>> >> monitoring/management entity like a VNF Manager in the MANO architecture.
>> >>
>> >> And even for VMs with VNF Managers, it provides a highly reliable
>> >> alternate monitoring path that does not rely on Tenant Networking.
>> >>
>> >>
>> >>
>> >> You’re correct that VM HB/HC Monitoring would leverage
>> >> https://wiki.libvirt.org/page/Qemu_guest_agent
>> >> which would require the agent to be installed in the images for talking
>> >> back to the compute host.
>> >> (There are other examples of similar approaches in OpenStack, e.g. the
>> >> murano-agent for installation and the swift-agent for object store
>> >> management.)
>> >>
>> >> In the case of VM HB/HC Monitoring via the QEMU Guest Agent, though, the
>> >> messaging path is internal, through a QEMU virtual serial device,
>> >> i.e. a very simple interface with very few dependencies. It is up and
>> >> available very early in the VM lifecycle and virtually always up.
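
As a rough illustration of how simple that messaging path can be: the QEMU
Guest Agent speaks a JSON protocol over the virtual serial device, and its
built-in guest-ping command answers with an empty "return" object when the
agent is alive. The Python helpers below are only a sketch. The names
build_guest_ping and is_ack are hypothetical (they are not part of Masakari
or libvirt), and the transport is deliberately left out; on a compute host
one option would be `virsh qemu-agent-command <domain> '<json>'`.

```python
import json


def build_guest_ping() -> str:
    """Serialize the guest agent's built-in liveness command."""
    return json.dumps({"execute": "guest-ping"})


def is_ack(raw_reply) -> bool:
    """Return True if the raw reply is a successful guest agent response.

    A successful reply carries a "return" key; a failure carries "error".
    Anything that does not parse as a JSON object is treated as no ACK.
    """
    try:
        reply = json.loads(raw_reply)
    except (TypeError, ValueError):
        return False
    return isinstance(reply, dict) and "return" in reply and "error" not in reply
```

A missing or malformed reply is treated the same as a timeout, which matches
the "just ACK-ing" case: the monitor only needs to know whether a well-formed
answer came back.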
>> >>
>> >>
>> >>
>> >> Wrt failure modes / use-cases:
>> >>
>> >> * A VM’s response to a Heartbeat Challenge Request can be as simple as
>> >>   just ACK-ing. This alone allows for detection of:
>> >>
>> >>   o a failed or hung QEMU/KVM instance, or
>> >>   o a failed or hung VM’s OS, or
>> >>   o a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
>> >>   o a failure of the VM to route basic IO via linux sockets.
>> >>
>> >> * I have had feedback that this is similar to the virtual hardware
>> >>   watchdog of QEMU/KVM
>> >>   ( https://libvirt.org/formatdomain.html#elementsWatchdog ).
>> >>
>> >> * However, the VM Heartbeat / Health-check Monitoring
>> >>
>> >>   o provides a higher-level (i.e. application-level) heartbeating,
>> >>     i.e. if the Heartbeat requests are being answered by the Application
>> >>     running within the VM,
>> >>   o provides more than just heartbeating, as the Application can use it
>> >>     to trigger a variety of audits,
>> >>   o provides a mechanism for the Application within the VM to report a
>> >>     Health Status / Info back to the Host / Cloud,
>> >>   o provides notification of the Heartbeat / Health-check status to
>> >>     higher-level cloud entities through Vitrage, e.g.
>> >>
>> >>       VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm) - Aodh - ... - VNF-Manager
>> >>                                           - (StateChange) - Nova - ... - VNF Manager
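
The use cases listed above (ACK-only heartbeat, application-reported health,
and notification to higher-level entities) could be combined in a monitor
roughly like the following. This is a sketch under assumptions, not the
proposed implementation: HeartbeatMonitor, send_challenge, notify,
max_missed and the "health" / "detail" reply keys are all made up for
illustration and do not come from the spec, Masakari, or Vitrage.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HeartbeatMonitor:
    # Hypothetical transport hook: returns the VM's reply as a dict,
    # or None if the challenge timed out.
    send_challenge: Callable[[], Optional[dict]]
    # Hypothetical notification hook, called as notify(status, detail).
    notify: Callable[[str, str], None]
    # Consecutive missed heartbeats tolerated before declaring failure.
    max_missed: int = 3
    missed: int = 0

    def poll_once(self) -> str:
        """Send one challenge and classify the result."""
        reply = self.send_challenge()
        if reply is None:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Notifies on every poll while the VM stays unresponsive.
                self.notify("failed", f"{self.missed} consecutive heartbeats missed")
                return "failed"
            return "missed"
        # Any reply counts as an ACK and resets the miss counter.
        self.missed = 0
        health = reply.get("health", "ok")
        if health != "ok":
            # The application answered but reported a problem.
            self.notify("unhealthy", str(reply.get("detail", "")))
            return "unhealthy"
        return "ok"
```

Keeping the transport and the notification behind plain callables means the
same loop could sit on top of the guest agent channel on one side and an
alarm or event API on the other, without the sketch assuming either.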
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> Greg.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> From: Adam Spiers <aspiers at suse.com>
>> >> Reply-To: "openstack-dev at lists.openstack.org"
>> >> <openstack-dev at lists.openstack.org>
>> >> Date: Tuesday, May 16, 2017 at 7:29 PM
>> >> To: "openstack-dev at lists.openstack.org"
>> >> <openstack-dev at lists.openstack.org>
>> >> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat /
>> >> Healthcheck Monitoring
>> >>
>> >> Waines, Greg <Greg.Waines at windriver.com> wrote:
>> >>
>> >>
>> >> Thanks for the pointers, Sam.
>> >>
>> >> I took a quick look.
>> >> I agree that the VM Heartbeat / Health-check looks like a good fit into
>> >> Masakari.
>> >>
>> >> Currently your instance monitoring looks like it is strictly black-box
>> >> type monitoring through libvirt events.
>> >> Is that correct?
>> >> I.e. you do not do any intrusive type monitoring of the instance through
>> >> the QEMU Guest Agent facility, correct?
>> >>
>> >>
>> >>
>> >> That is correct:
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py
>> >>
>> >>
>> >>
>> >> I think this is what VM Heartbeat / Health-check would add to Masakari.
>> >>
>> >> Let me know if you agree.
>> >>
>> >>
>> >>
>> >> OK, so you are looking for something slightly different I guess, based
>> >>
>> >> on this QEMU guest agent?
>> >>
>> >>
>> >>
>> >>     https://wiki.libvirt.org/page/Qemu_guest_agent
>> >>
>> >>
>> >>
>> >> That would require the agent to be installed in the images, which is
>> >>
>> >> extra work but I imagine quite easily justifiable in some scenarios.
>> >>
>> >> What failure modes do you have in mind for covering with this
>> >>
>> >> approach - things like the guest kernel freezing, for instance?
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> __________________________________________________________________________
>> >>
>> >> OpenStack Development Mailing List (not for usage questions)
>> >>
>> >> Unsubscribe:
>> >> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> >>
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Vikash
>> >
>> >
>> >
>>
>
>
>


