[openstack-dev] [vitrage] [nova] [HA] [masakari] VM Heartbeat / Healthcheck Monitoring

Adam Spiers aspiers at suse.com
Wed May 17 17:29:07 UTC 2017


I don't see any reason why masakari couldn't handle that, but you'd
have to ask Sampath and the masakari team whether they would consider
that in scope for their roadmap.

Waines, Greg <Greg.Waines at windriver.com> wrote:
>Sure.  I can propose a new user story.
>
>And then are you thinking of including this user story in the scope of what masakari would be looking at ?
>
>Greg.
>
>
>From: Adam Spiers <aspiers at suse.com>
>Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
>Date: Wednesday, May 17, 2017 at 10:08 AM
>To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
>Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring
>
>Thanks for the clarification Greg.  This sounds like it has the
>potential to be a very useful capability.  May I suggest that you
>propose a new user story for it, along similar lines to this existing
>one?
>
>http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html
>
>Waines, Greg <Greg.Waines at windriver.com> wrote:
>Yes that’s correct.
>VM Heartbeating / Health-check Monitoring would introduce intrusive / white-box type monitoring of VMs / Instances.
>
>I realize this is somewhat in the gray-zone of what a cloud should be monitoring or not,
>but I believe it provides an alternative for Applications deployed in VMs that do not have an external monitoring/management entity like a VNF Manager in the MANO architecture.
>And even for VMs with VNF Managers, it provides a highly reliable alternate monitoring path that does not rely on Tenant Networking.
>
>You’re correct that VM HB/HC Monitoring would leverage the QEMU Guest Agent
>https://wiki.libvirt.org/page/Qemu_guest_agent
>which would require the agent to be installed in the images so that it can talk back to the compute host.
>( there are other examples of similar approaches in openstack ... the murano-agent for installation, the swift-agent for object store management )
>Although here, in the case of VM HB/HC Monitoring, via the QEMU Guest Agent, the messaging path is internal thru a QEMU virtual serial device.  i.e. a very simple interface with very few dependencies ... it’s up and available very early in VM lifecycle and virtually always up.
>
>Wrt failure modes / use-cases
>
>* a VM’s response to a Heartbeat Challenge Request can be as simple as just ACK-ing;
>  this alone allows for detection of:
>    - a failed or hung QEMU/KVM instance, or
>    - a failed or hung VM’s OS, or
>    - a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
>    - a failure of the VM to route basic IO via linux sockets.
>
>* I have had feedback that this is similar to the virtual hardware watchdog of QEMU/KVM ( https://libvirt.org/formatdomain.html#elementsWatchdog )
>
>* However, the VM Heartbeat / Health-check Monitoring
>    - provides a higher-level (i.e. application-level) heartbeating
>        + i.e. if the Heartbeat requests are being answered by the Application running within the VM
>    - provides more than just heartbeating, as the Application can use it to trigger a variety of audits,
>    - provides a mechanism for the Application within the VM to report a Health Status / Info back to the Host / Cloud,
>    - provides notification of the Heartbeat / Health-check status to higher-level cloud entities thru Vitrage
>        + e.g. VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm)  - Aodh - ... - VNF-Manager
>                                                   - (StateChange) - Nova - ... - VNF-Manager
>
>
>Greg.
>
>
>From: Adam Spiers <aspiers at suse.com>
>Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
>Date: Tuesday, May 16, 2017 at 7:29 PM
>To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
>Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring
>
>Waines, Greg <Greg.Waines at windriver.com> wrote:
>thanks for the pointers Sam.
>
>I took a quick look.
>I agree that the VM Heartbeat / Health-check looks like a good fit into Masakari.
>
>Currently your instance monitoring looks like it is strictly black-box type monitoring thru libvirt events.
>Is that correct ?
>i.e. you do not do any intrusive type monitoring of the instance thru the QEMU Guest Agent facility
>       correct ?
>
>That is correct:
>
>https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py
>
>I think this is what VM Heartbeat / Health-check would add to Masakari.
>Let me know if you agree.
>
>OK, so you are looking for something slightly different I guess, based
>on this QEMU guest agent?
>
>    https://wiki.libvirt.org/page/Qemu_guest_agent
>
>That would require the agent to be installed in the images, which is
>extra work but I imagine quite easily justifiable in some scenarios.
>What failure modes do you have in mind for covering with this
>approach - things like the guest kernel freezing, for instance?
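
The heartbeat challenge / ACK exchange discussed in this thread can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not masakari or Vitrage code: it assumes the host sends the
QEMU Guest Agent "guest-ping" command and treats a reply containing
"return" as the ACK. The Transport callable, the HeartbeatMonitor class,
the max_missed threshold and the "healthy" / "degraded" / "failed" labels
are all hypothetical names introduced for the example.

```python
import json
from typing import Callable

# Hypothetical transport: sends one JSON request over the guest agent channel
# and returns the raw JSON reply, raising TimeoutError if nothing comes back.
Transport = Callable[[str, float], str]


def guest_ping(send: Transport, timeout: float = 5.0) -> bool:
    """Return True if the guest agent ACKs a 'guest-ping' within the timeout."""
    request = json.dumps({"execute": "guest-ping"})
    try:
        reply = json.loads(send(request, timeout))
    except (OSError, ValueError):
        # OSError covers TimeoutError and channel failures;
        # ValueError covers a reply that is not valid JSON.
        return False
    # A successful guest agent reply carries "return"; a failure carries "error".
    return isinstance(reply, dict) and "return" in reply


class HeartbeatMonitor:
    """Declare a VM failed after `max_missed` consecutive missed heartbeats."""

    def __init__(self, send: Transport, max_missed: int = 3) -> None:
        self.send = send
        self.max_missed = max_missed
        self.missed = 0

    def poll(self) -> str:
        """Send one heartbeat and return 'healthy', 'degraded' or 'failed'."""
        if guest_ping(self.send):
            self.missed = 0
            return "healthy"
        self.missed += 1
        return "failed" if self.missed >= self.max_missed else "degraded"
```

On a real compute host the transport would have to be backed by something
that reaches the agent over the virtual serial device, for example
libvirt-python's libvirt_qemu.qemuAgentCommand. A missed ACK by itself
does not say which of the failure modes listed above occurred, only that
one of them did.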


