[openstack-dev] [masakari] Intrusive Instance Monitoring

Waines, Greg Greg.Waines at windriver.com
Wed May 17 21:15:23 UTC 2017

( I have been having a discussion with Adam Spiers on [openstack-dev][vitrage][nova] on this topic ... thought I would switch over to [masakari] )

I am interested in contributing an implementation of Intrusive Instance Monitoring,
initially specifically VM Heartbeat / Health-check Monitoring thru the QEMU Guest Agent (https://wiki.libvirt.org/page/Qemu_guest_agent).

I’d like to know whether Masakari project leaders would consider a blueprint on “VM Heartbeat / Health-check Monitoring”.
See below for more details.


VM Heartbeating / Health-check Monitoring would introduce intrusive / white-box type monitoring of VMs / Instances to Masakari.

Briefly, “VM Heartbeat / Health-check Monitoring”
·         is optionally enabled thru a Nova flavor extra-spec,
·         is a service that runs on an OpenStack Compute Node,
·         it sends periodic Heartbeat / Health-check Challenge Requests to a VM
over a virtio-serial-device setup between the Compute Node and the VM thru QEMU,
( https://wiki.libvirt.org/page/Qemu_guest_agent )
·         on loss of heartbeat or a failed health-check status, it reports a fault event against the VM
to Masakari and to any other registered reporting backends, such as Mistral or Vitrage.
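
A minimal sketch of the monitoring loop described in the bullets above, assuming Python. The extra-spec key `hw:guest_heartbeat`, the `max_missed` threshold, and the `challenge` / `report_fault` callables are all hypothetical names chosen for illustration; the proposal does not define any of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical extra-spec key; the proposal does not name one.
EXTRA_SPEC_KEY = "hw:guest_heartbeat"


def heartbeat_enabled(extra_specs: Dict[str, str]) -> bool:
    """Return True if the flavor opts the instance in to heartbeating."""
    return extra_specs.get(EXTRA_SPEC_KEY, "false").strip().lower() == "true"


@dataclass
class HeartbeatMonitor:
    """Tracks one VM; call tick() once per heartbeat period."""
    instance_uuid: str
    challenge: Callable[[str], bool]           # True means the VM answered healthy
    report_fault: Callable[[str, str], None]   # (instance_uuid, reason)
    max_missed: int = 3                        # assumed threshold, not from the proposal
    missed: int = 0
    faulted: bool = False

    def tick(self) -> None:
        try:
            healthy = self.challenge(self.instance_uuid)
        except Exception:
            # A transport error or timeout counts as a missed heartbeat.
            healthy = False
        if healthy:
            self.missed = 0
            self.faulted = False
            return
        self.missed += 1
        # Report once per outage rather than on every missed period.
        if self.missed >= self.max_missed and not self.faulted:
            self.faulted = True
            self.report_fault(self.instance_uuid, "heartbeat_lost")
```

Requiring several consecutive misses before reporting is one possible way to avoid raising a fault on a single delayed response; the actual policy would be up to the blueprint.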

I realize this is somewhat in the gray-zone of what a cloud should be monitoring or not,
but I believe it provides an alternative for Applications deployed in VMs that do not have an external monitoring/management entity like a VNF Manager in the MANO architecture.
And even for VMs with VNF Managers, it provides a highly reliable alternate monitoring path that does not rely on Tenant Networking.

VM HB/HC Monitoring would leverage the QEMU Guest Agent ( https://wiki.libvirt.org/page/Qemu_guest_agent ),
which would require the agent to be installed in the images so that the VM can talk back to the compute host.
( there are other examples of similar approaches in openstack ... the murano-agent for installation, the swift-agent for object store management )
In the case of VM HB/HC Monitoring via the QEMU Guest Agent, however, the messaging path is internal, thru a QEMU virtual serial device, i.e. a very simple interface with very few dependencies ... it’s up and available very early in the VM lifecycle and virtually always up.
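
To illustrate how simple that interface is, here is a minimal sketch of the plainest possible challenge, using the guest agent's existing `guest-ping` command (a JSON request that returns `{"return": {}}` on success). The `send` callable is an assumption standing in for the transport; on a compute host it could wrap something like `virsh qemu-agent-command` or libvirt-python's `libvirt_qemu.qemuAgentCommand`. Note that `guest-ping` only shows that the agent itself is alive; an application-level health check as proposed here would need something beyond it.

```python
import json
from typing import Callable

# The guest agent speaks a JSON protocol; guest-ping is its no-op liveness command.
GUEST_PING = json.dumps({"execute": "guest-ping"})


def agent_alive(send: Callable[[str], str]) -> bool:
    """Send guest-ping through `send` and report whether the agent ACKed.

    `send` is a hypothetical transport hook: it takes the JSON request
    string and returns the raw JSON reply, raising on timeout or error.
    """
    try:
        reply = json.loads(send(GUEST_PING))
    except Exception:
        # Timeout, closed channel, or unparsable reply: treat as no heartbeat.
        return False
    return isinstance(reply, dict) and "return" in reply and "error" not in reply
```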

With respect to failure modes / use-cases:
·         a VM’s response to a Heartbeat Challenge Request can be as simple as just ACK-ing,
this alone allows for detection of:
o    a failed or hung QEMU/KVM instance, or
o    a failed or hung VM’s OS, or
o    a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
o    a failure of the VM to route basic IO via Linux sockets.
·         I have had feedback that this is similar to the virtual hardware watchdog of QEMU/KVM (https://libvirt.org/formatdomain.html#elementsWatchdog )
·         However, the VM Heartbeat / Health-check Monitoring
o   provides a higher-level (i.e. application-level) heartbeating
•  i.e. if the Heartbeat requests are being answered by the Application running within the VM
o   provides more than just heartbeating, as the Application can use it to trigger a variety of audits,
o   provides a mechanism for the Application within the VM to report a Health Status / Info back to the Host / Cloud,
o   provides notification of the Heartbeat / Health-check status to higher-level cloud entities thru Masakari, Mistral and/or Vitrage
•  e.g.   VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm)  - Aodh - ... - VNF-Manager
          VM-Heartbeat-Monitor - to - Vitrage - (StateChange) - Nova - ... - VNF-Manager
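
The "registered reporting backends" idea above could be sketched roughly as follows, assuming Python. The event layout (type / hostname / generated_time / payload) and the `GUEST_HEARTBEAT` event name are assumptions made for this example, not an agreed Masakari, Mistral or Vitrage format, and each backend is just a hypothetical callable.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

Backend = Callable[[Dict], None]


def build_fault_event(instance_uuid: str, hostname: str, reason: str,
                      now: Optional[datetime] = None) -> Dict:
    """Build a fault event for one VM (field names are illustrative only)."""
    now = now or datetime.now(timezone.utc)
    return {
        "type": "VM",
        "hostname": hostname,
        "generated_time": now.isoformat(),
        "payload": {
            "event": "GUEST_HEARTBEAT",   # hypothetical event name
            "instance_uuid": instance_uuid,
            "reason": reason,
        },
    }


def report(event: Dict, backends: Dict[str, Backend]) -> Dict[str, bool]:
    """Deliver the event to every registered backend.

    One backend failing must not stop delivery to the others, so each
    delivery is attempted independently and the outcome is recorded.
    """
    results = {}
    for name, deliver in backends.items():
        try:
            deliver(event)
            results[name] = True
        except Exception:
            results[name] = False
    return results
```

Keeping the backends behind a plain callable is one way the Vitrage reporting could later be added under a separate blueprint without touching the monitor itself.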

NOTE: perhaps the reporting to Vitrage would be a separate blueprint within Masakari.
