[openstack-dev] [masakari] Intrusive Instance Monitoring
Greg.Waines at windriver.com
Wed May 17 21:15:23 UTC 2017
( I have been having a discussion with Adam Spiers on [openstack-dev][vitrage][nova] on this topic ... thought I would switchover to [masakari] )
I am interested in contributing an implementation of Intrusive Instance Monitoring,
starting specifically with VM Heartbeat / Health-check Monitoring thru the QEMU Guest Agent (https://wiki.libvirt.org/page/Qemu_guest_agent).
I’d like to know whether Masakari project leaders would consider a blueprint on “VM Heartbeat / Health-check Monitoring”.
See below for some more details,
VM Heartbeating / Health-check Monitoring would introduce intrusive / white-box type monitoring of VMs / Instances to Masakari.
Briefly, “VM Heartbeat / Health-check Monitoring”
· is optionally enabled thru a Nova flavor extra-spec,
· is a service that runs on an OpenStack Compute Node,
· it sends periodic Heartbeat / Health-check Challenge Requests to a VM
over a virtio-serial device set up between the Compute Node and the VM thru QEMU,
( https://wiki.libvirt.org/page/Qemu_guest_agent )
· on loss of heartbeat or a failed health-check status, a fault event against the VM is
reported to Masakari and any other registered reporting backends, such as Mistral or Vitrage.
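As a rough illustration of the opt-in step, enabling could look like the following; note the extra-spec key name "hw:guest_heartbeat" is purely an assumption for this sketch, not an existing Nova property:

```shell
# Hypothetical flavor extra-spec opting VMs of this flavor into
# heartbeat monitoring (the key name is illustrative, not defined by Nova)
openstack flavor set m1.small --property hw:guest_heartbeat=true

# Any instance booted from that flavor would then be picked up
# by the monitoring service on its compute node
openstack server create --flavor m1.small --image cirros monitored-vm
```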
I realize this is somewhat in the gray-zone of what a cloud should be monitoring or not,
but I believe it provides an alternative for Applications deployed in VMs that do not have an external monitoring/management entity like a VNF Manager in the MANO architecture.
And even for VMs with VNF Managers, it provides a highly reliable alternate monitoring path that does not rely on Tenant Networking.
VM HB/HC Monitoring would leverage https://wiki.libvirt.org/page/Qemu_guest_agent
which requires the QEMU Guest Agent to be installed in the guest image so that the VM can talk back to the compute host.
( there are other examples of similar approaches in openstack ... the murano-agent for installation, the swift-agent for object store management )
Here, however, in the case of VM HB/HC Monitoring via the QEMU Guest Agent, the messaging path is internal, thru a QEMU virtual serial device: a very simple interface with very few dependencies that is up and available very early in the VM lifecycle and is virtually always up.
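The exchange with the agent over that serial device is QMP-style, newline-delimited JSON. Below is a minimal host-side sketch of issuing a guest-ping and judging the reply; the response check is factored into a plain function so it can be exercised without a hypervisor, and the libvirt calls shown in the comment are only an indicative path, not tested code:

```python
import json


def build_ping() -> str:
    # QEMU Guest Agent commands are JSON objects with an "execute" key,
    # in the same style as QMP.
    return json.dumps({"execute": "guest-ping"})


def is_healthy(raw_response: str) -> bool:
    # A live agent answers guest-ping with {"return": {}}; anything else,
    # malformed JSON, or no answer at all counts as a failed health check.
    try:
        reply = json.loads(raw_response)
    except (ValueError, TypeError):
        return False
    return isinstance(reply, dict) and "return" in reply


# On a real compute node the round trip would go through libvirt,
# roughly (untested sketch, domain name is illustrative):
#   import libvirt, libvirt_qemu
#   conn = libvirt.open("qemu:///system")
#   dom = conn.lookupByName("monitored-vm")
#   raw = libvirt_qemu.qemuAgentCommand(dom, build_ping(), 5, 0)
#   alive = is_healthy(raw)

print(is_healthy('{"return": {}}'))                          # healthy ACK
print(is_healthy('{"error": {"class": "CommandNotFound"}}')) # unhealthy
```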
Wrt failure modes / use-cases
· a VM’s response to a Heartbeat Challenge Request can be as simple as just ACK-ing,
this alone allows for detection of:
o a failed or hung QEMU/KVM instance, or
o a failed or hung VM’s OS, or
o a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
o a failure of the VM to route basic IO via Linux sockets.
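The host-side detection logic implied by the list above can be sketched as a small per-VM tracker that declares a fault only after several consecutive missed challenges, so a transient guest scheduling delay does not trigger a false positive. The class name and the default threshold here are illustrative assumptions, not part of any existing interface:

```python
class HeartbeatMonitor:
    """Tracks consecutive missed heartbeat challenges for one VM (sketch)."""

    def __init__(self, max_misses: int = 3):
        # Raise a fault only after max_misses consecutive failures,
        # to ride out transient guest-side delays.
        self.max_misses = max_misses
        self.misses = 0

    def record(self, acked: bool) -> bool:
        """Record one challenge result; return True when a fault should be raised."""
        if acked:
            self.misses = 0
            return False
        self.misses += 1
        return self.misses >= self.max_misses


mon = HeartbeatMonitor(max_misses=3)
results = [True, False, False, False]   # VM hangs after the first ACK
faults = [mon.record(r) for r in results]
print(faults)   # fault raised only on the third consecutive miss
```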
· I have had feedback that this is similar to the virtual hardware watchdog of QEMU/KVM (https://libvirt.org/formatdomain.html#elementsWatchdog )
· However, the VM Heartbeat / Health-check Monitoring
o provides a higher-level (i.e. application-level) heartbeating
• i.e. whether the Heartbeat requests are being answered by the Application running within the VM
o provides more than just heartbeating, as the Application can use it to trigger a variety of audits,
o provides a mechanism for the Application within the VM to report a Health Status / Info back to the Host / Cloud,
o provides notification of the Heartbeat / Health-check status to higher-level cloud entities thru Masakari, Mistral and/or Vitrage
• e.g. VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm) - Aodh - ... - VNF-Manager
- (StateChange) - Nova - ... - VNF Manager
NOTE: perhaps the reporting to Vitrage would be a separate blueprint within Masakari.
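To make the reporting idea above concrete, here is a sketch of fanning a fault event out to whichever backends are registered. The backend names come from this thread (Masakari, Mistral, Vitrage), but the registration interface and the event schema are assumptions for illustration only:

```python
from typing import Callable, Dict, List


class FaultReporter:
    """Fans a VM fault event out to registered reporting backends (sketch).

    The register/report API here is hypothetical, not an existing
    Masakari, Mistral, or Vitrage interface.
    """

    def __init__(self):
        self.backends: Dict[str, Callable[[dict], None]] = {}

    def register(self, name: str, handler: Callable[[dict], None]) -> None:
        # Each backend supplies a callable that accepts the fault event.
        self.backends[name] = handler

    def report(self, vm_uuid: str, detail: str) -> List[str]:
        # Build one event and deliver it to every registered backend;
        # return the names notified, for logging.
        event = {"vm_uuid": vm_uuid, "type": "heartbeat-loss", "detail": detail}
        notified = []
        for name, handler in self.backends.items():
            handler(event)
            notified.append(name)
        return notified


reporter = FaultReporter()
received = []
reporter.register("masakari", received.append)
reporter.register("vitrage", received.append)
print(reporter.report("vm-1234", "no ACK for 3 challenges"))
```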