[openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring

Waines, Greg Greg.Waines at windriver.com
Wed May 10 16:41:49 UTC 2017


>
> How is this different from the watchdog action flavor extra spec / image
> property which already exists?
>

Yes, I understand it is similar to the virtual hardware watchdog of QEMU/KVM.
( I referred to this at the end of the original mail, with ref: https://libvirt.org/formatdomain.html#elementsWatchdog )

However, the VM Heartbeat / Health-check Monitoring

·         provides higher-level (i.e. application-level) heartbeating,

o    i.e. it verifies that the Heartbeat requests are being answered by the Application running within the VM,

·         provides more than just heartbeating, as the Application can use it to trigger a variety of audits,

·         provides a mechanism for the Application within the VM to report a Health Status / Info back to the Host / Cloud,

·         provides notification of the Heartbeat / Health-check status to higher-level cloud entities thru Vitrage

o    e.g.   VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm)  - Aodh - ... - VNF-Manager
                                               - (StateChange) - Nova - ... - VNF-Manager



Greg.


From: Matt Riedemann <mriedemos at gmail.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Wednesday, May 10, 2017 at 9:49 AM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring

On 5/9/2017 1:11 PM, Waines, Greg wrote:
I am looking for guidance on where to propose some “_VM Heartbeat /
Health-check Monitoring_” functionality that I would like to contribute
to OpenStack.



Briefly, “_VM Heartbeat / Health-check Monitoring_”

·         is optionally enabled thru a Nova flavor extra-spec,

·         is a service that runs on an OpenStack Compute Node,

·         it sends periodic Heartbeat / Health-check Challenge Requests to a VM
over a virtio-serial-device set up between the Compute Node and the VM thru QEMU,

·         a loss of heartbeat or a failed health-check status results in a
fault event against the VM being reported to Vitrage thru its data-source API.
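
As a rough illustration of the monitoring step described above, here is a minimal Python sketch of one challenge / response cycle on the Compute Node side. Everything in it is an assumption made for illustration: the message fields ("type", "seq", "status"), the function names, and the report_fault callback, which merely stands in for whatever would report the fault to Vitrage. It also assumes the host end of the virtio-serial channel is available as a connected stream socket.

```python
import json
import socket


def check_heartbeat(chan, seq, timeout=1.0):
    """Send one challenge over `chan` and return (ok, detail).

    `chan` is a connected stream socket standing in for the host end of
    the virtio-serial channel. The message fields are hypothetical.
    """
    request = {"type": "challenge", "seq": seq}
    chan.sendall((json.dumps(request) + "\n").encode("utf-8"))
    chan.settimeout(timeout)
    buf = b""
    try:
        # Read until one complete newline-terminated reply has arrived.
        while b"\n" not in buf:
            chunk = chan.recv(4096)
            if not chunk:
                return False, "channel closed"
            buf += chunk
    except socket.timeout:
        return False, "heartbeat lost"
    line = buf.split(b"\n", 1)[0]
    try:
        reply = json.loads(line)
    except ValueError:
        return False, "malformed reply"
    if not isinstance(reply, dict) or reply.get("seq") != seq:
        return False, "unexpected reply"
    if reply.get("status") != "healthy":
        return False, "health check failed: %s" % reply.get("status")
    return True, "ok"


def monitor_once(chan, seq, report_fault, timeout=1.0):
    """Run one cycle; call report_fault(detail) on a lost or failed check."""
    ok, detail = check_heartbeat(chan, seq, timeout)
    if not ok:
        # Hypothetical hook: a real service would raise a fault event
        # against the VM here.
        report_fault(detail)
    return ok
```

A real service would call something like monitor_once() periodically per VM and would probably tolerate a few missed replies before declaring a fault; the sketch reports on the first miss to keep the logic visible.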



Where should I contribute this functionality ?

·         put it ALL in Vitrage ... both the monitoring and the
data-source reporting ?

·         put the monitoring in Nova, and just the data source reporting
in Vitrage ?

·         other ?



Greg.

p.s. other info ...



Benefits of “VM Heartbeat / Health-check Monitoring”

·         monitors health of OS and Applications INSIDE the VM

o    i.e. even just a simple Ack of the Heartbeat would validate that
the OS is running, that IO mechanisms (sockets, etc.) are working,
and that processes are getting scheduled

·         health-check status reporting can trigger and report on either
high-level or detailed application-specific audits within the VM,

·         the simple virtio-serial-device interface thru QEMU is UP very
early in the VM life cycle and is virtually _always up_

o    i.e. it's available for reporting issues virtually all the time,

o    ... compared to reporting issues over the Tenant Network to a
remote VNF-Manager, which relies on Ethernet and IP Networking within the
VM itself and then on any provider network and adjacent routers around the
compute nodes ...

·         uses a simple “Line-Delimited JSON” Format over the virtio-serial
device ( http://www.linux-kvm.org/page/Virtio-serial_API )

o    the protocol is simple to implement inside the VM, in pretty much any language

o    ( although I would provide a reference implementation )

·         provides more thorough instance monitoring than libvirt’s
emulated hardware watchdog (
https://libvirt.org/formatdomain.html#elementsWatchdog )
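
To give a feel for how little is needed inside the VM for the “Line-Delimited JSON” format mentioned above, here is a minimal Python sketch of the framing: one JSON object per newline-terminated line. This is not the reference implementation; the class and function names and the message fields ("type", "seq", "status") are hypothetical, and run_audit stands in for whatever application-specific check the VM wants to perform.

```python
import json


class LineJsonDecoder:
    """Accumulate raw bytes and return one decoded object per complete line."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        # Bytes can arrive in arbitrary pieces, so keep any partial line
        # buffered until its terminating newline shows up.
        self._buf += data
        messages = []
        while b"\n" in self._buf:
            line, self._buf = self._buf.split(b"\n", 1)
            if line.strip():
                messages.append(json.loads(line))
        return messages


def encode(message):
    # With default settings json.dumps escapes newlines inside strings,
    # so a raw "\n" only ever appears as the message delimiter.
    return (json.dumps(message) + "\n").encode("utf-8")


def answer(challenge, run_audit):
    """Build a reply to one challenge (hypothetical message shape).

    run_audit is an application-supplied callable returning a status string.
    """
    return {"type": "response", "seq": challenge["seq"], "status": run_audit()}
```

Inside the guest, the bytes passed to feed() would come from reading the virtio-serial port, and the output of encode() would be written back to the same port.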



__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


How is this different from the watchdog action flavor extra spec / image
property which already exists?

--

Thanks,

Matt
