[openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring

Waines, Greg Greg.Waines at windriver.com
Mon May 15 19:13:20 UTC 2017


Sorry for the slow response.

Ifat,
You do understand correctly.
And I understand that this does not really fit in Vitrage, i.e. Vitrage has no other examples of the monitoring itself being done in Vitrage.

Do you know if Zabbix has VM-related monitoring?
If it doesn’t already do this, then I might have difficulty getting it into Zabbix.

The other option I was thinking of was to see if I could contribute this to QEMU as an optional layer on top of the QEMU Guest Agent, and then have the alarm consumed by Vitrage.

My only other option would be to contribute to the OPNFV Availability project, as incremental VM Heartbeating / Health-checking functionality that would build on top of the OpenStack offering. However, I am not sure whether the OPNFV Availability project is interested in doing code; I think they might be just a requirements team.

Greg.



From: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Wednesday, May 10, 2017 at 11:06 PM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring

Hi Greg,

If I understand correctly, you would like to add a test that checks whether a heartbeat was received from every VM in the last x seconds. Right?

Vitrage is not designed to perform such tests. Vitrage datasources retrieve topology (either by polling or by notifications) from services like Nova, Cinder, Neutron or Heat, and pass the topology to the Vitrage entity graph. In addition, they retrieve alarms from monitors like Aodh, Zabbix, Nagios or Collectd, and create these alarms in the entity graph as well. There is currently no place where you can check if an event arrived or not.

How about adding this test to a monitoring tool like Zabbix, and then consume the alarm (for a missing heartbeat) in Vitrage?

Best Regards,
Ifat.

From: "Waines, Greg" <Greg.Waines at windriver.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Wednesday, 10 May 2017 at 13:24
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring

Some other UPDATES on this proposal (from outside the mailing list):


·         this should probably be based on an ‘image property’ rather than a ‘flavor extraspec’,
since it requires code to be included in the guest/VM image,

·         rather than use a unique virtio-serial link for the Heartbeat/Health-check Monitoring Messaging,
propose that we leverage the existing QEMU Guest Agent ( http://wiki.qemu.org/Features/GuestAgent )

o   NOVA already supports a ‘hw_qemu_guest_agent=True’ image property,
which results in NOVA setting up a virtio-serial connection to a QEMU Guest Agent
within the Guest/VM,

o   use this as the transport messaging layer for VM Heartbeating/Health-checking

With respect to where to propose / contribute this functionality: given that

·         this may require very little work in NOVA (by using the QEMU Guest Agent), and

·         the primary result of VM Heartbeating / Health-checking is to report per-instance HB/HC status to Vitrage,

I am thinking that this would fit better simply in Vitrage,
as optional functionality enabled thru /etc/vitrage/vitrage.conf .


Comments ?
Greg.


From: Greg Waines <Greg.Waines at windriver.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Tuesday, May 9, 2017 at 1:11 PM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring

I am looking for guidance on where to propose some “VM Heartbeat / Health-check Monitoring” functionality that I would like to contribute to OpenStack.

Briefly, “VM Heartbeat / Health-check Monitoring”

·         is optionally enabled thru a Nova flavor extra-spec,

·         is a service that runs on an OpenStack Compute Node,

·         sends periodic Heartbeat / Health-check Challenge Requests to a VM
over a virtio-serial device set up between the Compute Node and the VM thru QEMU,

·         on loss of heartbeat or a failed health-check status, reports a fault event against the VM
to Vitrage thru its data-source API.
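
The loss-of-heartbeat detection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `HeartbeatTracker`, its method names and the timeout value are hypothetical, and the actual reporting to Vitrage is left out.

```python
import time
from typing import Callable, Dict, List


class HeartbeatTracker:
    """Tracks the last heartbeat acknowledgement per VM (hypothetical helper)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_ack: Dict[str, float] = {}

    def register(self, vm_id: str) -> None:
        # Start the timer when monitoring begins, so that a VM which
        # never answers at all is still detected.
        self._last_ack[vm_id] = self._clock()

    def record_ack(self, vm_id: str) -> None:
        # Called whenever a heartbeat response arrives from the VM.
        self._last_ack[vm_id] = self._clock()

    def missing(self) -> List[str]:
        # VMs whose last acknowledgement is older than the timeout;
        # each of these would result in a fault event against the VM.
        now = self._clock()
        return sorted(vm for vm, last in self._last_ack.items()
                      if now - last > self.timeout_s)
```

The compute-node service would call `record_ack()` on each response and poll `missing()` periodically to decide which VMs to raise a fault event for.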

Where should I contribute this functionality ?

·         put it ALL in Vitrage ... both the monitoring and the data-source reporting ?

·         put the monitoring in Nova, and just the data source reporting in Vitrage ?

·         other ?

Greg.





p.s. other info ...

Benefits of “VM Heartbeat / Health-check Monitoring”

·         monitors health of OS and Applications INSIDE the VM

o   i.e. even just a simple Ack of the Heartbeat would validate that the OS is running, IO mechanisms (sockets, etc.)
are working and processes are getting scheduled

·         health-check status reporting can trigger and report on either high-level or detailed application-specific audits within the VM

·         the simple virtio-serial-device interface thru QEMU is UP very early in the VM life cycle and is virtually always up

o   i.e. it’s available for reporting issues virtually all the time,

o   compared to reporting issues over the Tenant Network to a remote VNFManager, which relies on Ethernet and IP Networking within the VM itself and then on any provider network and adjacent routers around the compute nodes

·         uses a simple “Line-Delimited JSON” format over the virtio-serial device ( http://www.linux-kvm.org/page/Virtio-serial_API )

o   simple to implement the protocol inside the VM, in pretty much any language

o   (although we would provide a reference implementation)

·         provides more thorough instance monitoring than libvirt’s emulated hardware watchdog ( https://libvirt.org/formatdomain.html#elementsWatchdog )
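
As a rough illustration of how little code the “Line-Delimited JSON” framing needs, here is a minimal sketch. The message fields (`type`, `seq`, `status`) are hypothetical and are not a defined message format; reading from and writing to the actual virtio-serial device is not shown.

```python
import json
from typing import List


def encode_message(msg: dict) -> bytes:
    """Serialize one message as a single JSON line.

    json.dumps escapes newlines inside strings, so the only raw newline
    in the output is the delimiter appended here.
    """
    return json.dumps(msg, separators=(",", ":")).encode("utf-8") + b"\n"


class LineDecoder:
    """Reassembles complete JSON lines from arbitrary chunks of bytes."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, data: bytes) -> List[dict]:
        self._buf += data
        # Everything before the last newline is complete; the remainder
        # stays buffered until more data arrives.
        *lines, self._buf = self._buf.split(b"\n")
        return [json.loads(line) for line in lines if line.strip()]
```

Either side would call `feed()` with whatever bytes it reads from the serial device and handle each decoded message as it completes.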