[openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring
Afek, Ifat (Nokia - IL/Kfar Sava)
ifat.afek at nokia.com
Thu May 11 03:06:54 UTC 2017
Hi Greg,
If I understand correctly, you would like to add a test that checks whether a heartbeat was received from every VM in the last x seconds. Right?
Vitrage is not designed to perform such tests. Vitrage datasources retrieve topology (either by polling or by notifications) from services like Nova, Cinder, Neutron or Heat, and pass the topology to the Vitrage entity graph. In addition, they retrieve alarms from monitors like Aodh, Zabbix, Nagios or Collectd, and create these alarms in the entity graph as well. There is currently no place in Vitrage where you can check whether an expected event arrived.
How about adding this test to a monitoring tool like Zabbix, and then consuming the alarm (for a missing heartbeat) in Vitrage?
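As a rough illustration of the kind of test being discussed, the check itself could be a few lines of logic in whatever monitor runs it. The sketch below is a minimal Python example; the function name find_stale_vms, the last_seen mapping and the timeout value are all hypothetical and are not part of Vitrage or Zabbix.

```python
import time


def find_stale_vms(last_seen, timeout_s, now=None):
    """Return the sorted IDs of VMs whose last heartbeat is older than timeout_s.

    last_seen maps a VM ID to the UNIX timestamp of its last heartbeat
    (hypothetical data structure, for illustration only).
    """
    if now is None:
        now = time.time()
    return sorted(vm for vm, ts in last_seen.items() if now - ts > timeout_s)


if __name__ == "__main__":
    heartbeats = {"vm-a": 100.0, "vm-b": 128.0}
    # With now=130 and a 10 second timeout, only vm-a is stale.
    print(find_stale_vms(heartbeats, timeout_s=10, now=130.0))
```

Each VM returned by such a check would then be raised as an alarm by the monitor, and Vitrage would consume that alarm like any other.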
Best Regards,
Ifat.
From: "Waines, Greg" <Greg.Waines at windriver.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Wednesday, 10 May 2017 at 13:24
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring
Some other UPDATES on this proposal (from outside the mailing list):
· this should probably be based on an ‘image property’ rather than a ‘flavor extraspec’,
since it requires code to be included in the guest/VM image,
· rather than use a unique virtio-serial link for the Heartbeat/Health-check Monitoring Messaging,
propose that we leverage the existing http://wiki.qemu.org/Features/GuestAgent
o NOVA already supports a ‘hw_qemu_guest_agent=True’ image property
which results in NOVA setting up a virtio-serial connection to a QEMU Guest Agent
within the Guest/VM,
o use this for the transport messaging layer for VM Heartbeating/Health-checking
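To make the transport idea concrete: the QEMU Guest Agent exchanges JSON messages over the virtio-serial channel, and its guest-ping command answers with an empty "return" object. A minimal Python sketch of building that request and checking a reply follows. The helper names are hypothetical, and how the message is actually delivered to the guest (for example through libvirt) is deliberately left out.

```python
import json


def build_guest_ping():
    """Serialize a QEMU Guest Agent guest-ping request as one JSON line."""
    return json.dumps({"execute": "guest-ping"}) + "\n"


def is_ping_ok(raw_reply):
    """Return True if raw_reply is a successful guest-ping response.

    A successful reply carries a "return" key; an error reply carries
    an "error" key instead. Unparseable input counts as a failure.
    """
    try:
        reply = json.loads(raw_reply)
    except ValueError:
        return False
    return isinstance(reply, dict) and "return" in reply and "error" not in reply
```

A heartbeat built this way only proves that the agent process inside the guest is scheduled and can answer; application-level health checks would need something beyond guest-ping.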
With respect to ... where to propose / contribute this functionality,
Given that
· this may require very little work in NOVA (by using QEMU Guest Agent), and
· the fact that the primary result of VM Heartbeating / Health-checking is to report per-instance HB/HC status to Vitrage,
I am thinking that this would fit better simply in Vitrage.
It would be optional functionality, enabled thru /etc/vitrage/vitrage.conf.
Comments ?
Greg.
From: Greg Waines <Greg.Waines at windriver.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Tuesday, May 9, 2017 at 1:11 PM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [vitrage] [nova] VM Heartbeat / Healthcheck Monitoring
I am looking for guidance on where to propose some “VM Heartbeat / Health-check Monitoring” functionality that I would like to contribute to openstack.
Briefly, “VM Heartbeat / Health-check Monitoring”
· is optionally enabled thru a Nova flavor extra-spec,
· is a service that runs on an OpenStack Compute Node,
· sends periodic Heartbeat / Health-check Challenge Requests to a VM
over a virtio-serial-device setup between the Compute Node and the VM thru QEMU,
· on loss of heartbeat or a failed health-check status, reports a fault event against the VM
to Vitrage thru its data-source API.
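The behaviour in the list above can be sketched as a small per-VM state tracker. This is only an illustration in Python: the class name, the max_missed threshold and the layout of the fault event are assumptions, and the event dict is not the Vitrage data-source schema.

```python
class HeartbeatMonitor:
    """Track consecutive missed heartbeats for one VM (illustrative only)."""

    def __init__(self, vm_id, max_missed=3):
        self.vm_id = vm_id
        self.max_missed = max_missed
        self.missed = 0
        self.faulted = False

    def record(self, acked):
        """Record one challenge result; return a fault event dict or None.

        A fault is returned once, when the miss count first reaches
        max_missed. An acknowledged heartbeat resets the count and
        clears the fault state.
        """
        if acked:
            self.missed = 0
            self.faulted = False
            return None
        self.missed += 1
        if self.missed >= self.max_missed and not self.faulted:
            self.faulted = True
            # Hypothetical event layout, not the Vitrage datasource schema.
            return {"vm_id": self.vm_id, "type": "heartbeat_lost",
                    "missed": self.missed}
        return None
```

Requiring several consecutive misses before raising the fault is a design choice made here to avoid reporting on a single delayed reply; the proposal itself does not specify a threshold.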
Where should I contribute this functionality ?
· put it ALL in Vitrage ... both the monitoring and the data-source reporting ?
· put the monitoring in Nova, and just the data source reporting in Vitrage ?
· other ?
Greg.
p.s. other info ...
Benefits of “VM Heartbeat / Health-check Monitoring”
· monitors health of OS and Applications INSIDE the VM
o i.e. even a simple Ack of the Heartbeat would validate that the OS is running, that IO mechanisms (sockets, etc.)
are working, and that processes are getting scheduled
· health-check status reporting can trigger and report on either high-level or detailed application-specific audits within the VM,
· the simple virtio-serial-device interface thru QEMU is UP very early in VM life cycle and is virtually always up
o i.e. it's available for reporting issues virtually all the time,
o ... compared to reporting issues over Tenant Network to a remote VNFManager which relies on Ethernet and IP Networking within the VM itself and then any provider network and adjacent routers around the compute nodes ...
· uses a simple “Line-Delimited JSON” Format over virtio serial device ( http://www.linux-kvm.org/page/Virtio-serial_API )
o simple to implement protocol inside VM, in pretty much any language
o ( although would provide reference implementation )
· provides more thorough instance monitoring than libvirt’s emulated hardware watchdog ( https://libvirt.org/formatdomain.html#elementsWatchdog )
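As an illustration of the "Line-Delimited JSON" format mentioned in the list above, here is a minimal Python sketch of framing and parsing such messages. The function names and the message fields ("type", "seq") are hypothetical; the actual message schema is not defined in this thread.

```python
import json


def encode_message(msg):
    """Encode one message as a single line of JSON terminated by a newline."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


def decode_stream(buffer):
    """Split a byte buffer into complete messages plus any trailing partial line.

    Returns (messages, remainder); remainder should be prepended to the
    bytes from the next read of the serial device.
    """
    *lines, remainder = buffer.split(b"\n")
    messages = [json.loads(line) for line in lines if line.strip()]
    return messages, remainder
```

Because json.dumps escapes any newline inside a string value, each encoded message occupies exactly one line, which is what lets the reader split on newlines without a length prefix.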