[openstack-dev] [vitrage] [nova] [HA] [masakari] VM Heartbeat / Healthcheck Monitoring

Waines, Greg Greg.Waines at windriver.com
Thu May 18 18:03:00 UTC 2017


Yes, I am fine with writing a spec for this in masakari-specs.

Do you use Gerrit for this git repository?
Do you have a template for your specs?

Greg.



From: Sam P <sam47priya at gmail.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Thursday, May 18, 2017 at 1:51 PM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] [nova] [HA] [masakari] VM Heartbeat / Healthcheck Monitoring

Hi Greg,
Thank you, Adam, for the follow-up.
This is a new feature for masakari-monitors, and I think Masakari can
accommodate it there.
From the implementation perspective, it is not that hard to do.
However, as you can see in our Boston presentation, Masakari will
replace its monitoring parts (that is, masakari-monitors) with
nova-host-alerter, **-process-alerter, and **-instance-alerter. (The **
part is not defined yet..:p)
Therefore, I would like to record this specification and make sure we
do not miss anything in the transformation.
Does it make sense to write a simple spec for this in masakari-specs [1]?
Then we can discuss the requirements and how to implement it.

[1] https://github.com/openstack/masakari-specs

--- Regards,
Sampath



On Thu, May 18, 2017 at 2:29 AM, Adam Spiers <aspiers at suse.com> wrote:
I don't see any reason why masakari couldn't handle that, but you'd
have to ask Sampath and the masakari team whether they would consider
that in scope for their roadmap.

Waines, Greg <Greg.Waines at windriver.com> wrote:

Sure.  I can propose a new user story.

And then are you thinking of including this user story in the scope of
what masakari would be looking at ?

Greg.


From: Adam Spiers <aspiers at suse.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Wednesday, May 17, 2017 at 10:08 AM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring

Thanks for the clarification Greg.  This sounds like it has the
potential to be a very useful capability.  May I suggest that you
propose a new user story for it, along similar lines to this existing
one?


http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html

Waines, Greg <Greg.Waines at windriver.com> wrote:
Yes that’s correct.
VM Heartbeating / Health-check Monitoring would introduce intrusive /
white-box type monitoring of VMs / Instances.

I realize this is somewhat in the gray-zone of what a cloud should be
monitoring or not,
but I believe it provides an alternative for Applications deployed in VMs
that do not have an external monitoring/management entity like a VNF Manager
in the MANO architecture.
And even for VMs with VNF Managers, it provides a highly reliable
alternate monitoring path that does not rely on Tenant Networking.

You’re correct that VM HB/HC Monitoring would leverage the QEMU Guest Agent
( https://wiki.libvirt.org/page/Qemu_guest_agent ),
which requires the agent to be installed in the images so that it can talk
back to the compute host.
( there are other examples of similar approaches in openstack ... the
murano-agent for installation, the swift-agent for object store management )
Although here, in the case of VM HB/HC Monitoring, via the QEMU Guest
Agent, the messaging path is internal thru a QEMU virtual serial device.
i.e. a very simple interface with very few dependencies ... it’s up and
available very early in VM lifecycle and virtually always up.
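
As a rough illustration of the heartbeat path described above, the sketch below builds the QEMU Guest Agent `guest-ping` request and interprets the reply. The transport is passed in as a callable, since how the host reaches the agent's virtual serial channel (for example `virsh qemu-agent-command`, or `libvirt_qemu.qemuAgentCommand` in libvirt-python) depends on the deployment. The names `heartbeat_once` and `fake_transport` are hypothetical and are not part of Masakari or libvirt.

```python
import json
from typing import Callable, Optional

# A transport takes a JSON request string and returns the agent's JSON reply,
# or None if nothing came back before the caller's timeout.
Transport = Callable[[str], Optional[str]]


def build_ping_request() -> str:
    """Return the QEMU Guest Agent 'guest-ping' request as JSON text."""
    return json.dumps({"execute": "guest-ping"})


def heartbeat_once(send: Transport) -> bool:
    """Send one ping; True only if the agent answered without an error."""
    try:
        raw = send(build_ping_request())
    except Exception:
        # A transport failure (e.g. channel not connected) counts as a miss.
        return False
    if raw is None:
        return False
    try:
        reply = json.loads(raw)
    except ValueError:
        # Garbage on the channel is treated the same as no answer.
        return False
    return isinstance(reply, dict) and "return" in reply and "error" not in reply


def fake_transport(request: str) -> Optional[str]:
    """Stand-in for a healthy guest: acknowledges guest-ping with an empty result."""
    if json.loads(request).get("execute") == "guest-ping":
        return json.dumps({"return": {}})
    return json.dumps({"error": {"class": "CommandNotFound", "desc": "unknown"}})
```

Calling `heartbeat_once(fake_transport)` returns `True`, while a transport that returns `None` (no answer in time) yields `False`. A real monitor would swap `fake_transport` for a function that talks to the guest agent channel of one instance.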

Wrt failure modes / use-cases

  * a VM’s response to a Heartbeat Challenge Request can be as simple as
    just ACK-ing, this alone allows for detection of:
      - a failed or hung QEMU/KVM instance, or
      - a failed or hung VM’s OS, or
      - a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
      - a failure of the VM to route basic IO via linux sockets.

  * I have had feedback that this is similar to the virtual hardware
    watchdog of QEMU/KVM
    ( https://libvirt.org/formatdomain.html#elementsWatchdog )

  * However, the VM Heartbeat / Health-check Monitoring
      - provides a higher-level (i.e. application-level) heartbeating,
        i.e. if the Heartbeat requests are being answered by the Application
        running within the VM,
      - provides more than just heartbeating, as the Application can use it to
        trigger a variety of audits,
      - provides a mechanism for the Application within the VM to report a
        Health Status / Info back to the Host / Cloud,
      - provides notification of the Heartbeat / Health-check status to
        higher-level cloud entities thru Vitrage, e.g.
          VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm) - Aodh - ... - VNF-Manager
                                              - (StateChange) - Nova - ... - VNF Manager
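
To make the use cases above a little more concrete, here is a minimal sketch of how a host-side monitor might turn heartbeat replies into notifications. Everything in it is an assumption for illustration: the class name `HeartbeatMonitor`, the event names, the threshold of three missed heartbeats, and the `notify` callback (which merely stands in for whatever would forward events to Vitrage) do not come from Masakari, Vitrage, or the proposal itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HeartbeatMonitor:
    """Tracks heartbeat replies from one VM (all names here are hypothetical)."""
    vm_id: str
    notify: Callable[[str, str, str], None]  # called as (vm_id, event, detail)
    max_missed: int = 3   # assumed threshold, not taken from the thread
    missed: int = 0
    failed: bool = False

    def record(self, reply: Optional[str]) -> None:
        """Process one heartbeat round.

        reply is None when no answer arrived, "healthy" for a plain ACK,
        and any other string is an application-reported health problem.
        """
        if reply is None:
            self.missed += 1
            # Raise the failure event once, when the threshold is first reached.
            if self.missed >= self.max_missed and not self.failed:
                self.failed = True
                self.notify(self.vm_id, "heartbeat-failed", f"{self.missed} missed")
            return
        # Any reply shows that QEMU, the guest OS and the agent are responding.
        self.missed = 0
        if self.failed:
            self.failed = False
            self.notify(self.vm_id, "heartbeat-recovered", "")
        if reply != "healthy":
            # Application-level health info reported from inside the VM.
            self.notify(self.vm_id, "unhealthy", reply)
```

A caller would construct `HeartbeatMonitor("vm-1", some_callback)` and invoke `record()` once per heartbeat interval. Separating "no reply" from "reply with bad health" mirrors the distinction drawn above between plain ACK-based detection and application-level health reporting.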


Greg.


From: Adam Spiers <aspiers at suse.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Tuesday, May 16, 2017 at 7:29 PM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring

Waines, Greg <Greg.Waines at windriver.com> wrote:
thanks for the pointers Sam.

I took a quick look.
I agree that the VM Heartbeat / Health-check looks like a good fit into
Masakari.

Currently your instance monitoring looks like it is strictly black-box
type monitoring thru libvirt events.
Is that correct ?
i.e. you do not do any intrusive type monitoring of the instance thru the
QEMU Guest Agent facility, correct ?

That is correct:


https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py

I think this is what VM Heartbeat / Health-check would add to Masakari.
Let me know if you agree.

OK, so you are looking for something slightly different I guess, based
on this QEMU guest agent?

    https://wiki.libvirt.org/page/Qemu_guest_agent

That would require the agent to be installed in the images, which is
extra work but I imagine quite easily justifiable in some scenarios.
What failure modes do you have in mind for covering with this
approach - things like the guest kernel freezing, for instance?


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

