[openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring

Adam Spiers aspiers at suse.com
Tue May 16 12:37:23 UTC 2017


Afek, Ifat (Nokia - IL/Kfar Sava) <ifat.afek at nokia.com> wrote:
>On 16/05/2017, 4:36, "Sam P" <sam47priya at gmail.com> wrote:
>
>    Hi Greg,
>
>     In Masakari [0] for VMHA, we have already implemented a somewhat
>    similar function in masakari-monitors.
>     Masakari-monitors runs on the nova-compute node and monitors host,
>    process, and instance failures.
>     The Masakari instance monitor has functionality similar to what you
>    have described.
>     Please see [1] for more details on instance monitoring.
>     [0] https://wiki.openstack.org/wiki/Masakari
>     [1] https://github.com/openstack/masakari-monitors/tree/master/masakarimonitors/instancemonitor
>
>     Once masakari-monitors detects a failure, it sends a notification to
>    masakari-api, which takes the appropriate recovery action to recover
>    that VM.

You can also find out more about our architectural plans by watching
this talk which Sampath and I gave in Boston:

   https://www.openstack.org/videos/boston-2017/high-availability-for-instances-moving-to-a-converged-upstream-solution

The slides are here:

   https://aspiers.github.io/openstack-summit-2017-boston-compute-ha/

We didn't go into much depth on monitoring and recovery of individual
VMs, but as Sampath explained, Masakari already handles both of these.

>Hi Greg, Sam,
>
>As Vitrage is about correlating alarms that come from different
>sources, and is not a monitor by itself – I think that it can benefit
>from information retrieved by both Masakari and Zabbix monitors.
>
>Zabbix is already integrated into Vitrage. I don’t know if there are
>specific tests for VM heartbeat, but I think it is very likely that
>there are.  Regarding Masakari – looking at your documents, I believe
>that integrating your monitoring information into Vitrage could be
>quite straightforward.

Yes, this makes sense.  Masakari already cleanly decouples
monitoring/alerting from automated recovery, so it could support this
quite nicely.  And the modular converged architecture we explained in
the presentation will maintain that clean separation of
responsibilities whilst integrating Masakari together with other
components such as Pacemaker, Mistral, and maybe Vitrage too.

For example, whilst so far this thread has been about VM instance
monitoring, another area where Vitrage could integrate with Masakari
is compute host monitoring.

If you watch this part of our presentation where we explained the next
generation architecture, you'll see that we propose a new
"nova-host-alerter" component which has a driver-based mechanism for
alerting different services when a compute host experiences a failure:

    https://youtu.be/YPKE1guti8E?t=32m43s

So one obvious possibility would be to add a driver for Vitrage, so
that Vitrage can be alerted when Pacemaker spots a host failure.

Similarly, we could extend Pacemaker configurations to alert Vitrage
when individual processes such as nova-compute or libvirtd fail.
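
To make the driver idea a bit more concrete, here is a rough Python
sketch of how a driver-based alerter could fan out a host failure to
several services.  This is purely illustrative: "nova-host-alerter" is
only a proposal at this point, and every name below (HostFailure,
AlertDriver, RecordingDriver, HostAlerter) is hypothetical rather than
an existing API in Masakari, Pacemaker, or Vitrage.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HostFailure:
    """Hypothetical event describing a failed compute host."""
    hostname: str
    reason: str


class AlertDriver(ABC):
    """Hypothetical driver interface: one subclass per service to alert."""

    @abstractmethod
    def alert(self, event: HostFailure) -> None:
        """Deliver the event to the service this driver represents."""


class RecordingDriver(AlertDriver):
    """Stand-in for a real driver (e.g. one for Vitrage); it only
    records the events it receives instead of calling a remote API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[HostFailure] = []

    def alert(self, event: HostFailure) -> None:
        self.received.append(event)


class HostAlerter:
    """Fans a host failure out to every configured driver."""

    def __init__(self, drivers: Iterable[AlertDriver]) -> None:
        self.drivers = list(drivers)

    def host_failed(self, hostname: str, reason: str) -> List[AlertDriver]:
        """Alert all drivers; return the drivers whose alert raised."""
        event = HostFailure(hostname, reason)
        failed = []
        for driver in self.drivers:
            try:
                driver.alert(event)
            except Exception:
                # One broken driver must not stop the others being alerted.
                failed.append(driver)
        return failed


vitrage = RecordingDriver("vitrage")
masakari = RecordingDriver("masakari")
alerter = HostAlerter([vitrage, masakari])
undelivered = alerter.host_failed("compute-1", "fenced by Pacemaker")
```

The point of the sketch is only the shape: the component that detects
the failure does not need to know which services consume the alert, so
adding Vitrage would mean adding one more driver rather than changing
the detection side.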

If you would like to discuss any of this further or have any more
questions, in addition to this mailing list we are also available to
talk on the #openstack-ha IRC channel!

Cheers,
Adam

P.S. I've added the [HA] badge to this thread since this discussion is
definitely related to high availability.


