[openstack-dev] [masakari] [masakari-monitors] : Intrusive Instance Monitoring through QEMU Guest Agent Design Update

Kwan, Louie Louie.Kwan at windriver.com
Thu Feb 15 15:05:45 UTC 2018


We submitted the first implementation patch for the following blueprint:

https://blueprints.launchpad.net/openstack/?searchtext=intrusive-instance-monitoring

i.e. https://review.openstack.org/#/c/534958/

The second patch will be pushed within a week or so.

One item we would like the community to clarify is how we should integrate the notification within the masakari engine.

One option is to reuse what has been defined in masakari/engine/instance_events.py.

e.g.
    def masakari_notifier(self, domain_uuid):
        if self.getJournalObject(domain_uuid).getSentNotification():
            LOG.debug('notifier.send_notification Skipped:' + domain_uuid)
        else:
            hostname = socket.gethostname()
            noticeType = ec.EventConstants.TYPE_VM
            current_time = timeutils.utcnow()
            event = {
                'notification': {
                    'type': noticeType,
                    'hostname': hostname,
                    'generated_time': current_time,
                    'payload': {
                        'event': 'LIFECYCLE',
                        'instance_uuid': domain_uuid,
                        'vir_domain_event': 'STOPPED_FAILED'
                    }
                }
            }
            LOG.debug(str(event))
            self.notifier.send_notification(CONF.callback.retry_max,
                                            CONF.callback.retry_interval,
                                            event)
            self.getJournalObject(domain_uuid).setSentNotification(True)


Should we

1. define a new type of event for intrusive instance monitoring, or

2. add a new event within INSTANCE_EVENTS, as we may eventually integrate with instance monitoring, or

3. simply reuse the LIFECYCLE/STOPPED_FAILED event (which is what we are implementing for now)?

One of our reference test cases is to detect an application meltdown within a VM, a failure that QEMU may not be aware of. The recovery should be pretty much the same as for the LIFECYCLE/STOPPED_FAILED event. What do you think?
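
To make the choice concrete, here is a minimal sketch of the notification body under options 2 and 3. It is only an illustration, not the patch itself: build_notification and the event name QEMU_GUEST_AGENT_ERROR are hypothetical, and TYPE_VM = 'VM' is an assumed stand-in for ec.EventConstants.TYPE_VM.

```python
import datetime
import socket

TYPE_VM = 'VM'  # assumed stand-in for ec.EventConstants.TYPE_VM


def build_notification(domain_uuid, event='LIFECYCLE',
                       vir_domain_event='STOPPED_FAILED',
                       hostname=None, generated_time=None):
    """Return a dict shaped like the event built in masakari_notifier."""
    return {
        'notification': {
            'type': TYPE_VM,
            'hostname': hostname or socket.gethostname(),
            'generated_time': generated_time
            or datetime.datetime.now(datetime.timezone.utc),
            'payload': {
                'event': event,
                'instance_uuid': domain_uuid,
                'vir_domain_event': vir_domain_event,
            },
        }
    }


# Option 3: reuse the existing LIFECYCLE/STOPPED_FAILED event.
reuse = build_notification('example-uuid')

# Option 2: same shape, but with a new (hypothetical) event name.
new_event = build_notification('example-uuid',
                               event='QEMU_GUEST_AGENT_ERROR')
```

With option 3 nothing changes on the engine side; with option 2 the engine would also have to recognise the new event name.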

Thanks.
Louie

Note:

Here is what we got from masakari/engine/instance_events.py:

"""
These are the events which needs to be processed by masakari in case of
instance recovery failure.
"""

INSTANCE_EVENTS = {
    # Add more events and vir_domain_events here.
    'LIFECYCLE': ['STOPPED_FAILED'],
    'IO_ERROR': ['IO_ERROR_REPORT']
}
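
For option 2, the table might be extended roughly as follows. This is a sketch under assumptions: the QEMU_GUEST_AGENT_ERROR key is a hypothetical name, and is_handled_event is a hypothetical helper showing how a payload could be checked against the table, not a function taken from masakari.

```python
INSTANCE_EVENTS = {
    'LIFECYCLE': ['STOPPED_FAILED'],
    'IO_ERROR': ['IO_ERROR_REPORT'],
    # Hypothetical entry for intrusive instance monitoring.
    'QEMU_GUEST_AGENT_ERROR': ['STOPPED_FAILED'],
}


def is_handled_event(payload):
    """Return True if the payload's event pair is listed in INSTANCE_EVENTS."""
    allowed = INSTANCE_EVENTS.get(payload.get('event'), [])
    return payload.get('vir_domain_event') in allowed
```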
