[openstack-dev] [masakari] [masakari-monitors] : Intrusive Instance Monitoring through QEMU Guest Agent Design Update

Sam P sam47priya at gmail.com
Tue Feb 20 08:33:10 UTC 2018


Hi Louie,
Thank you for the patch, and sorry for the delayed response.

I prefer option 2. From Masakari's point of view, this is an instance event:
even if something goes wrong inside the VM, Masakari can only try to fix it
by restarting, rebuilding, or migrating the VM, which is the same recovery
workflow used for instance failures. Therefore, I prefer option 2 over
option 1.

We are currently discussing how to implement a recovery method customization
feature [0] in Masakari. With this feature, you may be able to call external
workflows for certain failure events. For that to work, different failure
models require distinguishable events, so option 3 would not be appropriate.

[0] https://review.openstack.org/#/c/458023/
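
To illustrate why distinguishable events matter for recovery customization,
here is a minimal sketch. It assumes a hypothetical lookup table keyed by
(event, vir_domain_event). The names RECOVERY_WORKFLOWS, DEFAULT_WORKFLOW,
select_workflow, the QEMU_GUEST_AGENT_ERROR event and the workflow names are
all made up for illustration; they are not taken from Masakari or from the
review in [0].

```python
# Hypothetical table: which recovery workflow to run for which event pair.
# Neither the keys' new event name nor the workflow names come from Masakari.
RECOVERY_WORKFLOWS = {
    ("LIFECYCLE", "STOPPED_FAILED"): "restart_instance",
    # Assumed new event for failures reported from inside the guest.
    ("QEMU_GUEST_AGENT_ERROR", "STOPPED_FAILED"): "external_guest_recovery",
}

# Fallback used when no customized workflow is registered for an event pair.
DEFAULT_WORKFLOW = "restart_instance"


def select_workflow(payload):
    """Return the workflow name for a notification payload dict."""
    key = (payload.get("event"), payload.get("vir_domain_event"))
    return RECOVERY_WORKFLOWS.get(key, DEFAULT_WORKFLOW)
```

If the monitor reused LIFECYCLE/STOPPED_FAILED for in-guest failures
(option 3), both cases would share one key and could not be routed to
different workflows.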


> 1. define a new type of event for Intrusive Instance monitoring or
> 2. add a new event within the INSTANCE_EVENTS as we may eventually
>    integrate with instance monitoring or
> 3. simply reuse the LIFECYCLE/STOPPED_FAILED event (which is what we are
>    implementing for now.)
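
As a rough sketch of what option 2 could look like on the monitor side: the
event table below copies the two existing entries quoted from
masakari/engine/instance_events.py and adds one entry. The event name
QEMU_GUEST_AGENT_ERROR, the helpers is_known_instance_event and
build_notification, and the literal type value 'VM' (standing in for
ec.EventConstants.TYPE_VM) are assumptions for illustration. The standard
library datetime is used in place of timeutils.utcnow() so the snippet runs
on its own.

```python
import socket
from datetime import datetime, timezone

INSTANCE_EVENTS = {
    'LIFECYCLE': ['STOPPED_FAILED'],
    'IO_ERROR': ['IO_ERROR_REPORT'],
    # Hypothetical new entry for failures detected via the QEMU guest agent.
    'QEMU_GUEST_AGENT_ERROR': ['STOPPED_FAILED'],
}


def is_known_instance_event(event, vir_domain_event):
    """Check an (event, vir_domain_event) pair against INSTANCE_EVENTS."""
    return vir_domain_event in INSTANCE_EVENTS.get(event, [])


def build_notification(instance_uuid, event, vir_domain_event,
                       hostname=None, generated_time=None):
    """Build a notification dict shaped like the one in the quoted patch."""
    if not is_known_instance_event(event, vir_domain_event):
        raise ValueError('unknown instance event: %s/%s'
                         % (event, vir_domain_event))
    return {
        'notification': {
            'type': 'VM',  # assumed value of ec.EventConstants.TYPE_VM
            'hostname': hostname or socket.gethostname(),
            'generated_time': generated_time or datetime.now(timezone.utc),
            'payload': {
                'event': event,
                'instance_uuid': instance_uuid,
                'vir_domain_event': vir_domain_event,
            },
        }
    }
```

With this shape, the monitor would only need to pass a different event name
when the failure originates inside the guest; the rest of the notification
stays the same as for LIFECYCLE/STOPPED_FAILED.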

--- Regards,
Sampath


On Fri, Feb 16, 2018 at 12:05 AM, Kwan, Louie <Louie.Kwan at windriver.com>
wrote:

> We submitted the first implementation patch for the following blueprint:
>
> https://blueprints.launchpad.net/openstack/?searchtext=intrusive-instance-monitoring
>
> i.e. https://review.openstack.org/#/c/534958/
>
> The second patch will be pushed within a week or so.
>
> One item on which we would like clarification from the community is how
> we should integrate the notification within the masakari engine.
>
> One option is to reuse what has been defined in
> masakari/engine/instance_events.py.
>
> e.g.
>
>     def masakari_notifier(self, domain_uuid):
>         if self.getJournalObject(domain_uuid).getSentNotification():
>             LOG.debug('notifier.send_notification Skipped:' + domain_uuid)
>         else:
>             hostname = socket.gethostname()
>             noticeType = ec.EventConstants.TYPE_VM
>             current_time = timeutils.utcnow()
>             event = {
>                 'notification': {
>                     'type': noticeType,
>                     'hostname': hostname,
>                     'generated_time': current_time,
>                     'payload': {
>                         'event': 'LIFECYCLE',
>                         'instance_uuid': domain_uuid,
>                         'vir_domain_event': 'STOPPED_FAILED'
>                     }
>                 }
>             }
>             LOG.debug(str(event))
>             self.notifier.send_notification(CONF.callback.retry_max,
>                                             CONF.callback.retry_interval,
>                                             event)
>             self.getJournalObject(domain_uuid).setSentNotification(True)
>
> Should we
>
> 1. define a new type of event for Intrusive Instance monitoring, or
> 2. add a new event within the INSTANCE_EVENTS, as we may eventually
>    integrate with instance monitoring, or
> 3. simply reuse the LIFECYCLE/STOPPED_FAILED event (which is what we are
>    implementing for now)?
>
> One of our reference test cases is to detect an application meltdown within
> the VM, a failure QEMU may not be aware of. The recovery should be pretty
> much the same as for the LIFECYCLE/STOPPED_FAILED event. What do you think?
>
>
>
> Thanks.
>
> Louie
>
>
>
> Note:
>
>
>
> Here is what we got from masakari/engine/instance_events.py
>
>
>
> """
> These are the events which needs to be processed by masakari in case of
> instance recovery failure.
> """
>
> INSTANCE_EVENTS = {
>     # Add more events and vir_domain_events here.
>     'LIFECYCLE': ['STOPPED_FAILED'],
>     'IO_ERROR': ['IO_ERROR_REPORT']
> }
>
>
>