Open Stack

Tue May 30 07:54:26 UTC 2017

Hi Yujun,

You started an interesting discussion. I think that the distinction between an operational error and a programmer error is correct and we should always keep that in mind.

I agree that having an overall design for error handling in Vitrage is a good idea, but I disagree that until then we had better let it crash.

I think that Vitrage is made out of many pieces that don’t necessarily depend on one another. For example, if one datasource fails, everything else can work as usual – so why crash? Similarly, if one template fails to load, all other templates can still be activated.
Another aspect is that the main purpose of Vitrage is to provide insights. In case of a failure in one datasource/template, some of the insights might be missing. But this will not lead to inaccurate behavior or to wrong actions being executed in the system. IMO, we should give the user as much information as possible given that we have only part of the input.
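
To make the isolation idea concrete, here is a minimal Python sketch. It is not Vitrage code: `collect_all`, the datasource names, and the callables are hypothetical stand-ins, and the only assumption is that each datasource can be polled independently.

```python
import logging

LOG = logging.getLogger(__name__)


def collect_all(datasources):
    """Poll every datasource; log a failure and move on to the next one.

    `datasources` maps a name to a zero-argument callable returning a list
    of entities. Both are hypothetical stand-ins, not Vitrage interfaces.
    """
    entities, failed = [], []
    for name, fetch in datasources.items():
        try:
            entities.extend(fetch())
        except Exception:
            # Deliberately broad: this loop is an isolation boundary, so one
            # broken datasource must not stop the others. The traceback is
            # logged so that a programmer error is still visible.
            LOG.exception("datasource %s failed, skipping it", name)
            failed.append(name)
    return entities, failed


def broken_zabbix():
    # Simulates an operational error in a single datasource.
    raise ConnectionError("zabbix unreachable")


entities, failed = collect_all({
    "nova": lambda: ["host-1", "vm-1"],
    "zabbix": broken_zabbix,
    "static": lambda: ["switch-1"],
})
```

Returning the list of failed datasources together with the partial result lets the caller report which insights may be missing instead of silently dropping them.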

Regarding the use cases that you mentioned:

  1.  invalid configuration file
[Ifat] This should depend on the specific configuration. If keystone is misconfigured, nothing will work of course. But if for example Zabbix is misconfigured, Vitrage should work and show the topology and the non-Zabbix alarms.

  2.  failed to communicate with data source
[Ifat] I think that the error should be logged, and all other datasources should work as usual.

  3.  malformed data from data source
[Ifat] I think that the error should be logged, and all other datasources should work as usual. This problem means we must modify the code in the datasource itself, but until then Vitrage should work, right?

  4.  failed to execute an action
[Ifat] Again, that’s a problem that requires code changes; but why fail other actions?

  5.  ...
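
The configuration case above could be sketched roughly as follows, in Python. This is only an illustration under assumptions: the `REQUIRED` set, `FatalConfigError`, `usable_sections`, and the `url` option are all hypothetical names, not Vitrage's real configuration handling.

```python
import logging

LOG = logging.getLogger(__name__)

# Assumption: sections without which nothing can work at all.
REQUIRED = {"keystone"}


class FatalConfigError(Exception):
    """Raised when a section that everything depends on is invalid."""


def usable_sections(config, validators):
    """Return the sections that validated; fail only for required ones."""
    usable = []
    for section, validate in validators.items():
        try:
            validate(config.get(section, {}))
        except ValueError as exc:
            if section in REQUIRED:
                # Nothing works without this section, so stop here.
                raise FatalConfigError("%s: %s" % (section, exc)) from exc
            # Optional section: disable it and keep going.
            LOG.warning("disabling %s: %s", section, exc)
            continue
        usable.append(section)
    return usable


def require_url(options):
    # Hypothetical validator: the section must define a 'url' option.
    if "url" not in options:
        raise ValueError("missing 'url'")


config = {"keystone": {"url": "http://keystone.example/v3"}, "zabbix": {}}
enabled = usable_sections(
    config, {"keystone": require_url, "zabbix": require_url})
```

With this input only the Zabbix section is disabled; the same mistake in the keystone section would raise `FatalConfigError` and stop startup.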

BTW, it might be a good idea to add API/UI for showing the configuration and the status of the datasources. We all know that errors in the log files are often ignored…
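
As a rough illustration of what such a status view could be built on, here is a small Python sketch of an in-memory status record. The class and its method names are hypothetical; no such API exists in Vitrage today as far as this thread says.

```python
import time


class DatasourceStatus:
    """In-memory record of the last outcome per datasource (hypothetical)."""

    def __init__(self):
        self._status = {}

    def record_ok(self, name):
        self._status[name] = {
            "state": "OK", "error": None, "updated": time.time()}

    def record_error(self, name, exc):
        # Keep a short description of the error so that it can be shown to
        # the user, not only written to the log file.
        self._status[name] = {
            "state": "ERROR",
            "error": "%s: %s" % (type(exc).__name__, exc),
            "updated": time.time(),
        }

    def summary(self):
        """Return a name -> state mapping suitable for an API response."""
        return {name: s["state"] for name, s in self._status.items()}


status = DatasourceStatus()
status.record_ok("nova")
status.record_error("zabbix", ConnectionError("timeout"))
```

Here `status.summary()` reports `nova` as `OK` and `zabbix` as `ERROR`, which is the kind of information a UI could show next to the topology.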

Best Regards,
Ifat.

From: "Yujun Zhang (ZTE)" <zhangyujun+zte at gmail.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Monday, 29 May 2017 at 16:13
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [vitrage] error handling

Prompted by a recent code review, I think the error handling rules are worth a thorough discussion.

I once read an article[1] from Joyent, and I was impressed by its distinction between operational errors and programmer errors. The article is written for Node.js, but the principle also applies to other programming languages.

The basic rules recommended by Joyent are:

  1.  Handling operational errors
  2.  (Not) handling programmer errors

There is also a rule in the OpenStack style guidelines[2] that is close to this idea.

[H201] Do not write except:, use except Exception: at the very least. When catching an exception you should be as specific so you don’t mistakenly catch unexpected exceptions.
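
A small Python sketch of how H201 lines up with the operational/programmer distinction. The function `parse_event` is a made-up example, not taken from Vitrage; it catches only the exception that malformed input is expected to raise and lets everything else propagate.

```python
import json


def parse_event(raw):
    """Return the decoded event, or None if the payload is malformed."""
    try:
        return json.loads(raw)
    except ValueError:
        # Operational error: bad input from outside, so handle it here.
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    # Anything else, such as the TypeError raised when raw is None, is a
    # programmer error in the caller and is deliberately not caught.


good = parse_event('{"alarm": "cpu_high"}')
bad = parse_event("{oops")
```

A bare `except:` in the same place would also swallow the `TypeError`, and even `KeyboardInterrupt`, which is exactly what the rule warns against.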

I do think that until we have a well-designed error handling scheme, it is better to let it crash. It is dangerous to hide errors and keep the system running in an undetermined state.

So the question is: what kinds of operational errors are we facing in Vitrage? I can think of things like

  1.  invalid configuration file
  2.  failed to communicate with data source
  3.  malformed data from data source
  4.  failed to execute an action
  5.  ...
Maybe this could be the first step for the error handling design.

[1]: https://www.joyent.com/node-js/production/design/errors
[2]: https://docs.openstack.org/developer/hacking/

--
Yujun Zhang