[openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

Rochelle.Grober Rochelle.Grober at huawei.com
Wed Oct 23 21:08:51 UTC 2013



John Griffith wrote:
On Wed, Oct 23, 2013 at 8:47 AM, Sean Dague <sean at dague.net> wrote:
On 10/23/2013 10:40 AM, John Griffith wrote:



On Sun, Oct 20, 2013 at 7:38 AM, Sean Dague <sean at dague.net> wrote:

    Dave Kranz has been building a system so that we can ensure that
    during a Tempest run services don't spew ERRORs in the logs.
    Eventually, we're going to gate on this, because there is nothing
    that Tempest does to the system that should cause any OpenStack
    service to ERROR or stack trace (Errors should actually be
    exceptional events indicating that something is wrong with the
    system, not regular events).


So I have to disagree with the approach being taken here, particularly
in the case of Cinder and the negative tests that are in place.  When I
read this last week I assumed you actually meant that "Exceptions" were
exceptional and nothing in Tempest should cause Exceptions.  It turns
out you apparently did mean Errors.  I completely disagree here: Errors
happen, some are recovered, some are expected by the tests, etc.  Having
a policy, and especially a gate, that says NO ERROR MESSAGE in logs makes
absolutely no sense to me.

Something like NO TRACE/EXCEPTION MESSAGE in logs I can agree with, but
this makes no sense to me.  By the way, here's a perfect example:
https://bugs.launchpad.net/cinder/+bug/1243485

As long as we have Tempest tests that do things like "show non-existent
volume" you're going to get an Error message, and quite frankly I think
you should.

Ok, I guess that's where we probably need to clarify what "Not Found" is. To me, "Not Found" seems like it should be a request logged at INFO level, not ERROR.


ERROR from an admin perspective should really be something that would be suitable for sending an alert to an administrator so they can come and fix the cloud.

From my perspective as someone who has done Ops in the past, a "Volume Not Found" can be either info or an error.  It all depends on the context.  That said, we need to be able to test ERROR conditions and ensure that they report properly as ERROR, else the poor Ops folks will always be on the spot for not knowing that there is a problem.  A volume that has gone missing is a problem.  Ops would like an immediate report.  They would trigger on the ERROR statement in the log.  On the other hand, if someone or something fat-fingers an input and requests something that has never existed, then that's just info.
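
To make the context distinction concrete, here is a minimal Python sketch of picking the level for a "Volume Not Found".  This is not Cinder code: the names not_found_level, report_not_found, and known_ids are hypothetical, and it assumes the service can tell whether it ever had a record of the volume.

```python
import logging

LOG = logging.getLogger("volume_lookup_sketch")  # hypothetical logger name


def not_found_level(volume_id, known_ids):
    """Pick a log level for a failed volume lookup.

    known_ids is an assumed set of volume IDs the service has a record of.
    """
    # A volume we have a record of but cannot find has gone missing:
    # that is a problem Ops wants to be alerted about.
    if volume_id in known_ids:
        return logging.ERROR
    # A request for something that never existed is just a bad request.
    return logging.INFO


def report_not_found(volume_id, known_ids):
    """Log the failed lookup at the level chosen above and return that level."""
    level = not_found_level(volume_id, known_ids)
    LOG.log(level, "Volume %s not found", volume_id)
    return level
```

With something like this, a negative test that asks for a never-created volume would only produce an INFO line, while a test that deliberately removes a known volume would still be expected to produce an ERROR.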

We need to be able to test for correctness of errors and process logs with errors in them as part of the test verification.  Perhaps a switch in the test that indicates the log needs post-processing, or a way to redirect the log during a specific error test, or some such?  The question is, how do we keep test system logs clean of ERRORs and still test system logs for intentionally triggered ERRORs?
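
One possible shape for the post-processing idea, as a rough Python sketch.  This is not the checker Dave Kranz is building; the function names and the idea of each test declaring regex patterns for the ERRORs it intends to trigger are assumptions for illustration only.

```python
import re

# Matches the ERROR level token anywhere in a log line.
ERROR_LINE = re.compile(r"\bERROR\b")


def unexpected_errors(log_lines, expected_patterns=()):
    """Return ERROR lines that no declared pattern accounts for."""
    expected = [re.compile(p) for p in expected_patterns]
    unexpected = []
    for line in log_lines:
        if not ERROR_LINE.search(line):
            continue  # not an ERROR line, nothing to check
        if any(p.search(line) for p in expected):
            continue  # the test intentionally triggered this one
        unexpected.append(line)
    return unexpected


def missing_expected_errors(log_lines, expected_patterns):
    """Return declared patterns that never showed up on an ERROR line."""
    error_lines = [line for line in log_lines if ERROR_LINE.search(line)]
    return [
        pattern
        for pattern in expected_patterns
        if not any(re.search(pattern, line) for line in error_lines)
    ]
```

A run would pass only if both functions return empty lists: no stray ERRORs, and every intentionally triggered ERROR was actually reported as one.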

--Rocky


TRACE is actually a lower level of severity in our log systems than ERROR is.

Sorry, by Trace I was referring to unhandled stack/exception trace messages in the logs.


        -Sean

--
Sean Dague
http://dague.net

