[openstack-dev] [glance][all] Help with interpreting the log level guidelines

Kuvaja, Erno kuvaja at hp.com
Tue Sep 16 16:53:03 UTC 2014


> -----Original Message-----
> From: Sean Dague [mailto:sean at dague.net]
> Sent: 16 September 2014 17:31
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [glance][all] Help with interpreting the log level
> guidelines
> 
> On 09/16/2014 12:07 PM, Kuvaja, Erno wrote:
> >> -----Original Message-----
> >> From: Sean Dague [mailto:sean at dague.net]
> >> Sent: 16 September 2014 15:56
> >> To: openstack-dev at lists.openstack.org
> >> Subject: Re: [openstack-dev] [glance][all] Help with interpreting the
> >> log level guidelines
> >>
> >> On 09/16/2014 10:16 AM, Kuvaja, Erno wrote:
> >>>> -----Original Message-----
> >>>> From: Sean Dague [mailto:sean at dague.net]
> >>>> Sent: 16 September 2014 12:40
> >>>> To: openstack-dev at lists.openstack.org
> >>>> Subject: Re: [openstack-dev] [glance][all] Help with interpreting
> >>>> the log level guidelines
> >>>>
> >>>> On 09/16/2014 06:44 AM, Kuvaja, Erno wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Flavio Percoco [mailto:flavio at redhat.com]
> >>>>>> Sent: 16 September 2014 10:08
> >>>>>> To: openstack-dev at lists.openstack.org
> >>>>>> Subject: Re: [openstack-dev] [glance][all] Help with interpreting
> >>>>>> the log level guidelines
> >>>>>>
> >>>>>> On 09/16/2014 01:10 AM, Clint Byrum wrote:
> >>>>>>> Excerpts from Sean Dague's message of 2014-09-15 16:02:04 -0700:
> >>>>>>>> On 09/15/2014 07:00 PM, Mark Washenberger wrote:
> >>>>>>>>> Hi there logging experts,
> >>>>>>>>>
> >>>>>>>>> We've recently had a little disagreement in the glance team
> >>>>>>>>> about the appropriate log levels for http requests that end up
> >>>>>>>>> failing due to user errors. An example would be a request to
> >>>>>>>>> get an image that does not exist, which results in a 404 Not
> >>>>>>>>> Found response.
> >>>>>>>>>
> >>>>>>>>> On one hand, this event is an error, so DEBUG or INFO seem a
> >>>>>>>>> little too low. On the other hand, this error doesn't
> >>>>>>>>> generally require any kind of operator investigation or
> >>>>>>>>> indicate any actual failure of the service, so perhaps it is
> >>>>>>>>> excessive to log it at WARN or ERROR.
> >>>>>>>>>
> >>>>>>>>> Please provide feedback to help us resolve this dispute if you
> >>>>>>>>> feel you can!
> >>>>>>>>
> >>>>>>>> My feeling is this is an INFO level. There is really nothing
> >>>>>>>> the admin should care about here.
> >>>>>>>
> >>>>>>> Agree with Sean. INFO is useful for investigations. WARN and
> >>>>>>> ERROR are cause for alarm.
> >>>>>>
> >>>>>> +1 this is what we do in Zaqar as well.
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> @flaper87
> >>>>>> Flavio Percoco
> >>>>>>
> >>>>>
> >>>>> I think the debate here is not limited to 404s. By the logging
> >>>>> guidelines, INFO level messages should not contain any error-related
> >>>>> messages but rather things like components starting/stopping, config
> >>>>> info, etc. WARN should not be anything that gets the ops pulled out
> >>>>> of bed, and so on.
> >>>>>
> >>>>> Also, all information that would be of interest to ops should be
> >>>>> logged at INFO or above.
> >>>>>
> >>>>> Now if we log user errors as WARN, the environment stays supportable
> >>>>> even when the log level has been set as high as WARN, which cleans up
> >>>>> the output a lot (as INFO shouldn't contain anything out of order
> >>>>> anyway). The current situation is that logging at DEBUG level is the
> >>>>> only option for getting the information needed to actually run the
> >>>>> services and the data needed to support them. If we log user errors
> >>>>> at INFO we get one step higher, but we still have all that clutter,
> >>>>> like every single request in the logs, and if that's the direction we
> >>>>> want to go, we should revisit our logging guidelines as well.
> >>>>>
> >>>>> Thus my two euro cents go towards WARN rather than DEBUG, and
> >>>>> definitely not INFO.
> >>>>
> >>>> Part of it is how often you expect things to happen as well.
> >>>> Remember glanceclient operates in the context of "other" processes.
> >>>> When it hits a 404 in Glance, it's not running in the glance context,
> >>>> it's running in the Nova context. Which means it needs to think of
> >>>> itself in that context.
> >>>>
> >>>> In that context we got the exception back from Glance, we know the
> >>>> image wasn't there. And we know whether or not that's a problem
> >>>> (glanceclient actually has no idea if it's a problem or not, we might
> >>>> be checking to make sure a thing isn't there, and success for us is
> >>>> the 404).
> >>>>
> >>>> So actually, I'm back to Jay on this, it should be DEBUG. Nova (or
> >>>> whoever the caller is) can decide if the issue warrants something
> >>>> larger than that.
> >>>>
> >>>> This is really the biggest issue with logging in the clients, people
> >>>> don't think about the context that they are running in.
> >>>>
> >>>> 	-Sean
> >>>>
> >>>> --
> >>>> Sean Dague
> >>>> http://dague.net
> >>>>
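As an aside, the pattern Sean is describing would look roughly like the
sketch below. This is a minimal illustration only; the exception class and
helper names are hypothetical, not actual glanceclient or Nova code.

import logging

LOG = logging.getLogger(__name__)


class NotFound(Exception):
    """Stand-in for the 404 exception a client library might raise."""


def client_get_image(image_id):
    # Client side: it cannot know whether the missing image is a problem,
    # so it records the fact at DEBUG and lets the exception carry it up.
    LOG.debug("Image %s returned 404 from the server", image_id)
    raise NotFound(image_id)


def boot_server(image_id):
    # Caller side (think Nova): the image is required here, so a 404 really
    # is an error and gets logged as one.
    try:
        return client_get_image(image_id)
    except NotFound:
        LOG.error("Image %s needed for boot does not exist", image_id)
        raise


def ensure_image_gone(image_id):
    # Caller side: here the 404 is the successful outcome, so nothing gets
    # logged above DEBUG at all.
    try:
        client_get_image(image_id)
        return False
    except NotFound:
        return True

The severity decision lives entirely in the caller; the client library only
records the fact at DEBUG.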
> >>>
> >>> Sean,
> >>>
> >>> I'm not sure if we were specific enough here. We're not talking about
> >>> the client but about server logging. So how should we log events like
> >>> a client trying to change protected properties, access a non-existent
> >>> image, create duplicate image IDs, etc.?
> >>>
> >>> So, for example, if Nova is trying to access an image that does not
> >>> exist, should we ignore it on the Glance side, or whenever the user
> >>> tries to do something that does not succeed? From my point of view it
> >>> makes life much easier if we have information on where the request
> >>> failed rather than just a wsgi return code, or having to run the system
> >>> on DEBUG logging to get that information.
> >>
> >> Glance client throws an ERROR on 404 from Glance server -
> >> http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-
> >> full/90cb640/logs/screen-n-api.txt.gz?level=ERROR
> >>
> >> Glance server does not -
> >> http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-
> >> full/90cb640/logs/screen-g-api.txt.gz?level=ERROR
> >>
> >> Which is why I assumed this was where the conversation started.
> >>
> >> On the server side this should be an INFO with 404 recorded as the
> >> wsgi request.
> >>
> >> On the client this should be debug.
> >>
> >> Just to get a look at how completely confusing the current scenario
> >> is, let's dive through the logs to see what should be changed.
> >>
> >> Inside glance you also need to address client vs. server. For
> >> instance -
> >> http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-
> >> full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-
> >> 15_19_42_25_137
> >> being an INFO is incredibly confusing, because that's a client call.
> >> The next line is the actual wsgi request you care about.
> >>
> >> swiftclient is equally confusing with -
> >> http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-
> >> full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-
> >> 15_19_42_27_206
> >>
> >> It should not be logging an ERROR there.
> >>
> >> Also whatever keeps dumping this:
> >>
> >> 127.0.0.1 - - [15/Sep/2014 19:42:38] code 400, message Bad request
> >> syntax
> >> ('0')
> >> 127.0.0.1 - - [15/Sep/2014 19:42:38] "0" 400 -
> >>
> >> Needs to be addressed.
> >>
> >> urllib3.connectionpool needs to be set to WARN level cutoff, the INFO
> >> messages cloud everything.
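For what it's worth, that cutoff is a one-line change wherever logging gets
configured; a minimal sketch, not the actual devstack/oslo setup:

import logging

# Drop the per-connection INFO chatter; only WARNING and above get through.
logging.getLogger('urllib3.connectionpool').setLevel(logging.WARNING)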
> >>
> >> keystone client shouldn't log at INFO -
> >> http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-
> >> full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-
> >> 15_19_47_27_833
> >>
> >> Knowing on every keystone client request which token version is being
> >> used is not INFO material.
> >>
> >> urllib3.connectionpool logging issue again -
> >> http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-
> >> full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-
> >> 15_19_47_52_000
> >>
> >> And if this WARNING isn't something that's addressable, it should be
> >> killed off -
> >> http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-
> >> full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-
> >> 15_19_50_59_792
> >>
> >> 	-Sean
> >>
> >> --
> >> Sean Dague
> >> http://dague.net
> >>
> >
> > I think the discussion started from
> > https://review.openstack.org/#/c/116626/8/glance/api/v1/images.py
> > L#964 - while trying to update the image we get thrown a 404, and before
> > re-raising it we used to log it at INFO; the proposal was to change that to
> > WARN (as per the guidelines). Now this issue might be user-initiated (a typo
> > in the image ID, etc.) or something else. My understanding is based on the
> > Logging Standards, https://wiki.openstack.org/wiki/LoggingStandards: "Info:
> > Usually indicates successful service start/stop, versions and such non-error
> > related data" (under Debug - Log Level Definitions). The referenced
> > discussion also gives direction I can easily relate to,
> > http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-
> > vs-error - the comments point out that INFO and WARN could be merged, but
> > it's easier to spot issues when INFO is kept to basic state changes etc. and
> > WARN is used for potential issues.
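Read that way, the split would look roughly like the lines below; purely an
illustrative sketch, not glance code, and the image ID is made up:

import logging

LOG = logging.getLogger(__name__)
image_id = "1a2b3c"  # made-up ID, for illustration only

# INFO: non-error state changes an operator may want to see.
LOG.info("Image API starting up")

# WARN: a potential issue worth spotting without DEBUG-level noise, e.g. an
# update request against an image that does not exist (the review's case).
LOG.warning("Update failed: image %s not found, returning 404", image_id)

# DEBUG: per-request internals, only needed when digging into the code.
LOG.debug("Registry lookup for image %s returned 404", image_id)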
> >
> > As for the Glance Registry client logging the 404 as INFO, I do not see
> > what makes it confusing (although, based on the above, it should not be
> > INFO but rather WARN). The actual message is clear: we got a 404 from the
> > registry, so we return (on the following line) a 404 to the client. With
> > that line being DEBUG we're again at the point where you must run the
> > system with DEBUG-level logging, or you have no idea where to look for the
> > issue unless you know the internals well.
> 
> It should not be a WARN. Really, it should not be a WARN. There is nothing
> WARN about a 404, it's just a fact of life.
> 
> The thing that makes it confusing is that unless you know all the glance code,
> you think those 2 lines are different operations. But they are the same
> operation, logging the same logical thing from 2 places.
> 
> "A man with a watch knows what time it is. A man with two watches is never
> sure." - http://en.wikipedia.org/wiki/Segal's_law
> 
> Logging the same information twice in slightly different ways just generates
> confusion about what's going on.
> 
> > What worries me much more is that swiftclient actually logs the trace of the
> > 404 call, not so much that there is an error, which could be somewhat
> > justifiable:
> > http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-full/90
> > cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-15_19_42_27_206
> >
> > Thanks for broadening the discussion a bit beyond Glance. I think this is a
> > topic that should be discussed in a wider scope. If nothing else, we need to
> > look into the Logging Standards regarding the points raised here.
> 
> Right, the big issue is that the logging conversation can't happen just in the
> code; you need to look at real output, which is typically the aggregate effect
> of dozens of modules directly writing out log messages.
> It's actually a huge architectural issue of Python logging for making server
> software that feels even moderately coherent.
> 
> 	-Sean
> 
> --
> Sean Dague
> http://dague.net
> 

That I do agree with, and as the logs are a mess for those of us who look at the code base on a more or less daily basis, I can only imagine what they are like for the poor cloud operator.

Now the big question is what we are going to do about this.

- Erno


