[openstack-dev] [nova] readout from Philly Operators Meetup
Sean Dague
sean at dague.net
Wed Mar 11 21:01:40 UTC 2015
On 03/11/2015 01:21 PM, Tim Bell wrote:
>>>>> Reporting on Scheduler Fails ----------------------------
>>>>>
>>>>> Apparently, some time recently, we stopped logging scheduler
>>>>> failures above DEBUG, and that behavior also snuck back into
>>>>> Juno
>>>>> (https://etherpad.openstack.org/p/PHL-ops-nova-feedback L78).
>>>>> This has made tracking down root cause of failures far more
>>>>> difficult.
>>>>>
>>>>> Action: this should hopefully be a quick fix we can get in
>>>>> for Kilo and backport.
>>> It's unfortunate that failed scheduling attempts are providing
>>> only an INFO log. A quick fix could be at least to turn the
>>> verbosity up to WARN so it would be noticed more easily
>>> (including the whole filters stack with their results). That
>>> said, I'm pretty against any proposal which would expose those
>>> specific details (ie. the number of hosts which are succeeding
>>> per filter) in an API endpoint because it would also expose the
>>> underlying infrastructure capacity and would ease DoS
>>> discoveries. A workaround could be to include in the ERROR
>>> message only the name of the filter which denied the request, so the
>>> operators could very easily match what the user is saying with
>>> what they're seeing in the scheduler logs.
>>>
>>> Does that work for people ? I can provide changes for both.
>>>
>>> -Sylvain
>>>
> In the CERN use case, we'd be OK providing more details to the end
> user. This would save on followup helpdesk tickets which could
> instead be documented (e.g. try a different availability zone or a
> smaller flavour). However, I fully understand that this is a private
> cloud oriented answer so it should be configurable.
>
> At minimum, providing the information as standard in the logs is
> needed. These scenarios are automatic helpdesk cases so giving the
> operator the information needed in the logs with the instance IDs
> saves the "I've turned on DEBUG, can you try again?" round trip.
>
> Tim
I think there is an interesting follow-up discussion (maybe in Vancouver)
about "self service debugging". A lot of data hiding rests on the
assumption that your users are untrustworthy. In some environments,
that's appropriate. However, in many of the private cloud use cases the
users are quite trusted, and exposing more information on errors would
actually be good for everyone. It would close the loop on problem
determination.
Anyway, something to ponder longer term. Handling that is beyond the
scope of the regression, but it's an interesting idea that has at least
one big operator in favor of it.
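
To make the two ideas in this thread concrete, here is a minimal Python
sketch. It is not nova's scheduler code: filter_hosts, user_message, the
expose_filter_name switch, the logger name and the message strings are all
hypothetical names made up for illustration, and it uses only the standard
logging module. It assumes a filter is just a (name, predicate) pair. The
first function logs the per-filter results at WARNING when a filter leaves
no hosts, so the operator does not need DEBUG. The second builds the
user-facing error and names the rejecting filter only if the operator has
opted in, never the host counts.

```python
import logging

LOG = logging.getLogger("scheduler_sketch")  # hypothetical logger name


def filter_hosts(hosts, filters, instance_id):
    """Apply each (name, predicate) filter in order.

    Returns (remaining_hosts, trace), where trace is a list of
    (filter_name, hosts_left_after_filter) tuples.
    """
    trace = []
    remaining = list(hosts)
    for name, predicate in filters:
        remaining = [h for h in remaining if predicate(h)]
        trace.append((name, len(remaining)))
        if not remaining:
            # WARNING, not DEBUG: visible to operators by default,
            # and tagged with the instance ID for helpdesk lookups.
            LOG.warning(
                "No valid host for instance %s: filter %s returned 0 "
                "hosts (filter results: %s)", instance_id, name, trace)
            break
    return remaining, trace


def user_message(trace, expose_filter_name=False):
    """Build the end-user error text.

    The filter name is included only when the operator opts in;
    per-filter host counts are never exposed here.
    """
    if expose_filter_name and trace and trace[-1][1] == 0:
        return "No valid host was found (rejected by %s)." % trace[-1][0]
    return "No valid host was found."
```

With two hosts that both fail a RAM predicate labelled "RamFilter", trace
ends up as [("RamFilter", 0)], and user_message(trace) returns the generic
text unless expose_filter_name=True is passed. A private cloud could flip
that switch; a public one would leave it off.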
-Sean
--
Sean Dague
http://dague.net