[openstack-dev] [all] gate debugging
Matthew Treinish
mtreinish at kortar.org
Wed Aug 27 22:32:46 UTC 2014
On Wed, Aug 27, 2014 at 05:47:09PM -0400, Doug Hellmann wrote:
>
> On Aug 27, 2014, at 5:27 PM, Doug Hellmann <doug at doughellmann.com> wrote:
>
> >
> > On Aug 27, 2014, at 2:54 PM, Sean Dague <sean at dague.net> wrote:
> >
> >> Note: thread intentionally broken, this is really a different topic.
> >>
> >> On 08/27/2014 02:30 PM, Doug Hellmann wrote:
> >>> On Aug 27, 2014, at 1:30 PM, Chris Dent <chdent at redhat.com> wrote:
> >>>
> >>>> On Wed, 27 Aug 2014, Doug Hellmann wrote:
> >>>>
> >>>>> I have found it immensely helpful, for example, to have a written set
> >>>>> of the steps involved in creating a new library, from importing the
> >>>>> git repo all the way through to making it available to other projects.
> >>>>> Without those instructions, it would have been much harder to split up
> >>>>> the work. The team would have had to train each other by word of
> >>>>> mouth, and we would have had constant issues with inconsistent
> >>>>> approaches triggering different failures. The time we spent building
> >>>>> and verifying the instructions has paid off to the extent that we even
> >>>>> had one developer not on the core team handle a graduation for us.
> >>>>
> >>>> +many more for the relatively simple act of just writing stuff down
> >>>
> >>> "Write it down." is my theme for Kilo.
> >>
> >> I definitely get the sentiment. "Write it down" is also hard when you
> >> are talking about things that do change around quite a bit. OpenStack as
> >> a whole sees 250 - 500 changes a week, so the interaction pattern moves
> >> around enough that it's really easy to have *very* stale information
> >> written down. Stale information is sometimes even more dangerous than
> >> no information, as it takes people down very wrong paths.
> >>
> >> I think we break down on communication when we get into a conversation
> >> of "I want to learn gate debugging" because I don't quite know what that
> >> means, or where the starting point of understanding is. So those
> >> intentions are well-meaning, but tend to stall. The reality was there
> >> was no road map for those of us that dive in; it's just understanding
> >> how OpenStack holds together as a whole and where some of the high risk
> >> parts are. And a lot of that comes with days staring at code and logs
> >> until patterns emerge.
> >>
> >> Maybe if we can get smaller, more targeted questions, we can help folks
> >> better? I'm personally a big fan of answering the targeted questions
> >> because then I also know that the time spent exposing that information
> >> was directly useful.
> >>
> >> I'm more than happy to mentor folks. But I just end up finding the "I
> >> want to learn" at the generic level something that's hard to grasp onto
> >> or figure out how we turn it into action. I'd love to hear more ideas
> >> from folks about ways we might do that better.
> >
> > You and a few others have developed an expertise in this important skill. I am so far away from that level of expertise that I don’t know the questions to ask. More often than not I start with the console log, find something that looks significant, spend an hour or so tracking it down, and then have someone tell me that it is a red herring and the issue is really some other thing that they figured out very quickly by looking at a file I never got to.
> >
> > I guess what I’m looking for is some help with the patterns. What made you think to look in one log file versus another? Some of these jobs save a zillion little files; which ones are actually useful? What tools are you using to correlate log entries across all of those files? Are you doing it by hand? Is logstash useful for that, or is that more useful for finding multiple occurrences of the same issue?
> >
> > I realize there’s not a way to write a how-to that will live forever. Maybe one way to deal with that is to write up the research done on bugs soon after they are solved, and publish that to the mailing list. Even the retrospective view is useful because we can all learn from it without having to live through it. The mailing list is a fairly ephemeral medium, and something very old in the archives is understood to have a good chance of being out of date so we don’t have to keep adding disclaimers.
>
> Matt’s blog post [1] is an example of the sort of thing I think would be helpful. Obviously one post isn’t going to make the reader an expert, but over time a few of these will impart some useful knowledge.
>
> Doug
>
> [1] http://blog.kortar.org/?p=52&draftsforfriends=cTT3WsXqsH66eEt6uoi9rQaL2vGc8Vde
That was just an expiring link to the draft (it shouldn't be valid anymore),
which I generated to get some initial feedback before I posted it. The
permanent link to the post is here:
http://blog.kortar.org/?p=52
-Matt Treinish