[openstack-dev] [all] gate debugging

Joe Gordon joe.gordon0 at gmail.com
Thu Aug 28 18:18:56 UTC 2014


On Thu, Aug 28, 2014 at 5:17 AM, Thierry Carrez <thierry at openstack.org>
wrote:

> David Kranz wrote:
> > On 08/27/2014 03:43 PM, Sean Dague wrote:
> >> On 08/27/2014 03:33 PM, David Kranz wrote:
> >>> Race conditions are what make debugging very hard. I think we are in
> >>> the process of experimenting with such an idea: asymmetric gating by
> >>> moving functional tests to projects, making them deeper and more
> >>> extensive, and gating against their own projects. The result should be
> >>> that when a code change is made, we will spend much more time running
> >>> tests of code that is most likely to be growing a race bug from the
> >>> change. Of course there is a risk that we will impair integration
> >>> testing and we will have to be vigilant about that. One mitigating
> >>> factor is that if cross-project interaction uses APIs (official or not)
> >>> that are well tested by the functional tests, there is less risk that a
> >>> bug will show up only when those APIs are used by another project.
> >>
> >> So, sorry, this is really not about systemic changes (we're running
> >> those in parallel), but more about skills transfer in people getting
> >> engaged. Because we need both. I guess that's the danger of breaking the
> >> thread: apparently I lost part of the context.
> >>
> > I agree we need both. I made the comment because if we can make gate
> > debugging less daunting, then less skill will be needed, and I think
> > that is key. Honestly, I am not sure the full skill you have can be
> > transferred. It was gained partly through learning in simpler times.
>
> I think we could develop tools and visualizations that would help the
> debugging tasks. We could make those tasks more visible, and therefore
> more appealing to the brave souls that step up to tackle them. Sean and
> Joe did a ton of work improving the raw data, deriving graphs from it,
> highlighting log syntax or adding helpful Apache footers. But these days
> they spend so much time fixing the issues themselves that they can't
> continue improving those tools.
>

Some tooling improvements I would like to do but probably don't have the
time for:

* Add the ability to filter http://status.openstack.org/elastic-recheck/ by
project, so that a Neutron dev can see the list of Neutron-related bugs.
* Make the list of open reviews on
http://status.openstack.org/elastic-recheck/ easier to find
* Create an up-to-date diagram of what OpenStack looks like when running,
how services interact, etc. Both
http://docs.openstack.org/training-guides/content/figures/5/figures/image31.jpg
and
http://docs.openstack.org/admin-guide-cloud/content/figures/2/figures/openstack-arch-havana-logical-v1.jpg
are out of date.
* Make http://jogo.github.io/gate easier to understand. This is what I
check to see the health of the gate.
* Build a request-id tracker for logs, to make it easier to find the logs for
a given request-id across multiple services (nova-api, nova-scheduler, etc.).
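
To make the request-id tracker idea a bit more concrete, here is a minimal
Python sketch, not an existing tool. It assumes request-ids have the form
"req-" followed by a UUID, that each log line starts with a sortable
timestamp, and that the logs are already available per service. The names
index_by_request_id, trace and SAMPLE_LOGS are hypothetical, and the sample
log lines are invented for illustration rather than copied from real
service output.

```python
import re
from collections import defaultdict

# Assumption: request-ids look like "req-" followed by a UUID.
REQUEST_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def index_by_request_id(logs):
    """Map each request-id to the (service, line) pairs that mention it.

    `logs` is a dict of service name -> iterable of log lines.
    """
    index = defaultdict(list)
    for service, lines in logs.items():
        for line in lines:
            # set() avoids duplicate entries when an id repeats within a line.
            for request_id in set(REQUEST_ID_RE.findall(line)):
                index[request_id].append((service, line.rstrip("\n")))
    return index


def trace(index, request_id):
    """Return the entries for one request-id across all services.

    Assumption: every line begins with a sortable timestamp, so sorting
    on the line text gives chronological order.
    """
    return sorted(index.get(request_id, []), key=lambda entry: entry[1])


# Invented sample lines for illustration only; not real service output.
SAMPLE_LOGS = {
    "nova-api": [
        "2014-08-28 18:18:56.100 INFO [req-11111111-2222-3333-4444-555555555555] request received",
        "2014-08-28 18:18:56.900 INFO [req-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee] request received",
    ],
    "nova-scheduler": [
        "2014-08-28 18:18:56.400 DEBUG [req-11111111-2222-3333-4444-555555555555] host selected",
    ],
}

if __name__ == "__main__":
    index = index_by_request_id(SAMPLE_LOGS)
    for service, line in trace(index, "req-11111111-2222-3333-4444-555555555555"):
        print(f"{service}: {line}")
```

A real tracker would have to pull the logs from wherever each job stores
them and cope with differing log formats; the sketch only shows the
cross-service grouping step.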


>
> And that's part of where the gate burnout comes from: spending so much
> time on the issues themselves that you can no longer work on preventing
> them from happening, or making the job of handling the issues easier, or
> documenting/mentoring other people so that they can do it in your place.
>
> --
> Thierry Carrez (ttx)