[openstack-dev] Lots of slow tests timing out jobs

jean-philippe@evrard.me jean-philippe at evrard.me
Wed Jul 25 07:31:31 UTC 2018


On Wednesday, July 25, 2018 08:46 CEST, Ghanshyam Mann <gmann at ghanshyammann.com> wrote: 
 
>  ---- On Wed, 25 Jul 2018 05:15:53 +0900 Matt Riedemann <mriedemos at gmail.com> wrote ---- 
>  > While going through our uncategorized gate failures [1] I found that we 
>  > have a lot of jobs failing (161 in 7 days) due to the tempest run timing 
>  > out [2]. I originally thought it was just the networking scenario tests, 
>  > but I was able to identify a handful of API tests that are also taking 
>  > nearly 3 minutes each, which seems like they should be moved to scenario 
>  > tests and/or marked slow so they can be run in a dedicated tempest-slow job.
>  > 
>  > I'm not sure how to get the history on the longest-running tests on 
>  > average to determine where to start drilling down on the worst 
>  > offenders, but it seems like an audit is in order.
> 
> Yeah, there are many tests taking too long. I do not know the reason this time, but the last time we audited slow tests, the cause was mainly ssh failures.
> I have created a similar ethercalc [3] to collect the time-consuming tests along with a rounded figure of their average time over the last 14 days, taken from the health dashboard. There is no calculated average time on o-h, so these are rounded figures rather than exact averages.
> 
> Maybe 14 days is too short a period to decide to mark them slow, but I think their average time over 3 months will be the same. Should we consider a 3-month period for those?
> 
> Based on average time (currently the 14-day average), I have voted on the ethercalc for which tests to mark as slow. I used the criterion of >120 sec average time. Once more people have voted there, we can mark them slow.
> 
> [3] https://ethercalc.openstack.org/dorupfz6s9qt
> 
> -gmann
> 
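
The ">120 sec average time" criterion quoted above could be sketched roughly as follows. This is only an illustration, not how the ethercalc or the health dashboard computes anything: the function name, the input shape (pairs of test id and duration in seconds), and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# The ">120 sec avg time" criterion mentioned in the thread.
SLOW_THRESHOLD_SECONDS = 120


def find_slow_tests(runs, threshold=SLOW_THRESHOLD_SECONDS):
    """Return {test_id: average_seconds} for tests averaging above threshold.

    `runs` is an iterable of (test_id, duration_seconds) pairs, one per run.
    This input shape is hypothetical, chosen only for the sketch.
    """
    durations = defaultdict(list)
    for test_id, seconds in runs:
        durations[test_id].append(seconds)
    averages = {test_id: mean(values) for test_id, values in durations.items()}
    return {t: avg for t, avg in averages.items() if avg > threshold}


# Hypothetical sample data; real durations would have to be collected
# from the health dashboard over the chosen period (14 days, 3 months, ...).
sample_runs = [
    ("api.test_a", 170.0),
    ("api.test_a", 190.0),
    ("api.test_b", 30.0),
    ("api.test_b", 50.0),
    ("scenario.test_c", 120.0),
]
slow = find_slow_tests(sample_runs)
print(sorted(slow))
```

Note that the comparison is strict, so a test averaging exactly 120 seconds is not flagged. Running the same function over a 14-day and a 3-month window would show whether the longer period changes which tests cross the threshold.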

We have made a similar observation in openstack-ansible. It is painful. Recently, something that passed the gates without rechecks (but close to the timeout) took 14 rechecks (all timeouts) to get in.

In OSA, we will be starting a project to refactor our testing to make it faster, but I'd like to hear news of your research :)

Thanks,
Jean-Philippe (evrardjp)

>  > 
>  > [1] http://status.openstack.org/elastic-recheck/data/integrated_gate.html
>  > [2] https://bugs.launchpad.net/tempest/+bug/1783405
>  > 
>  > -- 
>  > 
>  > Thanks,
>  > 
>  > Matt
>  > 
>  > __________________________________________________________________________
>  > OpenStack Development Mailing List (not for usage questions)
>  > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>  > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>  > 



