[openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
cboylan at sapwetik.org
Tue Mar 22 00:09:34 UTC 2016
On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
> On 03/21/2016 04:09 PM, Clark Boylan wrote:
> > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
> >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
> >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
> >>>> Do you have an a better insight of job runtimes vs jobs in other
> >>>> projects?
> >>>> Most of the time in the job runtime is actually spent setting the
> >>>> infrastructure up, and I am not sure we can do anything about it,
> >>>> unless we take this up with Infra.
> >>> I haven't done a comparison yet, but let's break down the runtime of
> >>> a recent successful neutron full run against neutron master.
> >> And now for some comparative data from the gate-tempest-dsvm-full
> >> job. This job also ran against a master change that merged and ran in
> >> the same cloud and region as the neutron job.
> > snip
> >> Generally each step of this job was quicker. There were big differences
> >> in devstack and tempest run time though. Is devstack much slower to
> >> set up neutron when compared to nova net? For tempest it looks like we
> >> run ~1510 tests against neutron and only ~1269 against nova net. This
> >> may account for the large difference there. I also recall that we run
> >> ipv6 tempest tests against neutron deployments that were inefficient and
> >> booted 2 qemu VMs per test (not sure if that is still the case but
> >> illustrates that the tests themselves may not be very quick in the
> >> neutron case).
> > Looking at the tempest slowest tests output for each of these jobs
> > (neutron and nova net) some tests line up really well across jobs and
> > others do not. In order to get a better handle on the runtime for
> > individual tests I have pushed https://review.openstack.org/295487 which
> > will run tempest serially reducing the competition for resources between
> > tests.
> > Hopefully the subunit logs generated by this change can provide more
> > insight into where we are losing time during the tempest test runs.
The results are in: we have gate-tempest-dsvm-full and
gate-tempest-dsvm-neutron-full job results where tempest ran serially to
reduce resource contention and provide accurate-ish per-test timing
data. Both of these jobs ran on the same cloud, so they should have
comparable performance from the underlying VMs.
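For anyone who wants to reproduce this kind of analysis, here is a
minimal sketch of ranking tests by duration, assuming the subunit stream
has already been exported to CSV (for example with subunit2csv; the
column names and the plain-seconds timestamps below are simplified
assumptions, not the tool's exact output format):

```python
import csv
import io

# Hypothetical sample in the assumed format: test id, start/stop in seconds.
sample = """\
test_id,start,stop
tempest.api.compute.test_servers.test_boot,0.0,12.5
tempest.scenario.test_volume_boot,12.5,72.0
tempest.api.network.test_ports,72.0,75.2
"""

def slowest(csv_text, top=10):
    """Return the top N tests by wall-clock duration, longest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    timed = [(float(r["stop"]) - float(r["start"]), r["test_id"])
             for r in rows]
    return sorted(timed, reverse=True)[:top]

for duration, test in slowest(sample):
    print(f"{duration:6.1f}s  {test}")
```

Running the same ranking over both jobs' logs is what makes the
per-test comparison below possible.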
gate-tempest-dsvm-full (nova net):
Time spent in job before tempest: 700 seconds
Time spent running tempest: 2428 seconds
Tempest tests run: 1269 (113 skipped)

gate-tempest-dsvm-neutron-full:
Time spent in job before tempest: 789 seconds
Time spent running tempest: 4407 seconds
Tempest tests run: 1510 (76 skipped)
All times above are wall time as recorded by Jenkins.
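Normalizing by test count makes the gap clearer. A quick
back-of-the-envelope from the wall times above:

```python
# Wall-clock numbers quoted above for the two serial runs:
# (seconds spent running tempest, number of tests run).
jobs = {
    "gate-tempest-dsvm-full (nova net)": (2428, 1269),
    "gate-tempest-dsvm-neutron-full": (4407, 1510),
}

for name, (tempest_seconds, tests) in jobs.items():
    print(f"{name}: {tempest_seconds / tests:.2f} s/test on average")
```

That works out to roughly 1.9 s/test for nova net versus roughly 2.9
s/test for neutron, so the extra ~240 tests in the neutron job do not by
themselves explain the difference.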
We can also compare the 10 slowest tests in the non-neutron job against
their runtimes in the neutron job. (Note this isn't a list of the top 10
slowest tests in the neutron job, because that job runs extra tests.)
nova net job (per-test timing table not preserved in this copy)
> Subunit logs aren't the full story here. Activity in addCleanup doesn't
> get added to the subunit time accounting for the test, which causes some
> interesting issues when waiting for resources to delete. I would be
> especially cautious of that on some of these.
Based on this, those numbers above may not tell the whole story, but
they do seem to tell us that in comparable circumstances neutron is
slower than nova net. The sample size is tiny, but again it gives us
somewhere to start. What is boot from volume doing in the neutron case
that makes it so much slower? Why is shelving so much slower with
neutron? And so on.
A few seconds here and a few seconds there adds up when these operations
are repeated a few hundred times. We can probably start to whittle the
job runtime down by shaving that extra time off. In any case I think
this is about as far as I can pull this thread, and will let the neutron
team take it from here.
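To put "a few seconds here and there" in perspective, a rough
illustration of the scale involved, using a hypothetical
one-second-per-test saving against the neutron test count above:

```python
tests_per_run = 1510     # tempest tests in the neutron full job above
saved_per_test = 1.0     # hypothetical: shave one second per test

minutes = tests_per_run * saved_per_test / 60
print(f"~{minutes:.0f} minutes saved per job run")  # ~25 minutes
```

Across the hundreds of runs the gate does per day, savings on that
order compound quickly.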