[openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

Clark Boylan cboylan at sapwetik.org
Tue Mar 22 00:09:34 UTC 2016


On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
> On 03/21/2016 04:09 PM, Clark Boylan wrote:
> > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
> >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
> >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote: 
> >>>> Do you have better insight into job runtimes vs jobs in other
> >>>> projects?
> >>>> Most of the time in the job runtime is actually spent setting the
> >>>> infrastructure up, and I am not sure we can do anything about it, unless
> >>>> we take this up with Infra.
> >>>
> >>> I haven't done a comparison yet but let's break down the runtime of a
> >>> recent successful neutron full run against neutron master [0].
> >>
> >> And now for some comparative data from the gate-tempest-dsvm-full job
> >> [0]. This job also ran against a master change that merged and ran in
> >> the same cloud and region as the neutron job.
> >>
> > snip
> >> Generally each step of this job was quicker. There were big differences
> >> in devstack and tempest run time though. Is devstack much slower to
> >> setup neutron when compared to nova net? For tempest it looks like we
> >> run ~1510 tests against neutron and only ~1269 against nova net. This
> >> may account for the large difference there. I also recall that we run
> >> ipv6 tempest tests against neutron deployments that were inefficient and
> >> booted 2 qemu VMs per test (not sure if that is still the case but
> >> illustrates that the tests themselves may not be very quick in the
> >> neutron case).
> > 
> > Looking at the tempest slowest tests output for each of these jobs
> > (neutron and nova net) some tests line up really well across jobs and
> > others do not. In order to get a better handle on the runtime for
> > individual tests I have pushed https://review.openstack.org/295487 which
> > will run tempest serially reducing the competition for resources between
> > tests.
> > 
> > Hopefully the subunit logs generated by this change can provide more
> > insight into where we are losing time during the tempest test runs.

The results are in. We have gate-tempest-dsvm-full [0] and
gate-tempest-dsvm-neutron-full [1] job results where tempest ran
serially to reduce resource contention and provide reasonably accurate
per-test timing data. Both of these jobs ran on the same cloud, so they
should have comparable performance from the underlying VMs.

gate-tempest-dsvm-full
Time spent in job before tempest: 700 seconds
Time spent running tempest: 2428 seconds
Tempest tests run: 1269 (113 skipped)

gate-tempest-dsvm-neutron-full
Time spent in job before tempest: 789 seconds
Time spent running tempest: 4407 seconds
Tempest tests run: 1510 (76 skipped)

All times above are wall time as recorded by Jenkins.
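
As a rough illustration, the arithmetic behind those totals can be
sketched as below. The only inputs are the figures quoted above. The
split between "more tests" and "slower tests" is an assumption on my
part: it supposes the extra tests in the neutron job cost the nova net
average, which may well not hold.

```python
# Wall-clock figures quoted above (seconds) and tests run per job.
jobs = {
    "gate-tempest-dsvm-full": {
        "pre_tempest": 700, "tempest": 2428, "tests": 1269},
    "gate-tempest-dsvm-neutron-full": {
        "pre_tempest": 789, "tempest": 4407, "tests": 1510},
}

nova_net = jobs["gate-tempest-dsvm-full"]
neutron = jobs["gate-tempest-dsvm-neutron-full"]

# Absolute and relative differences between the two jobs.
extra_setup = neutron["pre_tempest"] - nova_net["pre_tempest"]
extra_tempest = neutron["tempest"] - nova_net["tempest"]
tempest_ratio = neutron["tempest"] / nova_net["tempest"]

# Mean wall time per test that was run.
mean_nova_net = nova_net["tempest"] / nova_net["tests"]
mean_neutron = neutron["tempest"] / neutron["tests"]

# ASSUMPTION: the extra neutron tests cost the nova net mean per test.
# Whatever is left over is attributed to tests simply running slower.
extra_tests = neutron["tests"] - nova_net["tests"]
from_extra_tests = extra_tests * mean_nova_net
from_slower_tests = extra_tempest - from_extra_tests

print(f"extra time before tempest: {extra_setup} s")
print(f"extra time in tempest:     {extra_tempest} s ({tempest_ratio:.2f}x)")
print(f"mean per test: nova net {mean_nova_net:.2f} s, "
      f"neutron {mean_neutron:.2f} s")
print(f"rough split: {from_extra_tests:.0f} s from extra tests, "
      f"{from_slower_tests:.0f} s from slower tests")
```

Under that assumption, most of the extra tempest time would come from
tests running slower rather than from the larger test count, but this is
a back-of-the-envelope estimate and not a measurement.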

We can also compare the 10 slowest tests in the non-neutron job against
their runtimes in the neutron job. (Note that this isn't a list of the 10
slowest tests in the neutron job, because that job runs extra tests.)

nova net job (runtime in seconds)
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
    85.232
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
    83.319
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance
    50.338
tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern
    43.494
tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
    40.225
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
    39.653
tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete
    37.720
tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete
    36.355
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped
    27.375
tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks
    27.025

neutron job (runtime in seconds)
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
    110.345
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
    108.170
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance
    63.852
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
    59.931
tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern
    57.835
tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
    49.552
tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete
    40.378
tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete
    39.088
tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks
    35.645
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped
    30.551
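
To make the per-test gap easier to see, here is a small sketch that
lines the two lists up and computes the slowdown for each test. The
numbers are copied from the lists above; the shortened Class.method keys
are my own abbreviation and not something tempest prints.

```python
# (nova net seconds, neutron seconds) per test, copied from the lists above.
# Keys are shortened to Class.method for readability.
runtimes = {
    "TestVolumeBootPatternV2.test_volume_boot_pattern": (85.232, 110.345),
    "TestVolumeBootPattern.test_volume_boot_pattern": (83.319, 108.170),
    "TestShelveInstance.test_shelve_volume_backed_instance": (50.338, 63.852),
    "TestSnapshotPattern.test_snapshot_pattern": (43.494, 57.835),
    "TestMinimumBasicScenario.test_minimum_basic_scenario": (40.225, 49.552),
    "TestShelveInstance.test_shelve_instance": (39.653, 59.931),
    "VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete":
        (37.720, 40.378),
    "VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete":
        (36.355, 39.088),
    "ServerActionsTestJSON.test_resize_server_confirm_from_stopped": (27.375, 30.551),
    "TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks": (27.025, 35.645),
}

# Build (name, extra seconds, slowdown ratio) rows.
rows = []
for name, (nova_net, neutron) in runtimes.items():
    rows.append((name, neutron - nova_net, neutron / nova_net))

# Largest relative slowdown first.
rows.sort(key=lambda row: row[2], reverse=True)

total_extra = sum(delta for _, delta, _ in rows)

for name, delta, ratio in rows:
    print(f"{ratio:5.2f}x  +{delta:6.3f}s  {name}")
print(f"total extra across these tests: {total_extra:.3f}s")
```

Every one of these ten tests is slower in the neutron job, though the
gap varies a lot from test to test, and these are single samples from
one run of each job.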

> Subunit logs aren't the full story here. Activity in addCleanup doesn't
> get added to the subunit time accounting for the test, which causes some
> interesting issues when waiting for resources to delete. I would be
> especially cautious of that on some of these.

Given that caveat, the numbers above may not tell the whole story, but
they do seem to tell us that in comparable circumstances neutron is
slower than nova net. The sample size is tiny, but again it gives us
somewhere to start. What is boot from volume doing in the neutron case
that makes it so much slower? Why is shelving so much slower with
neutron? And so on.

A few seconds here and a few seconds there add up when these operations
are repeated a few hundred times. We can probably start to whittle the
job runtime down by shaving that extra time off. In any case, I think
this is about as far as I can pull this thread, and I will let the
neutron team take it from here.

[0]
http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-full/8e64615/console.html
[1]
http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/console.html


