[openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

Assaf Muller assaf at redhat.com
Tue Mar 22 01:15:10 UTC 2016


On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan <cboylan at sapwetik.org> wrote:
> On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
>> On 03/21/2016 04:09 PM, Clark Boylan wrote:
>> > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
>> >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>> >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>> >>>> Do you have an a better insight of job runtimes vs jobs in other
>> >>>> projects?
>> >>>> Most of the time in the job runtime is actually spent setting the
>> >>>> infrastructure up, and I am not sure we can do anything about it, unless
>> >>>> we
>> >>>> take this with Infra.
>> >>>
>> >>> I haven't done a comparison yet, but let's break down the runtime of a
>> >>> recent successful neutron full run against neutron master [0].
>> >>
>> >> And now for some comparative data from the gate-tempest-dsvm-full job
>> >> [0]. This job also ran against a master change that merged and ran in
>> >> the same cloud and region as the neutron job.
>> >>
>> > snip
>> >> Generally each step of this job was quicker. There were big differences
>> >> in devstack and tempest run times, though. Is devstack much slower to
>> >> set up neutron when compared to nova net? For tempest it looks like we
>> >> run ~1510 tests against neutron and only ~1269 against nova net. This
>> >> may account for the large difference there. I also recall that the ipv6
>> >> tempest tests we run against neutron deployments were inefficient and
>> >> booted 2 qemu VMs per test (not sure if that is still the case, but it
>> >> illustrates that the tests themselves may not be very quick in the
>> >> neutron case).
>> >
>> > Looking at the tempest slowest tests output for each of these jobs
>> > (neutron and nova net) some tests line up really well across jobs and
>> > others do not. In order to get a better handle on the runtime of
>> > individual tests, I have pushed https://review.openstack.org/295487,
>> > which will run tempest serially, reducing the competition for resources
>> > between tests.
>> >
>> > Hopefully the subunit logs generated by this change can provide more
>> > insight into where we are losing time during the tempest test runs.
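
For anyone who wants to poke at those logs themselves, here is a minimal
sketch of pulling per-test wall times out of a subunit v2 stream (Python 3,
using the python-subunit and testtools libraries; this is not part of the
jobs above, just one way to read the files they produce):

    # Read a subunit v2 stream on stdin and print the ten slowest tests.
    import sys

    import subunit
    import testtools


    class TimingCollector(testtools.StreamResult):
        """Record wall time between 'inprogress' and a final status."""

        def __init__(self):
            super().__init__()
            self.starts = {}
            self.durations = {}

        def status(self, test_id=None, test_status=None, timestamp=None,
                   **kwargs):
            if test_status == 'inprogress':
                self.starts[test_id] = timestamp
            elif test_id in self.starts and test_status in ('success',
                                                            'fail', 'skip'):
                delta = timestamp - self.starts.pop(test_id)
                self.durations[test_id] = delta.total_seconds()


    collector = TimingCollector()
    stream = subunit.ByteStreamToStreamResult(sys.stdin.buffer,
                                              non_subunit_name='stdout')
    collector.startTestRun()
    stream.run(collector)
    collector.stopTestRun()

    for test_id, secs in sorted(collector.durations.items(),
                                key=lambda kv: -kv[1])[:10]:
        print('%8.3f %s' % (secs, test_id))
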
>
> The results are in: we have gate-tempest-dsvm-full [0] and
> gate-tempest-dsvm-neutron-full [1] job results where tempest ran
> serially to reduce resource contention and provide accurate-ish per-test
> timing data. Both of these jobs ran on the same cloud, so they should
> have comparable performance from the underlying VMs.
>
> gate-tempest-dsvm-full
> Time spent in job before tempest: 700 seconds
> Time spent running tempest: 2428 seconds
> Tempest tests run: 1269 (113 skipped)
>
> gate-tempest-dsvm-neutron-full
> Time spent in job before tempest: 789 seconds
> Time spent running tempest: 4407 seconds
> Tempest tests run: 1510 (76 skipped)
>
> All times above are wall time as recorded by Jenkins.
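
A quick bit of arithmetic on those totals (just the numbers above, nothing
more) suggests the extra tests alone don't explain the gap:

    # Per-test averages from the wall times reported above.
    full = (2428.0, 1269)     # tempest seconds, tests run (nova net job)
    neutron = (4407.0, 1510)  # tempest seconds, tests run (neutron job)

    print('nova net: %.2f s/test' % (full[0] / full[1]))        # ~1.91
    print('neutron:  %.2f s/test' % (neutron[0] / neutron[1]))  # ~2.92

    # If the neutron job's 1510 tests ran at the nova net per-test rate,
    # tempest would take ~2889 seconds; the remaining ~1518 of the 4407
    # seconds is slowdown the larger test count can't account for.
    expected = neutron[1] * full[0] / full[1]
    print('unexplained: %.0f seconds' % (neutron[0] - expected))
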
>
> We can also compare the 10 slowest tests in the non-neutron job against
> their runtimes in the neutron job. (Note this isn't a list of the top 10
> slowest tests in the neutron job, because that job runs extra tests.)
>
> nova net job
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern  85.232
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern  83.319
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance  50.338
> tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern  43.494
> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario  40.225
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance  39.653
> tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete  37.720
> tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete  36.355
> tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped  27.375
> tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks  27.025
>
> neutron job
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern  110.345
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern  108.170
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance  63.852
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance  59.931
> tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern  57.835
> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario  49.552
> tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete  40.378
> tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete  39.088
> tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks  35.645
> tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped  30.551
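
Putting the two lists side by side makes the slowdown ratios easy to read
(times copied from the runs above; the keys abbreviate the full test ids):

    # (nova net seconds, neutron seconds) for the ten tests in both lists.
    times = {
        'volume_boot_pattern_v2': (85.232, 110.345),
        'volume_boot_pattern': (83.319, 108.170),
        'shelve_volume_backed_instance': (50.338, 63.852),
        'snapshot_pattern': (43.494, 57.835),
        'minimum_basic_scenario': (40.225, 49.552),
        'shelve_instance': (39.653, 59.931),
        'volume_backup_v1': (37.720, 40.378),
        'volume_backup_v2': (36.355, 39.088),
        'resize_server_confirm_from_stopped': (27.375, 30.551),
        'encrypted_cinder_volumes_luks': (27.025, 35.645),
    }
    for name, (nova, neut) in sorted(times.items(),
                                     key=lambda kv: kv[1][1] / kv[1][0],
                                     reverse=True):
        print('%-36s %.2fx' % (name, neut / nova))

The scenario tests that boot instances cluster around 1.3x slower (shelve
at ~1.5x), while the cinder-only API tests are closer to 1.1x, which fits
the boot and shelve questions raised below.
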
>
>> Subunit logs aren't the full story here. Activity in addCleanup doesn't
>> get added to the subunit time accounting for the test, which causes some
>> interesting issues when waiting for resources to delete. I would be
>> especially cautious of that on some of these.
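
To illustrate that caveat with a hypothetical testtools-style test: per the
note above, work registered with addCleanup runs outside the span subunit
charges to the test, so a slow wait-for-delete in cleanup never shows up in
the per-test numbers:

    import time

    import testtools


    class ExampleTest(testtools.TestCase):
        def test_create_and_delete_volume(self):
            # Subunit's per-test accounting covers roughly this body.
            volume_id = 'fake-volume'  # stand-in for an API create call
            # Per the note above, this cleanup (a stand-in for polling
            # until the volume is really gone) is not charged to the test.
            self.addCleanup(time.sleep, 5)
            self.assertEqual('fake-volume', volume_id)
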
>
> Based on this, the numbers above may not tell the whole story, but they
> do seem to tell us that in comparable circumstances neutron is slower
> than nova net. The sample size is tiny, but again it gives us somewhere
> to start. What is boot from volume doing in the neutron case that makes
> it so much slower? Why is shelving so much slower with neutron? And so
> on.
>
> A few seconds here and a few seconds there add up when these operations
> are repeated a few hundred times. We can probably start to whittle the
> job runtime down by shaving that extra time off. In any case, I think
> this is about as far as I can pull this thread; I'll let the neutron
> team take it from here.

If what we want is to cut down execution time, I'd suggest we stop
running Cinder tests on Neutron patches (call it an experiment) and
see how long it takes for a regression to slip in. Being an optimist,
I would guess: never!

If we're running these tests on Neutron patches solely as a data point
for performance testing, Tempest is obviously not the tool for the job
and doesn't provide any added value we can't get from Rally and
profilers, for example. If there's otherwise value in running Cinder
tests (and other tests that don't exercise the Neutron API), I'd love
to know what it is :) I cannot remember any legitimate Cinder failure
on Neutron patches.
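
For what it's worth, the experiment could start as something as simple as
filtering the test list before it reaches the runner. A hypothetical sketch
(the pattern and helper are illustrative, not an existing job setting):

    import re

    # Hypothetical skip pattern for the experiment: drop tests that only
    # exercise Cinder while keeping everything that touches Neutron.
    CINDER_ONLY = re.compile(r'tempest\.api\.volume\.')

    def keep(test_id):
        return not CINDER_ONLY.search(test_id)

    tests = [
        'tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test'
        '.test_volume_backup_create_get_detailed_list_restore_delete',
        'tempest.api.network.test_networks.NetworksTest.test_list_networks',
    ]
    print([t for t in tests if keep(t)])  # only the network test survives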

>
> [0]
> http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-full/8e64615/console.html
> [1]
> http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/console.html
>


