[openstack-dev] [ironic] 3rdparty CI status and how we can help to make it green

Dmitry Tantsur dtantsur at redhat.com
Thu Apr 13 15:33:05 UTC 2017


Hi all, especially maintainers of 3rdparty CI for Ironic :)

I've been watching our 3rdparty CI results recently. While things have improved
compared to e.g. a month ago, most jobs still finish with failures. I've
written a simple script [1] to fetch information about CI runs from my local
Gertty database. The results [2] show that some jobs still fail surprisingly
often (in more than 50% of cases):

- job: tempest-dsvm-ironic-agent-irmc
   rate: 0.9857142857142858
- job: tempest-dsvm-ironic-iscsi-irmc
   rate: 0.9771428571428571
- job: dell-hw-tempest-dsvm-ironic-pxe_drac
   rate: 0.9682539682539683
- job: gate-tempest-ironic-ilo-driver-iscsi_ilo
   rate: 0.9582463465553236
- job: dell-hw-tempest-dsvm-ironic-pxe_ipmitool
   rate: 0.9111111111111111
- job: tempest-dsvm-ironic-pxe-irmc
   rate: 0.8171428571428572
- job: gate-tempest-ironic-ilo-driver-pxe_ilo
   rate: 0.791231732776618
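
For anyone who wants to reproduce this kind of number on their own data, here
is a minimal sketch of the calculation. It is not the script from [1]: it
assumes "rate" means the fraction of runs that did not succeed, and the
function name, the (job, result) record shape and the sample data are all
hypothetical.

```python
from collections import defaultdict


def failure_rates(runs):
    """Return {job name: fraction of runs whose result was not SUCCESS}.

    `runs` is an iterable of (job_name, result) pairs. This record shape is
    an assumption for illustration, not the Gertty database schema.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for job, result in runs:
        totals[job] += 1
        if result != "SUCCESS":
            failures[job] += 1
    return {job: failures[job] / totals[job] for job in totals}


# Hypothetical sample data, not taken from [2].
sample_runs = [
    ("job-a", "SUCCESS"),
    ("job-a", "FAILURE"),
    ("job-a", "FAILURE"),
    ("job-b", "SUCCESS"),
]

# Print the jobs with the highest failure rate first.
rates = failure_rates(sample_runs)
for job, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"- job: {job}\n   rate: {rate}")
```

Counting anything other than SUCCESS as a failure is a deliberate
simplification; a real report may want to treat infrastructure errors
separately.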

I would like to start a discussion on how we (as a team) can help the people
maintaining the CI to keep the failure rate closer to that of our virtual CI
(< 30% of cases, judging by [2]).

I'm thinking of the following potential problems:

1. Our devstack plugin changes too often.

   I've heard this complaint at least once. Should we maybe freeze our devstack
plugin at some point to allow the vendor folks to catch up? Then we should
start looking at the CI results more carefully when modifying it.

2. Our devstack plugin is inconvenient for hardware, and requires hacks.

   This is something Miles (?) told me when trying to set up an environment for
his hardware lab. If so, can we get a list of pain points, preferably in the
form of reported bugs? I and hopefully other folks can certainly dedicate
some time to making your life easier.

3. The number of jobs to run is too high.

   I've noticed that 3rdparty CI runs even on patches that clearly don't
require it, e.g. docs-only changes. I suggest that the maintainers adopt some
exclude rules similar to [3].

   Also, most of the vendors run 3-4 jobs for different flavors of their
drivers (and this number is going to increase with the driver composition
work). I wonder if we should recommend switching from the ironic
baremetal_basic_ops test to what we call "standalone" tests [4]. This would
allow a single job to test several drivers/combinations of interfaces within
the same time frame.
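
To illustrate the exclude rules from item 3: the idea is to skip a job only
when every changed file in a patch matches one of the "does not need hardware
testing" patterns. The sketch below shows that logic in Python. The patterns
and the function name are hypothetical and are not copied from [3]; a real CI
would express this in its own Zuul configuration.

```python
import re

# Hypothetical patterns for changes that should not trigger hardware jobs.
# These are illustrative only and are not the rules from [3].
SKIP_PATTERNS = [
    r"^doc/.*$",
    r"^releasenotes/.*$",
    r"^.*\.rst$",
]


def should_skip(changed_files, patterns=SKIP_PATTERNS):
    """Return True if every changed file matches at least one skip pattern.

    An empty file list is never skipped, so that an unexpected event with no
    file information still gets tested.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return bool(changed_files) and all(
        any(regex.match(path) for regex in compiled)
        for path in changed_files
    )


# A docs-only change can be skipped; a change touching code cannot.
print(should_skip(["doc/source/index.rst", "README.rst"]))
print(should_skip(["doc/source/index.rst", "ironic/api/app.py"]))
```

Requiring all files to match (rather than any) errs on the side of running the
job: a patch that touches both docs and code is still tested.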

Finally, I've proposed this topic for the virtual meetup [5] planned for the
end of April. Please feel free to stop by and let us know how we can help.

Thanks,
Dmitry.

P.S.
I've seen expired or self-signed HTTPS certificates on the log sites of some
3rdparty CI systems. Please try to fix such issues as soon as possible to
allow the community to understand failures.

[1] https://github.com/dtantsur/ci-report
[2] http://paste.openstack.org/show/606467/
[3] https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L1375-L1385
[4] https://github.com/openstack/ironic/blob/master/ironic_tempest_plugin/tests/scenario/ironic_standalone/test_basic_ops.py
[5] https://etherpad.openstack.org/p/ironic-virtual-meetup


