[openstack-dev] [ironic] 3rdparty CI status and how we can help to make it green
Dmitry Tantsur
dtantsur at redhat.com
Thu Apr 13 15:33:05 UTC 2017
Hi all, especially maintainers of 3rdparty CI for Ironic :)
I've been watching our 3rdparty CI results recently. While things have improved
compared to, e.g., a month ago, most jobs still finish with failures. I've
written a simple script [1] to fetch CI run information from my local Gertty
database. The results [2] show that some jobs still fail surprisingly often (in
more than 50% of runs; "rate" below is the fraction of failed runs):
- job: tempest-dsvm-ironic-agent-irmc
  rate: 0.9857142857142858
- job: tempest-dsvm-ironic-iscsi-irmc
  rate: 0.9771428571428571
- job: dell-hw-tempest-dsvm-ironic-pxe_drac
  rate: 0.9682539682539683
- job: gate-tempest-ironic-ilo-driver-iscsi_ilo
  rate: 0.9582463465553236
- job: dell-hw-tempest-dsvm-ironic-pxe_ipmitool
  rate: 0.9111111111111111
- job: tempest-dsvm-ironic-pxe-irmc
  rate: 0.8171428571428572
- job: gate-tempest-ironic-ilo-driver-pxe_ilo
  rate: 0.791231732776618
I would like to start a discussion on how we (as a team) can help the people
maintaining these CI systems keep their failure rates closer to that of our
virtual CI (< 30% of runs, judging by [2]).
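For anyone wanting to track their own job against that target, here is a minimal sketch of the calculation. It is not the code of the ci-report script [1]; it assumes each CI run has already been reduced to a hypothetical (job name, passed) pair, and takes the 30% target from the virtual CI figure above.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Target taken from the "< 30%" observed for our virtual CI jobs in [2].
TARGET_FAILURE_RATE = 0.3


def failure_rates(runs: Iterable[Tuple[str, bool]]) -> List[Tuple[str, float]]:
    """Return (job, failure rate) pairs, sorted from worst to best.

    Each run is a hypothetical (job_name, passed) pair; extracting these
    pairs from a Gertty database is not covered here.
    """
    total: Counter = Counter()
    failed: Counter = Counter()
    for job, passed in runs:
        total[job] += 1
        if not passed:
            failed[job] += 1
    rates = [(job, failed[job] / total[job]) for job in total]
    # Worst jobs first, like the listing above.
    return sorted(rates, key=lambda item: item[1], reverse=True)


def jobs_above_target(rates: Sequence[Tuple[str, float]],
                      target: float = TARGET_FAILURE_RATE) -> List[str]:
    """Names of the jobs whose failure rate exceeds the target."""
    return [job for job, rate in rates if rate > target]
```

For example, three runs of a job with two failures give a rate of about 0.67, which jobs_above_target would flag.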
I'm thinking of the following potential problems:
1. Our devstack plugin changes too often.
I've heard this complaint at least once. Should we maybe freeze our devstack
plugin at some point to allow the vendor folks to catch up? Then we should start
looking at the CI results more carefully when modifying it.
2. Our devstack plugin is inconvenient for hardware and requires hacks.
This is something Miles (?) told me when trying to set up an environment for
his hardware lab. If so, can we get a list of pain points, preferably in the
form of reported bugs? I, and hopefully other folks, can certainly dedicate
some time to making your life easier.
3. We run too many jobs, on too many changes.
I've noticed that 3rdparty CI runs even on patches that clearly don't require
it, e.g. docs-only changes. I suggest that maintainers adopt exclusion rules
similar to [3].
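Assuming rules of the kind referenced in [3], which skip a job when every file touched by a change matches one of a set of patterns, the decision itself is simple. This Python sketch shows that logic for a CI system that can hand over the list of changed paths; the patterns are illustrative, not copied from [3].

```python
import re
from typing import Iterable, Sequence

# Illustrative patterns for files that cannot affect hardware testing.
# These are assumptions, not the actual list from project-config.
SKIP_PATTERNS: Sequence[str] = (
    r"^doc/.*$",
    r"^releasenotes/.*$",
    r"^.*\.rst$",
)


def should_skip(changed_files: Iterable[str],
                patterns: Sequence[str] = SKIP_PATTERNS) -> bool:
    """Return True if every changed file matches at least one skip pattern."""
    compiled = [re.compile(p) for p in patterns]
    files = list(changed_files)
    # An empty file list is not treated as skippable, to stay on the safe side.
    if not files:
        return False
    return all(any(rx.match(path) for rx in compiled) for path in files)
```

A change touching only doc/ and releasenotes/ would be skipped, while one that also touches code under ironic/ would still run the job.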
Also, most vendors run 3-4 jobs for different flavors of their drivers
(and this number is going to grow with the driver composition work). I wonder if
we should recommend switching from the ironic baremetal_basic_ops test to what we
call "standalone" tests [4]. This would allow a single job to test several
drivers/combinations of interfaces within the same time frame.
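To illustrate how quickly the combinations add up, this hypothetical Python sketch enumerates the hardware type and interface combinations one standalone job could cover. The names used with it are placeholders, not the configuration of any vendor CI or of the tests in [4].

```python
from itertools import product
from typing import Dict, List, Sequence


def interface_matrix(hardware_types: Sequence[str],
                     interfaces: Dict[str, Sequence[str]]) -> List[Dict[str, str]]:
    """Build one test case per hardware type and interface combination.

    `interfaces` maps a hypothetical field name (e.g. "deploy_interface")
    to the values that should be exercised for it.
    """
    names = sorted(interfaces)
    cases = []
    for hw_type in hardware_types:
        # Cartesian product over all interface value lists.
        for values in product(*(interfaces[n] for n in names)):
            case = {"hardware_type": hw_type}
            case.update(zip(names, values))
            cases.append(case)
    return cases
```

With two deploy interfaces and two boot interfaces for one hardware type this yields four cases; the point of the proposal is that these would share one deployed environment rather than each needing a separate job.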
Finally, I've proposed this topic for the virtual meetup [5] planned for the end
of April. Please feel free to stop by and let us know how we can help.
Thanks,
Dmitry.
P.S.
I've seen expired or self-signed HTTPS certificates on the log sites of some
3rdparty CI systems. Please try to fix such issues as soon as possible so that
the community can understand failures.
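Maintainers who want to catch such problems early could run a periodic check against their log server. This is a minimal sketch using only Python's standard ssl and socket modules; the host name passed to it is up to you. With the default context, a self-signed or already expired certificate fails verification with ssl.SSLCertVerificationError (Python 3.7+), while a valid one yields its remaining lifetime in days.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate 'notAfter' field into days remaining."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def check_log_server(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Connect with full verification and return days until the cert expires.

    Raises ssl.SSLCertVerificationError for self-signed, expired or
    otherwise untrusted certificates.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_until_expiry(cert["notAfter"])
```

Running check_log_server from a cron job and alerting when the result drops below, say, two weeks would flag renewals before the logs become unreadable.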
[1] https://github.com/dtantsur/ci-report
[2] http://paste.openstack.org/show/606467/
[3] https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L1375-L1385
[4] https://github.com/openstack/ironic/blob/master/ironic_tempest_plugin/tests/scenario/ironic_standalone/test_basic_ops.py
[5] https://etherpad.openstack.org/p/ironic-virtual-meetup