[openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

Bogdan Dobrelya bdobreli at redhat.com
Tue May 15 08:43:10 UTC 2018

On 5/14/18 9:15 PM, Sagi Shnaidman wrote:
> Hi, Bogdan
> I like the idea with the undercloud job. Actually, if undercloud fails, I'd
> stop all other jobs, because it doesn't make sense to run them. Seeing
> the same failure in 10 jobs doesn't add much. So maybe adding the
> undercloud job as a dependency for all multinode jobs would be a great idea.

I like that idea, I'll add another patch in the topic then.
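
To illustrate what such a dependency would cut off, here is a minimal
Python sketch. It only models the expected behaviour (a job does not run
when a job it depends on has failed or was itself skipped); it is not Zuul
code, and the job names and the graph below are hypothetical placeholders.

```python
def skipped_jobs(dependencies, failed):
    """Return the jobs that would not run because a dependency failed.

    dependencies maps a job name to the list of jobs it depends on.
    failed is the set of jobs that ran and failed.
    The effect is transitive: dependents of a skipped job are skipped too.
    """
    skipped = set()
    changed = True
    while changed:
        changed = False
        for job, parents in dependencies.items():
            if job in failed or job in skipped:
                continue
            if any(p in failed or p in skipped for p in parents):
                skipped.add(job)
                changed = True
    return skipped


# Hypothetical job graph: the multinode jobs depend on the undercloud job.
deps = {
    "undercloud": [],
    "multinode-a": ["undercloud"],
    "multinode-b": ["undercloud"],
    "multinode-c": ["multinode-a"],
}
```

With this graph, skipped_jobs(deps, {"undercloud"}) returns all three
multinode jobs, so a single undercloud failure is reported once instead of
being repeated in every dependent job.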

> I think it's also worth checking how long it will delay jobs. Will all
> jobs wait while the undercloud job is running? Or will they be aborted when
> the undercloud job fails?

That is a good question for the openstack-infra folks developing zuul :)
But we could just try it and see how it works; happily, zuul v3 allows
doing that within the scope of the proposed patches! My expectation is
that all jobs get delayed (and I mean the main zuul pipeline execution
time here) by the average time of the undercloud deploy job, ~80 min.
Hopefully that is not a big deal, given that there is a separate RDO CI
pipeline running in parallel, which is *highly likely* to take longer
than that extended time anyway :) Even more so given the high chance of
additional 'recheck rdo' runs we can observe these days for patches on
review. I wish we could introduce inter-pipeline dependencies
(zuul CI <-> RDO CI) for those as well...
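
For what it's worth, that delay can be estimated on the back of an
envelope as the longest dependency chain in the job graph. A minimal
Python sketch follows, assuming every job starts as soon as all of its
dependencies have finished (no waiting for nodes). Only the ~80 min
undercloud figure comes from this thread; the job names and the 100 min
multinode duration are made-up placeholders.

```python
def finish_times(durations, dependencies):
    """Earliest finish time (in minutes) of each job.

    durations maps a job name to its run time in minutes.
    dependencies maps a job name to the list of jobs it must wait for.
    A job starts once all of its dependencies have finished.
    """
    memo = {}

    def finish(job):
        if job not in memo:
            parents = dependencies.get(job, [])
            start = max((finish(p) for p in parents), default=0)
            memo[job] = start + durations[job]
        return memo[job]

    return {job: finish(job) for job in durations}


# ~80 min undercloud job (from this thread); 100 min is a placeholder.
durations = {"undercloud": 80, "multinode-a": 100}

independent = finish_times(durations, {})
chained = finish_times(durations, {"multinode-a": ["undercloud"]})

# Pipeline time is set by the job that finishes last.
delay = max(chained.values()) - max(independent.values())
```

Here the pipeline goes from 100 to 180 minutes, i.e. the delay equals the
undercloud job time, which matches the expectation above.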

> However, I'm very sceptical about the multinode containers and scenario
> jobs; they could fail for very different reasons, like race
> conditions in the product or infra issues. Skipping some of them will
> lead to more rechecks from devs trying to discover all the problems one
> after another, which will delay the development process significantly.

Right, I roughly estimated that the delay to the main zuul pipeline
execution time for those jobs might be ~2.5h, which is not good. We could
live with it if it were only ~1h, as in the undercloud containers job
dependency example.

> Thanks
> On Mon, May 14, 2018 at 7:15 PM, Bogdan Dobrelya <bdobreli at redhat.com> wrote:
>     An update for your review please folks
>         Bogdan Dobrelya <bdobreli at redhat.com> writes:
>             Hello.
>             As Zuul documentation [0] explains, the names "check", "gate", and
>             "post" may be altered for more advanced pipelines. Is it doable to
>             introduce, for particular openstack projects, multiple check
>             stages/steps as check-1, check-2 and so on? And is it possible to
>             make the subsequent steps reuse the environments that the previous
>             steps finished with?
>             Narrowing down to tripleo CI scope, the problem I'd want us to
>             solve with this "virtual RFE", and using such multi-staged check
>             pipelines, is reducing (ideally, de-duplicating) some of the common
>             steps for existing CI jobs.
>         What you're describing sounds more like a job graph within a pipeline.
>         See:
>         https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
>         for how to configure a job to run only after another job has completed.
>         There is also a facility to pass data between such jobs.
>         ... (skipped) ...
>         Creating a job graph to have one job use the results of the previous
>         job can make sense in a lot of cases.  It doesn't always save *time*
>         however.
>         It's worth noting that in OpenStack's Zuul, we have made an explicit
>         choice not to have long-running integration jobs depend on shorter
>         pep8 or tox jobs, and that's because we value developer time more
>         than CPU time.  We would rather run all of the tests and return all
>         of the results so a developer can fix all of the errors as quickly as
>         possible, rather than forcing an iterative workflow where they have
>         to fix all the whitespace issues before the CI system will tell them
>         which actual tests broke.
>         -Jim
>     I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines
>     for undercloud deployments vs upgrades testing (and some more).
>     Given that those undercloud jobs do not have such high failure rates,
>     though, I think Emilien is right in his comments and those would buy
>     us nothing.
>     On the other hand, folks, what do you think of making
>     tripleo-ci-centos-7-3nodes-multinode depend on
>     tripleo-ci-centos-7-containers-multinode [2]? The former seems quite
>     failure-prone and long running, and is non-voting. It deploys (see the
>     featureset configs [3]*) 3 nodes in HA fashion. And it seems to
>     almost never pass when containers-multinode fails - see the
>     CI stats page [4]. I've found only 2 cases there of the opposite
>     situation, where containers-multinode fails but 3nodes-multinode
>     passes. So cutting off those future failures via the added
>     dependency *would* buy us something and allow other jobs to wait less
>     to commence, at the reasonable price of a somewhat extended run time
>     for the main zuul pipeline. I think it makes sense, and that the
>     extended CI time will not exceed the RDO CI execution times by so much
>     as to become a problem. WDYT?
>     [0] https://review.openstack.org/#/c/568275/
>     [1] https://review.openstack.org/#/c/568278/
>     [2] https://review.openstack.org/#/c/568326/
>     [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
>     [4] http://tripleo.org/cistatus.html
>     * ignore column 1, it's obsolete; all CI jobs now use the configs
>     download, AFAICT...
>     -- 
>     Best regards,
>     Bogdan Dobrelya,
>     Irc #bogdando
>     __________________________________________________________________________
>     OpenStack Development Mailing List (not for usage questions)
>     Unsubscribe:
>     OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> -- 
> Best regards
> Sagi Shnaidman

Best regards,
Bogdan Dobrelya,
Irc #bogdando
