[openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

Bogdan Dobrelya bdobreli at redhat.com
Tue May 15 15:54:42 UTC 2018


On 5/15/18 5:08 PM, Sagi Shnaidman wrote:
> Bogdan,
> 
> I think before making final decisions we need to know exactly what
> price we would need to pay. Without exact numbers it will be difficult
> to discuss.
> If we need to wait 80 minutes for the undercloud-containers job to
> finish before starting all the other jobs, it will be about 4.5 hours
> of waiting for a result (+ 4.5 hours in the gate), which is too big a
> price imho and isn't worth the effort.
> 
> What are the exact numbers we are talking about?

I fully agree, but I don't have those numbers, sorry! As I noted above,
they are definitely sitting in openstack-infra's elastic search DB and
just need to be extracted with some assistance from folks who know more
about that.
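
For illustration only, the kind of extraction I have in mind could start
from an elastic-recheck style query like the sketch below. This is a
hypothetical example: the field names (build_name, build_status) are my
assumption about how the CI job results are indexed, and would need to
be confirmed by the infra folks:

  # Hypothetical elastic-recheck style query: count failed runs of one
  # job; field names are assumed, not verified against the actual index.
  query: >-
    build_name:"tripleo-ci-centos-7-containers-multinode"
    AND build_status:"FAILURE"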

> 
> Thanks
> 
> 
> On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya
> <bdobreli at redhat.com> wrote:
> 
>     Let me clarify the problem I want to solve with pipelines.
> 
>     It is getting *hard* to develop things and move patches to the Happy
>     End (merged):
>     - Patches wait too long for CI jobs to start. It should be minutes
>     and not hours of waiting.
>     - If a patch fails a job w/o a good reason, the consequent recheck
>     operation repeats the waiting all over again.
> 
>     How may pipelines help solve this?
>     Pipelines only alleviate, not solve, the problem of waiting. We
>     only want to build pipelines for the main zuul check process,
>     omitting gating and RDO CI (for now).
> 
>     There are two cases to consider:
>     - A patch succeeds all checks
>     - A patch fails a check with dependencies
> 
>     The latter case benefits us the most when pipelines are designed
>     as proposed here: any jobs expected to fail once a dependency
>     fails will be omitted from execution. This saves a lot of HW
>     resources and zuul queue places, making them available for other
>     patches and allowing those to have CI jobs started faster (less
>     waiting!). When we have "recheck storms", for example because of
>     some known intermittent side issue, that benefit is multiplied by
>     the scale of the recheck storm and delivers even better results :)
>     The zuul queue will not grow insanely, overwhelmed by multiple
>     clones of rechecked jobs that are highly likely doomed to fail,
>     blocking other patches that might have a chance to pass checks as
>     they are unaffected by that intermittent issue.
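> 
>     As an aside, a minimal sketch of what such a dependency looks like
>     in a zuul v3 layout follows; the job names here are placeholders
>     for illustration only, not actual job definitions:
> 
>         # Sketch: "child-job" starts only after "parent-job" succeeds;
>         # if "parent-job" fails, "child-job" is skipped entirely and
>         # its nodes are never requested.
>         - project:
>             check:
>               jobs:
>                 - parent-job
>                 - child-job:
>                     dependencies:
>                       - parent-job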
> 
>     And for the first case, when a patch succeeds, the checks take
>     some extended time, and that is the price to pay. How much longer
>     a pipeline takes to finish depends fully on the implementation.
> 
>     The effectiveness could only be measured with numbers extracted
>     from the elastic search data, like the average time a job waits
>     before starting, success vs fail execution time percentiles per
>     job, the average number of rechecks, recheck storm history et al.
>     I don't have that data and don't know how to get it. Any help with
>     that is very much appreciated and could really help to move the
>     proposed patches forward or decline them. And we could then
>     compare "before" and "after" as well.
> 
>     I hope that explains the problem scope and the methodology to
>     address it.
> 
> 
>     On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:
> 
>         An update for your review, folks:
> 
>             Bogdan Dobrelya <bdobreli at redhat.com> writes:
> 
>                 Hello.
>                 As the Zuul documentation [0] explains, the names
>                 "check", "gate", and "post" may be altered for more
>                 advanced pipelines. Is it doable to introduce, for
>                 particular openstack projects, multiple check
>                 stages/steps as check-1, check-2 and so on? And is it
>                 possible to make the consequent steps reuse the
>                 environments that the previous steps finished with?
> 
>                 Narrowing down to the tripleo CI scope, the problem
>                 I'd want us to solve with this "virtual RFE", using
>                 such multi-staged check pipelines, is reducing
>                 (ideally, de-duplicating) some of the common steps of
>                 the existing CI jobs.
> 
> 
>             What you're describing sounds more like a job graph within a
>             pipeline.
>             See:
>             https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
> 
>             for how to configure a job to run only after another job has
>             completed.
>             There is also a facility to pass data between such jobs.
> 
>             ... (skipped) ...
> 
>             Creating a job graph to have one job use the results of the
>             previous job
>             can make sense in a lot of cases.  It doesn't always save *time*
>             however.
> 
>             It's worth noting that in OpenStack's Zuul, we have made an
>             explicit
>             choice not to have long-running integration jobs depend on
>             shorter pep8
>             or tox jobs, and that's because we value developer time more
>             than CPU
>             time.  We would rather run all of the tests and return all
>             of the
>             results so a developer can fix all of the errors as quickly
>             as possible,
>             rather than forcing an iterative workflow where they have to
>             fix all the
>             whitespace issues before the CI system will tell them which
>             actual tests
>             broke.
> 
>             -Jim
> 
> 
>         I proposed a few zuul dependencies [0], [1] for the tripleo CI
>         pipelines, covering undercloud deployments vs upgrades testing
>         (and some more). Given that those undercloud jobs do not have
>         such high failure rates though, I think Emilien is right in
>         his comments and those would buy us nothing.
> 
>         On the other hand, what do you think, folks, of making
>         tripleo-ci-centos-7-3nodes-multinode depend on
>         tripleo-ci-centos-7-containers-multinode [2]? The former seems
>         quite failure-prone and long running, and is non-voting. It
>         deploys 3 nodes in an HA fashion (see the featureset configs
>         [3]*), and it almost never passes when containers-multinode
>         fails - see the CI stats page [4]. I've found only 2 cases
>         there of the opposite situation, where containers-multinode
>         fails but 3nodes-multinode passes. So cutting off those
>         predictable failures via the added dependency *would* buy us
>         something and allow other jobs to wait less before they
>         commence, at the reasonable price of a somewhat extended run
>         time of the main zuul check pipeline (see the sketch after the
>         links below). I think it makes sense, and that the extended CI
>         time will not exceed the RDO CI execution times so much as to
>         become a problem. WDYT?
> 
>         [0] https://review.openstack.org/#/c/568275/
>         [1] https://review.openstack.org/#/c/568278/
>         [2] https://review.openstack.org/#/c/568326/
>         [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
>         [4] http://tripleo.org/cistatus.html
> 
>         * ignore column 1, it's obsolete; all CI jobs are now using
>         config download, AFAICT...
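> 
>         To illustrate, the dependency I have in mind would look
>         roughly like the sketch below. This is a hand-written
>         approximation, not the literal content of [2]; please see
>         that review for the actual change:
> 
>             # Sketch: run 3nodes-multinode only after
>             # containers-multinode succeeds; if the latter fails, the
>             # 3nodes job is skipped and its nodes are never requested.
>             - project:
>                 check:
>                   jobs:
>                     - tripleo-ci-centos-7-containers-multinode
>                     - tripleo-ci-centos-7-3nodes-multinode:
>                         dependencies:
>                           - tripleo-ci-centos-7-containers-multinode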
> 
> 
> 
>     -- 
>     Best regards,
>     Bogdan Dobrelya,
>     Irc #bogdando
> 
> 
> 
> 
> 
> -- 
> Best regards
> Sagi Shnaidman
> 
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando


