[openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
bdobreli at redhat.com
Tue May 15 15:54:42 UTC 2018
On 5/15/18 5:08 PM, Sagi Shnaidman wrote:
> I think before making final decisions we need to know exactly what price we
> need to pay. Without exact numbers it will be difficult to discuss.
> If we need to wait 80 mins for the undercloud-containers job to finish before
> starting all other jobs, it will be about 4.5 hours to wait for a result
> (+ 4.5 hours in gate), which is too big a price imho and isn't worth an
> What are the exact numbers we are talking about?
I fully agree, but I don't have those numbers, sorry! As I noted above,
they are definitely sitting in openstack-infra's Elasticsearch DB and
just need to be extracted with some assistance from folks who know more
> On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya <bdobreli at redhat.com
> <mailto:bdobreli at redhat.com>> wrote:
> Let me clarify the problem I want to solve with pipelines.
> It is getting *hard* to develop things and move patches to the Happy
> End (merged):
> - Patches wait too long for CI jobs to start. It should be minutes
> and not hours of waiting.
> - If a patch fails a job w/o a good reason, the subsequent recheck
> repeats the waiting all over again.
> How may pipelines help solve it?
> Pipelines only alleviate the problem of waiting; they do not solve it. We only
> want to build pipelines for the main zuul check process, omitting
> gating and RDO CI (for now).
> There are two cases to consider:
> - A patch succeeds all checks
> - A patch fails a check with dependencies
> The latter case benefits us the most when pipelines are designed
> as proposed here, so that any job expected to fail when its
> dependency fails is omitted from execution. This saves a lot of HW
> resources and zuul queue slots, making them available for other
> patches and allowing those to have their CI jobs started faster (less
> waiting!). When we have "recheck storms", for example because of some known
> intermittent side issue, that saving is multiplied by the recheck
> storm um... level, and delivers even better and absolutely amazing
> results :) The zuul queue will not grow insanely, getting
> overwhelmed by multiple clones of rechecked jobs that are highly likely
> doomed to fail and that block other patches which might have a chance
> to pass checks because they are not affected by that intermittent issue.
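To make the saving concrete, here is a rough sketch of the accounting, not a measurement. The job names are shortened, the node counts and durations are made-up placeholders, and I assume a failing job still burns its full run time (real failures often stop earlier):

```python
def node_minutes(jobs, deps, failing):
    """Node-minutes consumed by one patch set.

    jobs:    {name: (nodes, minutes)}   hypothetical sizes, not real data
    deps:    {name: [parent job names]}
    failing: set of job names that would fail if they ran
    A job is skipped (and costs nothing) when any parent failed or was skipped.
    """
    succeeded_cache = {}
    total = 0

    def succeeded(name):
        nonlocal total
        if name in succeeded_cache:
            return succeeded_cache[name]
        # Evaluate every parent so each one is charged for its own run.
        parents_ok = all([succeeded(p) for p in deps.get(name, [])])
        if not parents_ok:
            succeeded_cache[name] = False  # skipped, no resources used
            return False
        nodes, minutes = jobs[name]
        total += nodes * minutes
        succeeded_cache[name] = name not in failing
        return succeeded_cache[name]

    for name in jobs:
        succeeded(name)
    return total


# Placeholder numbers for illustration only.
jobs = {"containers-multinode": (2, 90), "3nodes-multinode": (3, 120)}
staged = {"3nodes-multinode": ["containers-multinode"]}
both_fail = {"containers-multinode", "3nodes-multinode"}

flat_cost = node_minutes(jobs, {}, both_fail)
staged_cost = node_minutes(jobs, staged, both_fail)
print("node-minutes saved per failing patch set:", flat_cost - staged_cost)
```

In a recheck storm the same difference is saved once per recheck, which is the multiplication effect described above.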
> And for the first case, when a patch succeeds, it takes some
> extended time, and that is the price to pay. How much time it takes
> to finish in a pipeline fully depends on implementation.
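The price for a passing patch can be sketched the same way, as the longest chain of dependent jobs. Only the 80 minutes for undercloud-containers comes from this thread; the other job names and durations are hypothetical, and queue waiting time is ignored:

```python
def wall_clock(durations, deps):
    """Minutes until the last job finishes, assuming every job starts
    as soon as all of its parents have finished (no queue waiting)."""
    finish = {}

    def finish_time(name):
        if name not in finish:
            start = max((finish_time(p) for p in deps.get(name, [])), default=0)
            finish[name] = start + durations[name]
        return finish[name]

    return max(finish_time(name) for name in durations)


# "job-a" and "job-b" are placeholders, not real tripleo jobs.
durations = {"undercloud-containers": 80, "job-a": 110, "job-b": 95}
staged = {"job-a": ["undercloud-containers"], "job-b": ["undercloud-containers"]}

flat_minutes = wall_clock(durations, {})
staged_minutes = wall_clock(durations, staged)
print("extra minutes for a passing patch:", staged_minutes - flat_minutes)
```

How large that difference is in practice depends on which jobs end up on the longest chain, hence on the implementation.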
> The effectiveness could only be measured with numbers extracted from
> Elasticsearch data, like the average time a job waits to start,
> success vs failure execution time percentiles for a job, the average number
> of rechecks, recheck storm history et al. I don't have that data
> and don't know how to get it. Any help with that is much appreciated
> and could really help to move the proposed patches forward or
> decline them. And we could then compare "before" and "after" as well.
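Once someone can export the build records, the summary itself is simple. A minimal sketch, assuming each record is a dict with hypothetical keys "result", "wait_min" and "duration_min" (this is not the real Elasticsearch schema, and the sample values are invented):

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value that has at least
    pct percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(builds):
    """Average wait plus p50/p90 run time, split by build result."""
    by_result = {}
    for build in builds:
        by_result.setdefault(build["result"], []).append(build["duration_min"])
    return {
        "avg_wait_min": mean([b["wait_min"] for b in builds]),
        "duration_p50": {r: percentile(d, 50) for r, d in by_result.items()},
        "duration_p90": {r: percentile(d, 90) for r, d in by_result.items()},
    }


# Invented sample records, for illustration only.
builds = [
    {"result": "SUCCESS", "wait_min": 30, "duration_min": 100},
    {"result": "SUCCESS", "wait_min": 90, "duration_min": 120},
    {"result": "FAILURE", "wait_min": 60, "duration_min": 40},
    {"result": "FAILURE", "wait_min": 60, "duration_min": 80},
]
summary = summarize(builds)
print(summary)
```

Running the same summary on data from before and after a change would give the comparison mentioned above.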
> I hope that explains the problem scope and the methodology to
> address that.
> On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:
> An update for your review please folks
> Bogdan Dobrelya <bdobreli at redhat.com <http://redhat.com>>
> As Zuul documentation explains, the names "check", "gate", and
> "post" may be altered for more advanced pipelines. Is it doable to
> introduce, for particular openstack projects, multiple check
> stages/steps as check-1, check-2 and so on? And is it possible to make
> the subsequent steps reuse the environments that the previous steps
> finished with?
> Narrowing down to tripleo CI scope, the problem I'd want us to solve
> with this "virtual RFE", and using such multi-staged check pipelines,
> is reducing (ideally, de-duplicating) some of the common steps for
> existing CI jobs.
> What you're describing sounds more like a job graph within a
> for how to configure a job to run only after another job has
> There is also a facility to pass data between such jobs.
> ... (skipped) ...
> Creating a job graph to have one job use the results of the previous job
> can make sense in a lot of cases. It doesn't always save *time*
> It's worth noting that in OpenStack's Zuul, we have made an
> choice not to have long-running integration jobs depend on shorter pep8
> or tox jobs, and that's because we value developer time more than CPU
> time. We would rather run all of the tests and return all of the
> results so a developer can fix all of the errors as quickly as possible,
> rather than forcing an iterative workflow where they have to fix all the
> whitespace issues before the CI system will tell them which actual tests
> I proposed a few zuul dependencies (see the review links below) to tripleo CI
> pipelines for undercloud deployments vs upgrades testing (and
> some more). Given that those undercloud jobs do not have very high
> failure rates, though, I think Emilien is right in his comments and
> those would buy us nothing.
> On the other hand, folks, what do you think of making
> tripleo-ci-centos-7-3nodes-multinode depend on
> tripleo-ci-centos-7-containers-multinode? The former seems
> quite failure-prone and long running, and is non-voting. It deploys (see
> featuresets configs *) 3 nodes in HA fashion. And it seems to
> almost never pass when containers-multinode fails - see
> the CI stats page. I've found only 2 cases there of the
> opposite situation, where containers-multinode fails but
> 3nodes-multinode passes. So cutting off those future failures
> via the added dependency *would* buy us something and allow
> other jobs to wait less to start, at the reasonable price of a
> somewhat extended run time for the main zuul pipeline. I think it
> makes sense, and that the extended CI time will not add so much
> overhead to the RDO CI execution times as to become a problem. WDYT?
>  https://review.openstack.org/#/c/568275/
>  https://review.openstack.org/#/c/568278/
>  https://review.openstack.org/#/c/568326/
>  http://tripleo.org/cistatus.html
> * ignore column 1, it's obsolete; all CI jobs are now using
> configs download AFAICT...
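The check I did by eye on the CI stats page can be written down as a small tally. This is only a sketch: the pairs below are placeholders, except that the 2 "parent failed, child passed" entries mirror the 2 cases I mentioned; the count of 18 double failures is invented for illustration:

```python
from collections import Counter


def skip_accuracy(runs):
    """runs: (parent_passed, child_passed) pairs for the same patch set.

    Returns the share of parent failures where the child failed too,
    i.e. how often skipping the child would have lost nothing.
    Returns None when the parent never failed.
    """
    table = Counter(runs)
    parent_failed = table[(False, False)] + table[(False, True)]
    if parent_failed == 0:
        return None
    return table[(False, False)] / parent_failed


# parent = containers-multinode, child = 3nodes-multinode
runs = [(False, False)] * 18 + [(False, True)] * 2 + [(True, True)] * 5
accuracy = skip_accuracy(runs)
print("share of child runs that were safe to skip:", accuracy)
```

The closer that share is to 1, the less signal is lost by adding the dependency.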
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
> OpenStack Development Mailing List (not for usage questions)
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> Best regards
> Sagi Shnaidman