<div dir="ltr"><div><div><div>Bogdan,<br><br>I think before final decisions we need to know exactly - what a price we need to pay? Without exact numbers it will be difficult to discuss about.<br></div>I we need to wait 80 mins of undercloud-containers job to finish for starting all other jobs, it will be about 4.5 hours to wait for result (+ 4.5 hours in gate) which is too big price imho and doesn't worth an effort.<br><br></div>What are exact numbers we are talking about?<br><br></div>Thanks<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya <span dir="ltr"><<a href="mailto:bdobreli@redhat.com" target="_blank">bdobreli@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Let me clarify the problem I want to solve with pipelines.<br>
<br>
It is getting *hard* to develop things and move patches to the Happy End (merged):<br>
- Patches wait too long for CI jobs to start. It should be minutes of waiting, not hours.<br>
- If a patch fails a job without a good reason, the subsequent recheck operation repeats all of that waiting over again.<br>
<br>
How might pipelines help solve this?<br>
Pipelines only alleviate the problem of waiting; they do not solve it. We only want to build pipelines for the main zuul check process, omitting gating and RDO CI (for now).<br>
<br>
There are two cases to consider:<br>
- A patch succeeds all checks<br>
- A patch fails a check with dependencies<br>
<br>
The latter case benefits us the most when pipelines are designed as proposed here: any jobs expected to fail because a dependency failed are omitted from execution. This saves a lot of hardware resources and Zuul queue slots, making them available for other patches and allowing their CI jobs to start faster (less waiting!). When we have "recheck storms", for example because of some known intermittent side issue, that benefit is multiplied by the recheck storm's, um... level, and delivers even better and absolutely amazing results :) The Zuul queue will not grow insanely, overwhelmed by multiple clones of rechecked jobs that are highly likely doomed to fail, blocking other patches that might have a chance to pass checks as they are unaffected by that intermittent issue.<br>
<br>
And for the first case, when a patch succeeds, the pipeline takes some extended time to finish, and that is the price to pay. How much longer it takes fully depends on the implementation.<br>
<br>
The effectiveness can only be measured with numbers extracted from elasticsearch data, like the average time to wait for a job to start, success vs. fail execution time percentiles for a job, the average number of rechecks, recheck storm history, and so on. I don't have that data and don't know how to get it. Any help with that is very much appreciated and could really help to move the proposed patches forward or to decline them. And we could then compare "before" and "after" as well.<br>
<br>
I hope that explains the problem scope and the methodology to address it.<div class="HOEnZb"><div class="h5"><br>
<br>
On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
An update for your review please folks<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Bogdan Dobrelya <bdobreli at <a href="http://redhat.com" rel="noreferrer" target="_blank">redhat.com</a>> writes:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello.<br>
As Zuul documentation [0] explains, the names "check", "gate", and<br>
"post" may be altered for more advanced pipelines. Is it doable to<br>
introduce, for particular openstack projects, multiple check<br>
stages/steps as check-1, check-2 and so on? And is it possible to make<br>
the subsequent steps reuse the environments that the previous steps<br>
finished with?<br>
<br>
Narrowing down to tripleo CI scope, the problem I'd want us to solve<br>
with this "virtual RFE", and using such multi-staged check pipelines,<br>
is reducing (ideally, de-duplicating) some of the common steps for<br>
existing CI jobs.<br>
</blockquote>
<br>
What you're describing sounds more like a job graph within a pipeline.<br>
See: <a href="https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies" rel="noreferrer" target="_blank">https://docs.openstack.org/inf<wbr>ra/zuul/user/config.html#attr-<wbr>job.dependencies</a> <br>
for how to configure a job to run only after another job has completed.<br>
There is also a facility to pass data between such jobs.<br>
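The job graph described above can be sketched as a Zuul project configuration. This is only an illustrative fragment; the job names are invented for this example:<br>
<br>
```yaml
# Hypothetical sketch of a Zuul job graph using job.dependencies.
# The job names are made up for illustration.
- project:
    check:
      jobs:
        - undercloud-setup-check        # runs first
        - long-integration-check:       # starts only after the
            dependencies:               # dependency job succeeds
              - undercloud-setup-check
```
<br>
Data can be passed from a job to its dependents with the zuul_return Ansible module, as the linked documentation describes.<br>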
<br>
... (skipped) ...<br>
<br>
Creating a job graph to have one job use the results of the previous job<br>
can make sense in a lot of cases. It doesn't always save *time*<br>
however.<br>
<br>
It's worth noting that in OpenStack's Zuul, we have made an explicit<br>
choice not to have long-running integration jobs depend on shorter pep8<br>
or tox jobs, and that's because we value developer time more than CPU<br>
time. We would rather run all of the tests and return all of the<br>
results so a developer can fix all of the errors as quickly as possible,<br>
rather than forcing an iterative workflow where they have to fix all the<br>
whitespace issues before the CI system will tell them which actual tests<br>
broke.<br>
<br>
-Jim<br>
</blockquote>
<br>
I proposed a few zuul dependencies [0], [1] for tripleo CI pipelines for undercloud deployments vs. upgrades testing (and some more). Given that those undercloud jobs do not have such high failure rates, though, I think Emilien is right in his comments and those would buy us nothing.<br>
<br>
On the other hand, what do you think, folks, of making<br>
tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite flaky and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see the featureset configs [3]*). And it almost never passes when containers-multinode fails; see the CI stats page [4]. I've found only 2 cases there of the opposite situation, where containers-multinode fails but 3nodes-multinode passes. So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less before starting, at the reasonable price of a somewhat extended main zuul pipeline run time. I think it makes sense, and the extended CI time will not exceed the RDO CI execution times so much as to become a problem. WDYT?<br>
<br>
[0] <a href="https://review.openstack.org/#/c/568275/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/568275/</a><br>
[1] <a href="https://review.openstack.org/#/c/568278/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/568278/</a><br>
[2] <a href="https://review.openstack.org/#/c/568326/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/568326/</a><br>
[3] <a href="https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html" rel="noreferrer" target="_blank">https://docs.openstack.org/tri<wbr>pleo-quickstart/latest/feature<wbr>-configuration.html</a> <br>
[4] <a href="http://tripleo.org/cistatus.html" rel="noreferrer" target="_blank">http://tripleo.org/cistatus.ht<wbr>ml</a><br>
<br>
* ignore column 1, it's obsolete; all CI jobs now use config download AFAICT...<br>
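For concreteness, the dependency proposed in [2] would look roughly like this in the project's check pipeline (a sketch only; the actual tripleo-ci layout may differ):<br>
<br>
```yaml
# Sketch of the proposed dependency: run the flaky, long-running
# 3nodes job only after the containers job has succeeded.
- project:
    check:
      jobs:
        - tripleo-ci-centos-7-containers-multinode
        - tripleo-ci-centos-7-3nodes-multinode:
            dependencies:
              - tripleo-ci-centos-7-containers-multinode
```
<br>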
<br>
</blockquote>
<br>
<br>
-- <br>
Best regards,<br>
Bogdan Dobrelya,<br>
Irc #bogdando<br>
<br>
______________________________<wbr>______________________________<wbr>______________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.op<wbr>enstack.org?subject:unsubscrib<wbr>e</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k-dev</a><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>Best regards<br></div>Sagi Shnaidman<br></div></div>
</div>