<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Mon, May 14, 2018 at 10:36 AM Jeremy Stanley <<a href="mailto:fungi@yuggoth.org">fungi@yuggoth.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 2018-05-14 07:07:03 -0600 (-0600), Wesley Hayutin wrote:<br>

[...]<br>

> I think you may be conflating the notion that ubuntu or rhel/cent<br>

> can be updated w/o any issues to applications that run atop of the<br>

> distributions with what it means to introduce a minor update into<br>

> the upstream openstack ci workflow.<br>

> <br>

> If jobs could execute w/o a timeout the tripleo jobs would have<br>

> not gone red.  Since we do have constraints in the upstream like<br>

> timeouts and others we have to prepare containers, images etc to<br>

> work efficiently in the upstream.  For example, if our jobs had<br>

> the time to yum update the roughly 120 containers in play in each<br>

> job the tripleo jobs would have just worked.  I am not advocating<br>

> for not having timeouts or constraints on jobs, however I am<br>

> saying this is an infra issue, not a distribution or distribution<br>

> support issue.<br>

> <br>

> I think this is an important point to consider and I view it as<br>

> mostly unrelated to the support claims by the distribution.  Does<br>

> that make sense?<br>

[...]<br>

<br>

Thanks, the thread jumped straight to suggesting costly fixes<br>

(separate images for each CentOS point release, adding an evaluation<br>

period or acceptance testing for new point releases, et cetera)<br>

without coming anywhere close to exploring the problem space. Is<br>

your only concern that when your jobs started using CentOS 7.5<br>

instead of 7.4 they took longer to run?</blockquote><div><br></div><div>Yes. If they had unlimited time to run, our workflow would update everything to CentOS 7.5 in the job itself, and I would expect everything to just work.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> What was the root cause? Are<br>

you saying your jobs consume externally-produced artifacts which lag<br>

behind CentOS package updates? </blockquote><div><br></div><div>Yes, TripleO has externally produced overcloud images and containers, both of which can be yum updated, but we try to ensure they are recreated frequently so the yum transaction stays small.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Couldn't a significant burst of new<br>

packages cause the same symptoms even without it being tied to a<br>

minor version increase?<br></blockquote><div><br></div><div>Yes, certainly this could happen outside of a minor update of the base OS.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

This _doesn't_ sound to me like a problem with how we've designed<br>

our infrastructure, unless there are additional details you're<br>

omitting.</blockquote><div><br></div><div>So the only thing out of our control is the package set on the base nodepool image.</div><div>If that suddenly gets updated with too many packages, then we have to scramble to ensure the images and containers are also updated.</div><div>If there is a breaking change in the nodepool image, for example [a], we have to react to that and fix it as well.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> It sounds like a problem with how the jobs are designed<br>

and expectations around distros slowly trickling package updates<br>

into the series without occasional larger bursts of package deltas.<br>

I'd like to understand more about why you upgrade packages inside<br>

your externally-produced container images at job runtime at all,<br>

rather than relying on the package versions baked into them.</blockquote><div><br></div><div>We do that to ensure the gerrit review itself and its dependencies are built via rpm and injected into the build.</div><div>If we did not do this, the job would not be testing the change at all.  This is a result of being a package-based deployment, for better or worse.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> It<br>

seems like you're arguing that the existence of lots of new package<br>

versions which aren't already in your container images is the<br>

problem, in which case I have trouble with the rationalization of it<br>

being "an infra issue" insofar as it requires changes to the<br>

services as provided by the OpenStack Infra team.<br>

<br>

Just to be clear, we didn't "introduce a minor update into the<br>

upstream openstack ci workflow." We continuously pull CentOS 7<br>

packages into our package mirrors, and continuously rebuild our<br>

centos-7 images from whatever packages the distro says are current.<br></blockquote><div><br></div><div>Understood, which I think is fine and probably works for most projects.</div><div>An enhancement could be to stage the new images for, say, one week or so.</div><div>Do we need the CentOS updates immediately? Is there a possible path that </div><div>does not create a lot of work for infra, but also provides some space for projects</div><div>to prep for the consumption of the updates?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Our automation doesn't know that there's a difference between<br>

packages which were part of CentOS 7.4 and 7.5 any more than it<br>

knows that there's a difference between Ubuntu 16.04.2 and 16.04.3.<br>

Even if we somehow managed to pause our CentOS image updates<br>

immediately prior to 7.5, jobs would still try to upgrade those<br>

7.4-based images to the 7.5 packages in our mirror, right?<br></blockquote><div><br></div><div>Understood. I suspect this will become a more widespread issue as</div><div>more projects start to use containers (not sure).  It's my understanding that</div><div>there are some mechanisms in place to pin packages in the centos nodepool image, so</div><div>there has been some thought generally in the area of this issue.</div><div><br></div><div>TripleO may be the exception to the rule here and that is fine. I'm more interested in exploring</div><div>the possibilities of delivering updates in a staged fashion than anything.  I don't have insight into </div><div>what the possibilities are, or if other projects have similar issues or requests.  Perhaps the TripleO</div><div>project could share the details of our job workflow with the community and this would make more sense.</div><div><br></div><div>I appreciate the time, effort and thoughts you have shared in the thread.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

-- <br>

Jeremy Stanley<br></blockquote><div><br></div><div>[a] <a href="https://bugs.launchpad.net/tripleo/+bug/1770298">https://bugs.launchpad.net/tripleo/+bug/1770298</a> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

__________________________________________________________________________<br>

OpenStack Development Mailing List (not for usage questions)<br>

Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

</blockquote></div></div>