<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 28, 2017 at 3:23 AM, Steven Hardy <span dir="ltr"><<a href="mailto:shardy@redhat.com" target="_blank">shardy@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Thu, Sep 28, 2017 at 8:04 AM, Marios Andreou <<a href="mailto:mandreou@redhat.com">mandreou@redhat.com</a>> wrote:<br>

><br>

><br>

> On Thu, Sep 28, 2017 at 9:50 AM, mathieu bultel <<a href="mailto:mbultel@redhat.com">mbultel@redhat.com</a>> wrote:<br>

>><br>

>> Hi,<br>

>><br>

>><br>

>> On 09/28/2017 05:05 AM, Emilien Macchi wrote:<br>

>> > I was reviewing <a href="https://review.openstack.org/#/c/487496/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/487496/</a> and<br>

>> > <a href="https://review.openstack.org/#/c/487488/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/487488/</a> when I realized that we still<br>

>> > didn't have any test coverage for minor updates.<br>

>> > We never had this coverage AFAICT, but that is no reason not to push<br>

>> > forward on it.<br>

>> Thank you for the review and the -2! :)<br>

>> So I agree with you, we need CI coverage for that part, and I was<br>

>> wondering how I can quickly put a test in CI for the minor update.<br>

>> But before that, just a few things to take into account regarding those<br>

>> reviews:<br>

>><br>

><br>

> agree on the need for the ci coverage, but disagree on blocking this. by the<br>

> same logic we should not have landed anything minor update related during<br>

> the previous cycle. This is the very last part for<br>

> <a href="https://bugs.launchpad.net/tripleo/+bug/1715557" rel="noreferrer" target="_blank">https://bugs.launchpad.net/<wbr>tripleo/+bug/1715557</a> - wiring up the mechanism<br>

> into client and what's more matbu has managed to do it 'properly' with a<br>

> tripleo-common mistral action wired up to the tripleoclient cli.<br>

><br>

> I don't think it's right we don't have coverage but I also don't think it's<br>

> right to block these last patches,<br>

<br>

</span>Yeah I agree - FWIW we have discussed this before, and AIUI the plan was:<br>

<br>

1 - Get multinode coverage of an HA deployment with more than one<br>

controller (e.g. the 3nodes job) but with containers enabled<br>

2 - Implement a rolling minor update test based on that<br>

multi-controller HA-with-containers test<br>

<br>

AFAIK we're only starting to get containers+pacemaker CI scenarios<br>

working with one controller, so it's not really reasonable to block<br>

this, since that is a prerequisite to the multi-controller test, which<br>

is a prerequisite to the rolling update test.<br>

<br>

Personally I think we'd be best to aim directly for the rolling update<br>

test in CI, as doing a single node minor update doesn't really test<br>

the most important aspect (e.g. zero downtime).<br>

<br>

The other challenge here is the walltime relative to the CI timeout -<br>

we've been running into that for the containers upgrade job, and I<br>

think we need to figure out optimizations there which may also be<br>

required for minor update testing (maybe we can work around that by<br>

only updating a very small number of containers, but that will reduce<br>

the test coverage considerably?)<br></blockquote><div><br></div><div>OK. I think the solution is to start migrating these jobs to RDO Software Factory third-party testing.</div><div><br></div><div>Here is what I propose:</div><div>1. Start with an experimental check job <a href="https://review.rdoproject.org/r/#/c/9823/">https://review.rdoproject.org/r/#/c/9823/</a></div><div>This will help us confirm that everything works or fails as we expect. We are</div><div>also afforded a configurable timeout \0/. It's currently set to 360 minutes for the overcloud upgrade jobs.</div><div><br></div><div>2. Once this is proven out, we can run upgrade jobs as third-party jobs on any upstream review.</div><div><br></div><div>3. New coverage should be prototyped in RDO Software Factory.</div><div><br></div><div>4. If jobs prove to be reliable and consistent and run in under 170 minutes, we move what</div><div>we can back upstream.</div><div><br></div><div>WDYT?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

I completely agree we need this coverage, and honestly we should have<br>

had it a long time ago, but we need to make progress on this last<br>

critical blocker for pike, while continuing to make progress on the CI<br>

coverage (which should certainly be a top priority for the Lifecycle<br>

squad, as soon as we have this completely new-for-pike minor updates<br>

workflow fully implemented and debugged).<br>

<br>

Thanks,<br>

<br>

Steve<br>

<div class="gmail-HOEnZb"><div class="gmail-h5"><br>


</div></div></blockquote></div><br></div></div>