<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 28, 2017 at 3:23 AM, Steven Hardy <span dir="ltr"><<a href="mailto:shardy@redhat.com" target="_blank">shardy@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Thu, Sep 28, 2017 at 8:04 AM, Marios Andreou <<a href="mailto:mandreou@redhat.com">mandreou@redhat.com</a>> wrote:<br>
><br>
><br>
> On Thu, Sep 28, 2017 at 9:50 AM, mathieu bultel <<a href="mailto:mbultel@redhat.com">mbultel@redhat.com</a>> wrote:<br>
>><br>
>> Hi,<br>
>><br>
>><br>
>> On 09/28/2017 05:05 AM, Emilien Macchi wrote:<br>
>> > I was reviewing <a href="https://review.openstack.org/#/c/487496/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/487496/</a> and<br>
>> > <a href="https://review.openstack.org/#/c/487488/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/487488/</a> when I realized that we still<br>
>> > didn't have any test coverage for minor updates.<br>
>> > We never had this coverage AFAICT, but that is not a reason not to push<br>
>> > it forward.<br>
>> Thank you for the review and the -2! :)<br>
>> So I agree with you: we need CI coverage for that part, and I was<br>
>> wondering how I can quickly put a test in CI for the minor update.<br>
>> But before that, just a few things to take into account regarding those<br>
>> reviews:<br>
>><br>
><br>
> I agree on the need for the CI coverage, but disagree on blocking this. By the<br>
> same logic we should not have landed anything minor-update related during<br>
> the previous cycle. This is the very last part for<br>
> <a href="https://bugs.launchpad.net/tripleo/+bug/1715557" rel="noreferrer" target="_blank">https://bugs.launchpad.net/tripleo/+bug/1715557</a> - wiring up the mechanism<br>
> into the client - and what's more, matbu has managed to do it 'properly', with a<br>
> tripleo-common Mistral action wired up to the tripleoclient CLI.<br>
><br>
> I don't think it's right that we don't have coverage, but I also don't think it's<br>
> right to block these last patches.<br>
<br>
</span>Yeah I agree - FWIW we have discussed this before, and AIUI the plan was:<br>
<br>
1 - Get multinode coverage of an HA deployment with more than one<br>
controller (e.g. the 3nodes job) but with containers enabled<br>
2 - Implement a rolling minor update test based on that<br>
multi-controller HA-with-containers test<br>
<br>
AFAIK we're only starting to get containers+pacemaker CI scenarios<br>
working with one controller, so it's not really reasonable to block<br>
this, since that is a prerequisite to the multi-controller test, which<br>
is a prerequisite to the rolling update test.<br>
<br>
Personally I think we'd be best to aim directly for the rolling update<br>
test in CI, as doing a single-node minor update doesn't really test<br>
the most important aspect (i.e. zero downtime).<br>
<br>
The other challenge here is the walltime relative to the CI timeout -<br>
we've been running into that for the containers upgrade job, and I<br>
think we need to figure out optimizations there which may also be<br>
required for minor update testing (maybe we can work around that by<br>
only updating a very small number of containers, but that will reduce<br>
the test coverage considerably?)<br></blockquote><div><br></div><div>OK, I think the solution is to start migrating these jobs to RDO Software Factory third-party testing.</div><div><br></div><div>Here is what I propose:</div><div>1. Start with an experimental check job <a href="https://review.rdoproject.org/r/#/c/9823/">https://review.rdoproject.org/r/#/c/9823/</a></div><div>This will help us confirm that everything works or fails as we expect. We are</div><div>also afforded a configurable timeout \o/ - it's currently set to 360 minutes for the overcloud upgrade jobs.</div><div><br></div><div>2. Once this is proven out, we can run the upgrade jobs as third-party checks against any upstream review</div><div><br></div><div>3. New coverage should be prototyped in RDO Software Factory</div><div><br></div><div>4. If the jobs prove to be reliable and consistent, and run in under 170 minutes, we move what</div><div>we can back upstream.</div><div><br></div><div>WDYT?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
I completely agree we need this coverage, and honestly we should have<br>
had it a long time ago, but we need to make progress on this last<br>
critical blocker for Pike, while continuing to make progress on the CI<br>
coverage (which should certainly be a top priority for the Lifecycle<br>
squad, as soon as we have this completely new-for-Pike minor updates<br>
workflow fully implemented and debugged).<br>
<br>
Thanks,<br>
<br>
Steve<br>
<div class="gmail-HOEnZb"><div class="gmail-h5"><br>
______________________________<wbr>______________________________<wbr>______________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.<wbr>openstack.org?subject:<wbr>unsubscribe</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack-dev</a><br>
</div></div></blockquote></div><br></div></div>