<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 19, 2018 at 12:23 PM, Ben Nemec <span dir="ltr"><<a href="mailto:openstack@nemebean.com" target="_blank">openstack@nemebean.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-"><br>

<br>

On 01/18/2018 09:45 AM, Emilien Macchi wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

On Thu, Jan 18, 2018 at 6:34 AM, Or Idgar <<a href="mailto:oidgar@redhat.com" target="_blank">oidgar@redhat.com</a>> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Hi,<br>

We're encountering many timeouts in the Zuul gate jobs for TripleO.<br>

For example, see<br>

<a href="http://logs.openstack.org/95/508195/28/check-tripleo/tripleo-ci-centos-7-ovb-ha-oooq/c85fcb7/" rel="noreferrer" target="_blank">http://logs.openstack.org/95/5<wbr>08195/28/check-tripleo/tripleo<wbr>-ci-centos-7-ovb-ha-oooq/<wbr>c85fcb7/</a>.<br>

<br>

Rechecks don't help: a given gate job sometimes finishes successfully and sometimes does not.<br>

The problem is that after a recheck it's not always the same gate job that fails.<br>

<br>

Does anyone have access to the server load data to see what is causing this?

Alternatively, is there something we can do to reduce the running time of each gate job?<br>

</blockquote>

<br>

We're migrating to RDO Cloud for OVB jobs:<br>

<a href="https://review.openstack.org/#/c/526481/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/526481/</a><br>

It's a work in progress, but it should help a lot with the OVB timeouts on RH1.<br>

<br>

I'll let the CI folks comment on that topic.<br>

<br>

</blockquote>

<br></span>

I noticed that the timeouts on rh1 have been especially bad lately, so I did a little testing and found that it did seem to be running more slowly than it should.  After some investigation I found that 6 of our compute nodes have warning messages saying the CPU was throttled due to high temperature.  I've disabled the 4 that had a lot of warnings. The other 2 only had a handful of warnings, so I'm hopeful we can leave them active without affecting job performance too much.  It won't accomplish much if we disable the overheating nodes only to overload the remaining ones.<br>
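<br>
For anyone wanting to repeat this kind of check, here is a minimal sketch. It assumes Linux hosts on x86 hardware that expose the per-CPU <code>thermal_throttle/core_throttle_count</code> counters in sysfs, which may not match how the warnings above were actually found. The function names, the "ok"/"watch"/"disable" labels, and the threshold of 100 events are all hypothetical, chosen only to mirror the "a lot" versus "a handful" split.<br>

```python
from pathlib import Path


def throttle_counts(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: throttle_event_count} for CPUs exposing the counter.

    Assumes the sysfs layout cpuN/thermal_throttle/core_throttle_count.
    CPUs without a readable counter are skipped.
    """
    counts = {}
    pattern = "cpu[0-9]*/thermal_throttle/core_throttle_count"
    for counter in sorted(Path(sysfs_cpu_root).glob(pattern)):
        cpu_name = counter.parent.parent.name
        try:
            counts[cpu_name] = int(counter.read_text().strip())
        except (OSError, ValueError):
            continue
    return counts


def classify_node(counts, disable_threshold=100):
    """Label a node from its summed throttle events.

    disable_threshold is a made-up cutoff, not a value from this thread.
    """
    total = sum(counts.values())
    if total == 0:
        return "ok"
    return "disable" if total >= disable_threshold else "watch"


if __name__ == "__main__":
    node_counts = throttle_counts()
    print(classify_node(node_counts), node_counts)
```

Run on each compute node, this prints one label plus the raw per-CPU counts, so the cutoff can be tuned to whatever "a handful" turns out to mean in practice.<br>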

<br>

I'll follow up with our hardware people and see if we can determine why these specific nodes are overheating.  They seem to be running 20 degrees C hotter than the rest of the nodes.<div class="gmail-HOEnZb"><div class="gmail-h5"><br>

<br>

______________________________<wbr>______________________________<wbr>______________<br>

OpenStack Development Mailing List (not for usage questions)<br>

Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.op<wbr>enstack.org?subject:unsubscrib<wbr>e</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k-dev</a></div></div></blockquote><div><br></div><div><br></div><div>For the latest discussion and the to-dos remaining before the rh1 OVB jobs are migrated to rdo-cloud, see [1].</div><div>TL;DR: we're looking for a run of seven days in which the jobs pass at around 80% or better in check.</div><div>We've reported a number of issues with the environment, and AFAIK they have all been resolved just recently.</div><div><br></div><div>[1] <a href="https://trello.com/c/wGUUEqty/384-steps-needed-to-migrate-ovb-to-rdo-cloud">https://trello.com/c/wGUUEqty/384-steps-needed-to-migrate-ovb-to-rdo-cloud</a></div></div><br></div></div>
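<div><br></div><div>To make the seven-day, roughly 80% bar concrete, here is a rough sketch of how it could be evaluated from daily job tallies. It assumes the rate is aggregated over the whole window rather than required on each day, which the message above does not specify. The function names and the input shape, a list of (passed, total) pairs ordered oldest to newest, are hypothetical.</div>

```python
def window_pass_rate(daily_results, window_days=7):
    """Aggregate pass rate over the most recent window_days entries.

    daily_results: list of (passed, total) tuples, one per day, oldest first.
    Returns None when there is not a full window of days or no jobs ran.
    """
    window = daily_results[-window_days:]
    if len(window) != window_days:
        return None
    passed = sum(p for p, _ in window)
    total = sum(t for _, t in window)
    return passed / total if total else None


def ready_to_migrate(daily_results, threshold=0.80, window_days=7):
    """True when the latest full window meets the pass-rate threshold."""
    rate = window_pass_rate(daily_results, window_days)
    return rate is not None and rate >= threshold


if __name__ == "__main__":
    # Illustrative numbers only, not real CI data.
    history = [(5, 10), (9, 10), (9, 10), (8, 10), (9, 10), (10, 10), (9, 10), (9, 10)]
    print(window_pass_rate(history), ready_to_migrate(history))
```

<div>Requiring the threshold on every individual day would be a stricter reading of the same criterion; swapping the aggregate for an <code>all(...)</code> over per-day rates would cover that case.</div>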