[openstack-dev] Many timeouts in zuul gates for TripleO

Wesley Hayutin whayutin at redhat.com
Fri Jan 19 23:45:42 UTC 2018


On Fri, Jan 19, 2018 at 12:23 PM, Ben Nemec <openstack at nemebean.com> wrote:

>
>
> On 01/18/2018 09:45 AM, Emilien Macchi wrote:
>
>> On Thu, Jan 18, 2018 at 6:34 AM, Or Idgar <oidgar at redhat.com> wrote:
>>
>>> Hi,
>>> we're encountering many timeouts for zuul gates in TripleO.
>>> For example, see
>>> http://logs.openstack.org/95/508195/28/check-tripleo/tripleo-ci-centos-7-ovb-ha-oooq/c85fcb7/.
>>>
>>> Rechecks don't help: a given gate job sometimes succeeds and sometimes
>>> times out, and after a recheck it is not always the same job that fails.
>>>
>>> Does anyone have access to the server load data to see what is causing
>>> this? Alternatively, is there something we can do to reduce the running
>>> time of each gate job?
>>>
>>
>> We're migrating to RDO Cloud for OVB jobs:
>> https://review.openstack.org/#/c/526481/
>> It's a work in progress but will help a lot for OVB timeouts on RH1.
>>
>> I'll let the CI folks comment on that topic.
>>
>>
> I noticed that the timeouts on rh1 have been especially bad as of late so
> I did a little testing and found that it did seem to be running more slowly
> than it should.  After some investigation I found that 6 of our compute
> nodes have warning messages that the cpu was throttled due to high
> temperature.  I've disabled 4 of them that had a lot of warnings. The other
> 2 only had a handful of warnings so I'm hopeful we can leave them active
> without affecting job performance too much.  It won't accomplish much if we
> disable the overheating nodes only to overload the remaining ones.
>
> I'll follow up with our hardware people and see if we can determine why
> these specific nodes are overheating.  They seem to be running 20 degrees C
> hotter than the rest of the nodes.
>
>


For the latest discussion and the to-dos remaining before the rh1 OVB jobs
are migrated to RDO Cloud, look here [1].
TL;DR: we're looking for a run of seven days where the jobs are
passing at around 80% or better in check.
We've reported a number of issues with the environment and, AFAIK, all of
them have recently been resolved.
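
To make the seven-day criterion concrete, here is a rough sketch of how it
could be checked. This is not how we actually track it; the function names
and the input format (a list of (day, passed) pairs) are made up for
illustration, and it assumes "80% or better" is applied to each day
separately rather than to the week as a whole. Days with no job runs are
treated as not meeting the bar, which is also an assumption.

```python
from datetime import date, timedelta


def daily_pass_rates(results):
    """Map each day to the fraction of job runs that passed on that day.

    `results` is an iterable of (day, passed) pairs; this input shape is
    hypothetical and not taken from any real CI reporting tool.
    """
    totals = {}
    for day, passed in results:
        ok, n = totals.get(day, (0, 0))
        totals[day] = (ok + (1 if passed else 0), n + 1)
    return {day: ok / n for day, (ok, n) in totals.items()}


def meets_migration_bar(results, end_day, days=7, threshold=0.80):
    """True if every one of the `days` days ending at `end_day` is at or
    above `threshold`. A day with no runs counts as a miss (assumption)."""
    rates = daily_pass_rates(results)
    window = [end_day - timedelta(days=i) for i in range(days)]
    return all(rates.get(d, 0.0) >= threshold for d in window)
```

For example, calling meets_migration_bar(results, end_day=date(2018, 1, 19))
would look at Jan 13 through Jan 19. Checking the aggregate pass rate over
the whole week instead would be a reasonable alternative reading of the
criterion.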

[1]
https://trello.com/c/wGUUEqty/384-steps-needed-to-migrate-ovb-to-rdo-cloud