[openstack-dev] Many timeouts in zuul gates for TripleO
oidgar at redhat.com
Mon Jan 22 11:55:25 UTC 2018
We're still seeing timeouts, but now in the tripleo-heat-templates experimental gates.
Does anyone have an idea what we can do to fix it?
On Sat, Jan 20, 2018 at 4:38 AM, Paul Belanger <pabelanger at redhat.com> wrote:
> On Fri, Jan 19, 2018 at 11:23:45AM -0600, Ben Nemec wrote:
> > On 01/18/2018 09:45 AM, Emilien Macchi wrote:
> > > On Thu, Jan 18, 2018 at 6:34 AM, Or Idgar <oidgar at redhat.com> wrote:
> > > > Hi,
> > > > We're encountering many timeouts for zuul gates in TripleO.
> > > > For example, see
> > > > http://logs.openstack.org/95/508195/28/check-tripleo/
> > > >
> > > > Rechecks don't help: a given gate sometimes finishes successfully and
> > > > sometimes does not.
> > > > The problem is that after a recheck it's not always the same gate that
> > > > fails.
> > > >
> > > > Is there someone with access to the server load data who can see what
> > > > is causing this?
> > > > Alternatively, is there something we can do to reduce the time for
> > > > each gate?
> > >
> > > We're migrating to RDO Cloud for OVB jobs:
> > > https://review.openstack.org/#/c/526481/
> > > It's a work in progress but will help a lot for OVB timeouts on RH1.
> > >
> > > I'll let the CI folks comment on that topic.
> > >
> > I noticed that the timeouts on rh1 have been especially bad as of late, so I
> > did a little testing and found that it did seem to be running more slowly
> > than it should. After some investigation I found that 6 of our compute
> > nodes have warning messages that the CPU was throttled due to high
> > temperature. I've disabled 4 of them that had a lot of warnings. The
> > other 2 only had a handful of warnings, so I'm hopeful we can leave them
> > active without affecting job performance too much. It won't accomplish much
> > if we disable the overheating nodes only to overload the remaining ones.
> > I'll follow up with our hardware people and see if we can determine why
> > these specific nodes are overheating. They seem to be running 20 degrees C
> > hotter than the rest of the nodes.
> Did tripleo-test-cloud-rh1 get new kernels applied for Meltdown / Spectre?
> It's possible that is impacting performance too.