[openstack-dev] Many timeouts in zuul gates for TripleO
Or Idgar
oidgar at redhat.com
Mon Jan 22 11:55:25 UTC 2018
Hi,
We're still seeing timeouts, but now in the tripleo-heat-templates experimental gates
(tripleo-ci-centos-7-ovb-fakeha-caserver and
tripleo-ci-centos-7-ovb-ha-tempest-oooq).
See these examples:
http://logs.openstack.org/31/518331/23/experimental-tripleo/tripleo-ci-centos-7-ovb-fakeha-caserver/7502e82/
http://logs.openstack.org/31/518331/23/experimental-tripleo/tripleo-ci-centos-7-ovb-ha-tempest-oooq/46e8e0d/
Anyone have an idea what we can do to fix it?
Thanks,
Idgar
On Sat, Jan 20, 2018 at 4:38 AM, Paul Belanger <pabelanger at redhat.com>
wrote:
> On Fri, Jan 19, 2018 at 11:23:45AM -0600, Ben Nemec wrote:
> >
> >
> > On 01/18/2018 09:45 AM, Emilien Macchi wrote:
> > > On Thu, Jan 18, 2018 at 6:34 AM, Or Idgar <oidgar at redhat.com> wrote:
> > > > Hi,
> > > > we're encountering many timeouts in the Zuul gates for TripleO.
> > > > For example, see
> > > > http://logs.openstack.org/95/508195/28/check-tripleo/tripleo-ci-centos-7-ovb-ha-oooq/c85fcb7/.
> > > >
> > > > Rechecks don't help: sometimes a specific gate finishes successfully
> > > > and sometimes it doesn't, and after a recheck it's not always the
> > > > same gate that fails.
> > > >
> > > > Is there someone with access to the servers who can check their load
> > > > and see what is causing this? Alternatively, is there something we
> > > > can do to reduce the running time of each gate?
> > >
> > > We're migrating to RDO Cloud for OVB jobs:
> > > https://review.openstack.org/#/c/526481/
> > > It's a work in progress, but it will help a lot with the OVB timeouts on RH1.
> > >
> > > I'll let the CI folks comment on that topic.
> > >
> >
> > I noticed that the timeouts on rh1 have been especially bad as of late,
> > so I did a little testing and found that it did seem to be running more
> > slowly than it should. After some investigation I found that 6 of our
> > compute nodes have warning messages that the CPU was throttled due to
> > high temperature. I've disabled 4 of them that had a lot of warnings.
> > The other 2 only had a handful of warnings, so I'm hopeful we can leave
> > them active without affecting job performance too much. It won't
> > accomplish much if we disable the overheating nodes only to overload
> > the remaining ones.
> >
> > I'll follow up with our hardware people and see if we can determine why
> > these specific nodes are overheating. They seem to be running
> > 20 degrees C hotter than the rest of the nodes.
> >
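(As a point of reference, here is a minimal sketch of how such throttle warnings can be tallied per CPU from a node's kernel log. It assumes the common "temperature above threshold" / "cpu clock throttled" message wording, which varies by kernel version, and that dmesg is readable on the node:)

    #!/usr/bin/env python
    # Sketch: count thermal-throttle warnings per CPU in the kernel log.
    # Assumes messages like "CPU16: Core temperature above threshold,
    # cpu clock throttled" -- the exact wording varies by kernel version.
    import re
    import subprocess
    from collections import Counter

    def throttle_warnings():
        log = subprocess.check_output(["dmesg"]).decode("utf-8", "replace")
        pattern = re.compile(
            r"(CPU\d+).*(?:clock throttled|temperature above threshold)",
            re.IGNORECASE)
        return Counter(m.group(1) for m in pattern.finditer(log))

    if __name__ == "__main__":
        for cpu, count in sorted(throttle_warnings().items()):
            print("%s: %d throttle warnings" % (cpu, count))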
> Did tripleo-test-cloud-rh1 get new kernels applied for Meltdown / Spectre?
> It's possible that is impacting performance too.
>
> -Paul
>
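(For anyone checking Paul's theory: on kernels that carry the Meltdown/Spectre patches, mitigation status is exposed under /sys/devices/system/cpu/vulnerabilities/. Here is a minimal sketch assuming that directory exists on the host; not every 2018-era kernel has the backport, so a missing directory only means the status is unknown:)

    #!/usr/bin/env python
    # Sketch: report Meltdown/Spectre mitigation status on a host.
    # Assumes a kernel new enough to expose
    # /sys/devices/system/cpu/vulnerabilities/; treat a missing
    # directory as "status unknown" rather than "not vulnerable".
    import glob
    import os

    VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"

    def mitigation_status():
        status = {}
        for path in glob.glob(os.path.join(VULN_DIR, "*")):
            with open(path) as f:
                status[os.path.basename(path)] = f.read().strip()
        return status

    if __name__ == "__main__":
        report = mitigation_status()
        if not report:
            print("kernel does not expose vulnerability status")
        for name, state in sorted(report.items()):
            print("%s: %s" % (name, state))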
--
Best regards,
Or Idgar