[tripleo][ci][infra] jobs in retry_limit or skipped

Jeremy Stanley fungi at yuggoth.org
Thu Apr 1 15:04:53 UTC 2021


On 2021-04-01 13:53:46 +0530 (+0530), Chandan Kumar wrote:
> On Thu, Apr 1, 2021 at 7:02 AM Wesley Hayutin <whayutin at redhat.com> wrote:
> >
> > Greetings,
> >
> > Just FYI: I believe we hit a bump in the road in upstream infra (not sure yet). It appears to be global and not isolated to tripleo or centos based jobs.
> >
> > I have a tripleo bug to track it.
> > https://bugs.launchpad.net/tripleo/+bug/1922148
> >
> > See #opendev for details, it looks like infra is very busy working and fixing the issues atm.
> >
> > http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-03-31.log.html#t2021-03-31T10:34:51
> > http://eavesdrop.openstack.org/irclogs/%23opendev/latest.log.html
> >
> 
> Zuul got restarted and jobs have started working fine now.
> If no job is running against your patches, please recheck them
> gradually, as rechecking everything at once might flood the gates.

It's a complex situation with a few problems intermingled. First,
the tripleo-ansible-centos-8-molecule-tripleo-modules job seemed to
have some bug of its own causing frequent disconnects of the job
node, leading to retries. Also, some recent change in Zuul seems to
have introduced a semi-slow memory leak which, when we run into
memory pressure on the scheduler, causes ZooKeeper disconnects which
trigger mass build retries. Further, because the source of the
memory leak has been really tough to nail down, we have been doing
live debugging directly in the running process, and this slows the
scheduler by orders of magnitude while engaged, triggering similar
ZooKeeper disconnects as well.
-- 
Jeremy Stanley