On Wed, Dec 4, 2019 at 1:02 PM Adam Kimball <baha@linux.vnet.ibm.com> wrote:
Hello all,

Mike Turek and I have been working for quite some time now to get a
ppc64le job building TripleO containers. A few weeks back, we stumbled
into a strange issue where the container build job exits with return
code 1 despite all containers building successfully. This has largely
blocked our ability to push containers up to Docker Hub or report
success through the delorean API.

We've been pushing a number of patches to expand the logging of both our
job and the TripleO playbooks themselves, to get a better idea of
what's going on. The most recent was a patch to show which containers
have Python futures that have been throwing exceptions, and why. [1]
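
For anyone unfamiliar with the pattern, here is a minimal sketch of how
exceptions can be collected per container from concurrent.futures. It is
not the code from [1]; run_builds and build_fn are hypothetical names used
only for illustration, and the real builder may be structured differently.

```python
from concurrent import futures


def run_builds(build_fn, containers, max_workers=4):
    """Run build_fn(name) for each container name.

    build_fn is a hypothetical callable standing in for the real build
    step. Returns (built, failed), where failed maps a container name to
    a description of the exception its future raised.
    """
    built, failed = [], {}
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which container each future belongs to.
        future_to_name = {pool.submit(build_fn, name): name
                          for name in containers}
        for fut in futures.as_completed(future_to_name):
            name = future_to_name[fut]
            exc = fut.exception()  # None if the job finished cleanly
            if exc is None:
                built.append(name)
            else:
                failed[name] = repr(exc)
    return sorted(built), failed
```

Keeping the future-to-name mapping is what makes it possible to say which
container failed and why, rather than only that something failed.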

The end result seems to be that a number of jobs are reported as not
complete. One instance can be seen at timestamp 20:16:07 of an example
build log. [2]

However, upon checking the list of successfully built containers [3], or
the RDO registry itself [4], one can see that the containers producing
"job not complete" errors have actually been built and are being
uploaded. The error log generated by the TripleO playbooks is also
empty. [5]
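
The comparison above can be repeated mechanically. The following is a
rough sketch that assumes you already have two lists of container names,
one reported as not complete and one taken from the successfully built
list. The function name and the one-name-per-entry input format are
assumptions, not something the logs are known to provide directly.

```python
def cross_check(not_complete, built):
    """Split names reported as not complete into two sorted lists.

    Returns (false_alarms, really_missing): names that do appear in the
    built list, and names that do not. Blank entries and surrounding
    whitespace are ignored.
    """
    built_set = {name.strip() for name in built if name.strip()}
    reported = [name.strip() for name in not_complete if name.strip()]
    false_alarms = sorted(n for n in reported if n in built_set)
    really_missing = sorted(n for n in reported if n not in built_set)
    return false_alarms, really_missing
```

If really_missing comes back empty while false_alarms does not, that
matches what we are seeing: errors reported for containers that exist.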

At this point, we're wondering what the path forward is. It seems like
the issue stems from some unintended behavior in the tripleo playbooks
themselves, not anything from our job. We're trying to figure out if
this behavior is something that should be preventing us from reporting
successful builds, and if so, how it can be fixed.

Thanks,
Adam Kimball


[1] - https://review.opendev.org/#/c/695723/
[2] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-ppc64le/1803/logs/logs/build.log

Reviewing this, it looks like it's failing on the push to the RDO registry. Would it be possible to hold a node and run a push command manually to see what it does? For example:

sudo buildah push --tls-verify=False trunk.registry.rdoproject.org/tripleomaster/centos-binary-mistral-executor:36a84820e51bad57c6bbb92429f3afb3d9da29c2_6e3b098e_ppc64le docker://trunk.registry.rdoproject.org/tripleomaster/centos-binary-mistral-executor:36a84820e51bad57c6bbb92429f3afb3d9da29c2_6e3b098e_ppc64le
We output the commands that are being run, so I'm wondering if some of their output is getting eaten.
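
If running it by hand is awkward, a small wrapper along these lines could
capture everything the push prints. This is only a sketch: buildah_push_cmd
and run_and_report are hypothetical helper names, and the command is
assembled in the same form as the one quoted above.

```python
import subprocess
import sys


def buildah_push_cmd(image):
    """Build the argument list for a push of image to its own registry."""
    return ["sudo", "buildah", "push", "--tls-verify=False",
            image, "docker://" + image]


def run_and_report(cmd, stream=sys.stderr):
    """Run cmd and write its return code, stdout and stderr to stream."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    stream.write("command: %s\n" % " ".join(cmd))
    stream.write("return code: %d\n" % proc.returncode)
    stream.write("stdout:\n%s\n" % proc.stdout)
    stream.write("stderr:\n%s\n" % proc.stderr)
    return proc
```

Calling run_and_report(buildah_push_cmd(image)) with the full image
reference would show the return code next to both output streams, which
should make it obvious if anything is being dropped.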

Thanks,
-Alex



[3] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-ppc64le/1803/logs/logs/containers-successfully-built.log
[4] - https://console.registry.rdoproject.org/registry
[5] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-ppc64le/1803/logs/logs/containers-build-errors.log