[tripleo][ppc64le] Unexpected Return Code from TripleO Container Build Playbooks on ppc64le
Hello all,

Mike Turek and I have been working to get a ppc64le job to build tripleo containers for quite some time now. A few weeks back, we stumbled into a strange issue where the containers build job was issuing a return code 1, despite all containers successfully building. This has largely blocked our ability to push containers up to docker hub or report success through the delorean API.

We've been pushing a number of patches to expand the logging of both our job, and the tripleo playbooks themselves, to get a better idea of what's going on. The most recent was a patch to show which containers have python futures that have been throwing exceptions, and why. [1]

The end result of this seems to be that a number of jobs are reporting as incomplete. An example of this can be seen at timestamp 20:16:07 of an example build log. [2]

However, upon checking the list of successfully built containers [3], or the RDO registry itself [4], one can see that the containers producing job not complete errors have actually built, and are being uploaded. The error log generated by the tripleo playbooks is also empty. [5]

At this point, we're wondering what the path forward is. It seems like the issue stems from some unintended behavior in the tripleo playbooks themselves, not anything from our job. We're trying to figure out if this behavior is something that should be preventing us from reporting successful builds, and if so, how it can be fixed.

Thanks,
Adam Kimball

[1] - https://review.opendev.org/#/c/695723/
[2] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-...
[3] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-...
[4] - https://console.registry.rdoproject.org/registry
[5] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-...
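One way a build can be reported as "not complete" even though its image exists is a wait on Python futures that times out before the work finishes. The sketch below illustrates only that general mechanism. It is an assumption, not a confirmed description of what the tripleo code does, and `build_image` and `summarize` are hypothetical names invented for the example.

```python
import concurrent.futures
import time


def build_image(name, seconds):
    """Hypothetical stand-in for a container build; it only sleeps."""
    time.sleep(seconds)
    return name


def summarize(jobs, timeout):
    """Run jobs in a thread pool and classify each future once `timeout` expires."""
    report = {"done": [], "failed": [], "not_complete": []}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(build_image, n, s): n for n, s in jobs}
        done, not_done = concurrent.futures.wait(futures, timeout=timeout)
        for fut in done:
            # exception() returns None when the job succeeded.
            key = "failed" if fut.exception() is not None else "done"
            report[key].append(futures[fut])
        for fut in not_done:
            report["not_complete"].append(futures[fut])
    # Leaving the `with` block waits for outstanding jobs, so a job flagged
    # "not_complete" above may still have finished its work by this point.
    report["finished_after_timeout"] = [
        futures[f] for f in not_done if f.done()
    ]
    for names in report.values():
        names.sort()
    return report
```

Calling `summarize([("fast", 0.0), ("slow", 1.0)], timeout=0.2)` classifies the slow job as not complete at the time of the check, yet that job still runs to the end before the function returns. A caller that derives its return code from the "not complete" list would then report failure for work that actually succeeded.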
On Wed, Dec 4, 2019 at 1:02 PM Adam Kimball <baha@linux.vnet.ibm.com> wrote:
> Hello all,
>
> Mike Turek and I have been working to get a ppc64le job to build tripleo containers for quite some time now. A few weeks back, we stumbled into a strange issue where the containers build job was issuing a return code 1, despite all containers successfully building. This has largely blocked our ability to push containers up to docker hub or report success through the delorean API.
>
> We've been pushing a number of patches to expand the logging of both our job, and the tripleo playbooks themselves, to get a better idea of what's going on. The most recent was a patch to show which containers have python futures that have been throwing exceptions, and why. [1]
>
> The end result of this seems to be that a number of jobs are reporting as incomplete. An example of this can be seen at timestamp 20:16:07 of an example build log. [2]
>
> However, upon checking the list of successfully built containers [3], or the RDO registry itself [4], one can see that the containers producing job not complete errors have actually built, and are being uploaded. The error log generated by the tripleo playbooks is also empty. [5]
>
> At this point, we're wondering what the path forward is. It seems like the issue stems from some unintended behavior in the tripleo playbooks themselves, not anything from our job. We're trying to figure out if this behavior is something that should be preventing us from reporting successful builds, and if so, how it can be fixed.
>
> Thanks,
> Adam Kimball
>
> [1] - https://review.opendev.org/#/c/695723/
> [2] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-...

In reviewing this, it looks like it's failing on the push to the rdo registry. Would it be possible to keep a node and manually see what a push command is doing? For example,

    sudo buildah push --tls-verify=False trunk.registry.rdoproject.org/tripleomaster/centos-binary-mistral-executor:36a84820e51bad57c6bbb92429f3afb3d9da29c2_6e3b098e_ppc64le docker://trunk.registry.rdoproject.org/tripleomaster/centos-binary-mistral-executor:36a84820e51bad57c6bbb92429f3afb3d9da29c2_6e3b098e_ppc64le

We output the commands that are being run so I'm wondering if there's some form of output that's getting eaten.

Thanks,
-Alex

> [3] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-...
> [4] - https://console.registry.rdoproject.org/registry
> [5] - https://centos.logs.rdoproject.org/tripleo-upstream-containers-build-master-...
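
To check whether output from the push is "getting eaten", the manual command suggested in the reply could be run through a small wrapper that keeps the return code, stdout, and stderr together. This is a minimal sketch under stated assumptions: `push_command` and `run_and_capture` are hypothetical helper names, and only the `buildah push` arguments come from the message above.

```python
import subprocess


def push_command(image):
    """Build the argument list for the push command quoted in the thread."""
    return [
        "sudo", "buildah", "push", "--tls-verify=False",
        image, "docker://" + image,
    ]


def run_and_capture(cmd):
    """Run cmd and return (returncode, stdout, stderr) so nothing is dropped."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the captured bytes to text
    )
    return proc.returncode, proc.stdout, proc.stderr
```

On a held node, calling `run_and_capture(push_command(image))` with the image reference from the failing job and printing all three returned values would show whether the push exits non-zero while writing its error somewhere the playbook logs do not capture.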
Participants (2): Adam Kimball, Alex Schultz