On 2021-06-22 00:23:32 +0000 (+0000), Brin Zhang(张百林) wrote:
There is a patch you can check https://review.opendev.org/c/openstack/cyborg/+/790937 , tempest failed https://050bde8a54f119be7071-8157e9570cd7007a824b373cbf52d06c.ssl.cf2.rackcd...
Thanks, that helps. The build history indicates that the job was succeeding for openstack/cyborg up through 2021-06-09 08:22 UTC, but was failing consistently as of 2021-06-10 09:26 UTC, so something probably changed in that 24-hour period to affect the job:

https://zuul.opendev.org/t/openstack/builds?job_name=cyborg-tempest&project=openstack/cyborg

Broadening that query to other projects, I can see it succeeded as recently as 2021-06-10 01:34 for an openstack/nova change in check. What's more interesting is that it's continuing to succeed consistently for stable branches, even stable/wallaby, just not master.

Both the succeeding and failing builds for master ran on regular ubuntu-focal nodes in a number of different cloud providers which don't have any specialized accelerator hardware, so I have to assume what's changed has nothing to do with the underlying test environment.
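For anyone wanting to repeat that narrowing on another job, here is a rough sketch of the same reasoning in Python. It assumes each build record is a dict with "result" and "end_time" keys; those field names, the failure_window helper, and the treatment of the quoted times as build end times are illustrative assumptions, not a description of what the Zuul dashboard actually returns. The sample records are just the three timestamps mentioned above, with seconds zeroed.

```python
from datetime import datetime


def failure_window(builds):
    """Return (last_success, first_failure) bracketing a regression.

    `builds` is an iterable of dicts with hypothetical "result" and
    "end_time" (ISO 8601 string) keys. Either element of the returned
    tuple is None when no matching build exists.
    """
    # Parse timestamps once so comparisons are on datetimes, not strings.
    parsed = [(datetime.fromisoformat(b["end_time"]), b["result"]) for b in builds]

    # The most recent build that passed, across all projects running the job.
    last_success = max((t for t, r in parsed if r == "SUCCESS"), default=None)

    # The earliest failure that happened after that last success.
    first_failure = min(
        (
            t
            for t, r in parsed
            if r == "FAILURE" and (last_success is None or t > last_success)
        ),
        default=None,
    )
    return last_success, first_failure


# Timestamps taken from the message above (UTC, seconds assumed to be zero).
builds = [
    {"project": "openstack/cyborg", "result": "SUCCESS", "end_time": "2021-06-09T08:22:00"},
    {"project": "openstack/nova", "result": "SUCCESS", "end_time": "2021-06-10T01:34:00"},
    {"project": "openstack/cyborg", "result": "FAILURE", "end_time": "2021-06-10T09:26:00"},
]

last_ok, first_bad = failure_window(builds)
print("last success: ", last_ok)
print("first failure:", first_bad)
```

Including builds from other projects that run the same job is what shrinks the window: the openstack/nova success moves the lower bound from 2021-06-09 08:22 to 2021-06-10 01:34.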
Recently, creating an accelerator server always fails with "No valid host", as shown below, which is out of our control :(

"""
tempest.exceptions.BuildErrorException: Server feef6015-5211-481b-813f-c5924cdf6931 failed to build and is in ERROR status
Details: {'code': 500, 'created': '2021-06-21T01:13:52Z', 'message': 'No valid host was found. '}
"""
[...]
This is when scheduling an accelerator within DevStack, right? Were you maybe using some sort of mock/fake accelerator for testing purposes? Because there wouldn't have been actual accelerators exposed to that environment even back when the job was still succeeding.

Regardless, I suspect something merged early UTC on 2021-06-10 to the master branch of one of the services or tools with which Cyborg interacts, causing this error to begin appearing. The fact that the same job is running fine for stable/wallaby also indicates it's probably some behavior which hasn't been backported yet. Hopefully that helps narrow it down.

-- 
Jeremy Stanley