On Wed, Aug 2, 2023, at 10:22 AM, Stephen Finucane wrote:
We recently merged support for placement traits in openstacksdk. Since then, we've seen an uptick in failures of various functional jobs [1]. The failure is always the same test:
openstack.tests.functional.placement.v1.test_trait.TestTrait.test_resource_provider_inventory
That test simply creates a new, custom trait and then attempts to list all traits, show an individual trait, and finally delete the trait. The failure occurs during the first step, creation of the custom trait:
openstack.exceptions.HttpException: HttpException: 502: Server Error for url: https://10.209.100.9/placement/traits/CUSTOM_A982E0BA1C2B4D08BFD6D2594C67831... , Bad Gateway: response from an upstream server.: The proxy server received an invalid: Apache/2.4.52 (Ubuntu) Server at 10.209.100.9 Port 80: Additionally, a 201 Created: 502 Bad Gateway: error was encountered while trying to use an ErrorDocument to handle the request.
I've looked through the various job artefacts and haven't found any smoking guns. I can see placement receive and reply to the request so it would seem something is happening in between.
Yes, this appears to be a problem in apache2 (possibly triggered by the response, but as far as the backend server is concerned everything is OK). I would increase the log level of the apache server. There are two places to do this: 1) the https frontend [2], and 2) the http wsgi backend [3]. I think the first file comes from the apache2 package in Ubuntu, so I'm not sure what the best way to modify that is. The https proxy file is written by devstack/lib/tls in a heredoc, which you can modify for that frontend. You mention this is locally reproducible, so you may be able to simply edit those files on disk and restart apache2 without needing to modify devstack. Hopefully the extra logging will give a better indication of what is going on.
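For the edit-on-disk route, something like the following could flip the level. This is only a rough sketch: `LogLevel` is the standard Apache directive and `debug` is one of its documented levels, but the helper name `set_log_level` and the sample config text are made up for illustration, and I have not checked what the files behind [2] and [3] actually contain.

```python
import re

# Matches an active (not commented-out) "LogLevel ..." directive on its own
# line, capturing the leading indentation so it can be preserved.
_LOGLEVEL_RE = re.compile(r"^([ \t]*)LogLevel[ \t]+.*$", re.MULTILINE | re.IGNORECASE)


def set_log_level(conf_text: str, level: str = "debug") -> str:
    """Return conf_text with every active LogLevel directive set to `level`.

    If the text has no LogLevel directive at all, one is appended at the end
    of the file, i.e. outside any <VirtualHost> block.
    """
    new_text, count = _LOGLEVEL_RE.subn(
        lambda m: f"{m.group(1)}LogLevel {level}", conf_text
    )
    if count == 0:
        new_text = conf_text.rstrip("\n") + f"\nLogLevel {level}\n"
    return new_text


if __name__ == "__main__":
    # Hypothetical config content, for illustration only.
    sample = "<VirtualHost *:443>\n    LogLevel warn\n</VirtualHost>\n"
    print(set_log_level(sample))
```

You would run the relevant file's contents through this, write the result back as root, and then restart apache2 so the new level takes effect.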
*Fortunately*, this is also reproducible locally against a standard devstack deployment by running the following in the openstacksdk repo:
    OS_TEST_TIMEOUT=60 tox -e functional-py310 -- \
      -n openstack/tests/functional/placement/v1/test_trait.py \
      --until-failure
Does anyone have any insight into what could be causing this issue, or suggestions for how we might go about debugging it? As things stand, I haven't a clue 😔
Cheers, Stephen
[2] https://zuul.opendev.org/t/openstack/build/b37d2aedd1514682b3672c4b732b2717/...
[3] https://zuul.opendev.org/t/openstack/build/b37d2aedd1514682b3672c4b732b2717/...