[placement][sdk] How to debug HTTP 502 errors with placement in DevStack?
Stephen Finucane
stephenfin at redhat.com
Thu Aug 3 18:30:11 UTC 2023
On Wed, 2023-08-02 at 16:50 -0700, Clark Boylan wrote:
> On Wed, Aug 2, 2023, at 10:22 AM, Stephen Finucane wrote:
> > We recently merged support for placement traits in openstacksdk. Since then,
> > we've seen an uptick in failures of various functional jobs [1]. The failure is
> > always the same test:
> >
> > openstack.tests.functional.placement.v1.test_trait.TestTrait.test_resource_provider_inventory
> >
> > That test simply creates a new, custom trait and then attempts to list all
> > traits, show an individual trait, and finally delete the trait. The failure
> > occurs during the first step, creation of the custom trait:
> >
> > openstack.exceptions.HttpException: HttpException: 502: Server Error for url:
> > https://10.209.100.9/placement/traits/CUSTOM_A982E0BA1C2B4D08BFD6D2594C678313
> > , Bad Gateway: response from an upstream server.: The proxy server received
> > an invalid: Apache/2.4.52 (Ubuntu) Server at 10.209.100.9 Port 80:
> > Additionally, a 201 Created: 502 Bad Gateway: error was encountered while
> > trying to use an ErrorDocument to handle the request.
> >
> > I've looked through the various job artefacts and haven't found any smoking
> > guns. I can see placement receive and reply to the request so it would seem
> > something is happening in between.
>
> Yes, this appears to be some problem in apache2 (possibly caused by the response, but as far as the backend server is concerned everything is ok). I would increase the log level of the apache server. There are two places to do this: 1) for the https frontend here [2], and 2) for the http wsgi backend here [3]. I think the first file comes from the apache2 package in Ubuntu, so I'm not sure what the best way to modify that is. The https proxy file is configured by devstack/lib/tls in a heredoc, which you can modify for that frontend.
>
> You mention this is locally reproducible so you may be able to simply edit those files on disk and restart apache2 without needing to modify devstack. Hopefully, extra logging will give a better indication of what is going on.
I gave this a shot today but didn't get anywhere, unfortunately. I've posted
full logs from two PUT requests here [1]. Diffing them, they're effectively
identical right up until the response is sent. As in the CI, I can't see
anything wrong in the Placement logs themselves either. I tried adding logging
at multiple points and the only thing that seemed to "fix" things was adding
large logs (dumping the entire request object) in 'placement.wsgi_wrapper', but
I don't know if that was a fluke or what. Back to the drawing board it would
seem.
Stephen
[1] https://paste.opendev.org/show/bd6TFQXwj0zmHF0EHpqA/
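
For anyone who wants to repeat the log comparison, here is a minimal sketch of
one way to diff two captures after masking the fields that differ on every
request. The regular expressions and the diff_logs/normalize helper names are
assumptions for illustration, not taken from the actual log format, so adjust
them to whatever your Apache and Placement logs contain.

```python
import difflib
import re

# Hypothetical patterns for fields that vary between any two requests.
# These are assumptions; tune them to the real log format.
_VARIABLE_FIELDS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?"), "<TIMESTAMP>"),
    (re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"), "<REQUEST_ID>"),
    (re.compile(r"CUSTOM_[0-9A-F]{32}"), "<TRAIT>"),
]


def normalize(line: str) -> str:
    """Replace per-request values with placeholders so only real changes remain."""
    for pattern, placeholder in _VARIABLE_FIELDS:
        line = pattern.sub(placeholder, line)
    return line.rstrip()


def diff_logs(good: str, bad: str) -> list[str]:
    """Return a unified diff of two log captures after normalization."""
    good_lines = [normalize(line) for line in good.splitlines()]
    bad_lines = [normalize(line) for line in bad.splitlines()]
    return list(
        difflib.unified_diff(
            good_lines, bad_lines, fromfile="good", tofile="bad", lineterm=""
        )
    )
```

Calling diff_logs() with the text of a successful request log and a failed one
returns only the lines that differ once timestamps, request IDs, and trait names
are masked; an empty list means the two captures are otherwise identical.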
>
> >
> > *Fortunately*, this is also reproducible locally against a standard devstack
> > deployment by running the following in the openstacksdk repo:
> >
> > OS_TEST_TIMEOUT=60 tox -e functional-py310 -- \
> > -n openstack/tests/functional/placement/v1/test_trait.py \
> > --until-failure
> >
> > Does anyone have any insight into what could be causing this issue and have
> > suggestions for how we might go about debugging it? As things stand, I haven't a clue
> > 😔
> >
> > Cheers,
> > Stephen
> >
> > [1]
> > https://zuul.opendev.org/t/openstack/builds?job_name=openstacksdk-functional-devstack&project=openstack%2Fdevstack&branch=master&skip=0
>
> [2] https://zuul.opendev.org/t/openstack/build/b37d2aedd1514682b3672c4b732b2717/log/controller/logs/apache_config/000-default_conf.txt#14-18
> [3] https://zuul.opendev.org/t/openstack/build/b37d2aedd1514682b3672c4b732b2717/log/controller/logs/apache_config/http-services-tls-proxy_conf.txt#27
>