[all] Gate status: All devstack based jobs are failing for placement url issue in devstack
Hello Everyone, All the devstack based jobs are failing with placement URL issue in devstack. The root cause is not known yet. You will see the below error in devstack/summary log- + ./stack.sh:main:1455 : /usr/local/bin/nova-status --config-file /etc/nova/nova.conf upgrade check Placement logs: INFO placement.requestlog [None req-3ad7ab8b-7c85-477a-b156-f5ca652a38cc service placement] 10.208.226.48 "GET /placemen//" status: 404 len: 162 microversion: 1.0 Please do not recheck until the issue is fixed. -gmann
On 9/27/21 15:20, Ghanshyam Mann wrote:
Hello Everyone,
All the devstack based jobs are failing with placement URL issue in devstack. The root cause is not known yet.
You will see the below error in devstack/summary log-
+ ./stack.sh:main:1455 : /usr/local/bin/nova-status --config-file /etc/nova/nova.conf upgrade check
Placement logs: INFO placement.requestlog [None req-3ad7ab8b-7c85-477a-b156-f5ca652a38cc service placement] 10.208.226.48 "GET /placemen//" status: 404 len: 162 microversion: 1.0
I looked at this for awhile and found that placement is returning a 404 because it fails to match [1] any of the declared routes [2]. This issue reproduces on a local devstack. I added some debug print statements to determine this [3]. (The "ENVIRON" and "NO MATCH FOUND" are the print statements I added in PlacementHandler.__call__() and dispatch() in placement/handler.py). Example: def dispatch(environ, start_response, mapper): """Find a matching route for the current request. If no match is found, raise a 404 response. If there is a matching route, but no matching handler for the given method, raise a 405. """ result = mapper.match(environ=environ) if result is None: print('NO MATCH FOUND') raise webob.exc.HTTPNotFound( json_formatter=util.json_error_formatter) In short, the WSGI environment dict is giving a bad/not matching url in the PATH_INFO variable if the call path is not simply '/'. Example of a good PATH_INFO: ENVIRON = {'proxy-sendchunked': '1', 'PATH_INFO': '/', [...] 'REQUEST_URI': '/placement', 'SCRIPT_NAME': '/placement', [...]} INFO placement.requestlog [None req-bde0b07f-e6a4-4b3d-bb78-51bee95524e1 None None] 127.0.0.1 "GET /placement/" status: 200 len: 136 mi croversion: 1.0 Example of a bad PATH_INFO and SCRIPT_NAME: ENVIRON = {'proxy-sendchunked': '1', 'PATH_INFO': '//', [...] 'REQUEST_URI': '/placement/', 'SCRIPT_NAME': '/placemen', [...]} NO MATCH FOUND INFO placement.requestlog [None req-317708d5-8c6f-48f5-abce-225a242bb31a admin admin] 127.0.0.1 "GET /placemen//" status: 404 len: 198 microversion: 1.29 Another example of bad PATH_INFO and SCRIPT_NAME: ENVIRON = {'proxy-sendchunked': '1', 'PATH_INFO': '//resource_providers', [...] 'REQUEST_URI': '/placement/resource_providers', 'SCRIPT_NAME': '/placemen', [...]} Note that the REQUEST_URI in all cases looks correct. This supports what I observed that the clients are calling the correct urls. The same 404s happen using curl -X GET statements as well. So it's definitely something wrong in the server. I'm not sure how to debug further. I don't know a great deal about WSGI and where the contents of the environment dict come from. If anyone can help, it would be appreciated. Cheers, -melanie [1] https://github.com/openstack/placement/blob/bd5b19c00e1ab293fc157f4699bc4f47... [2] https://github.com/openstack/placement/blob/bd5b19c00e1ab293fc157f4699bc4f47... [3] https://paste.opendev.org/show/809636
---- On Mon, 27 Sep 2021 20:24:18 -0500 melanie witt <melwittt@gmail.com> wrote ----
On 9/27/21 15:20, Ghanshyam Mann wrote:
Hello Everyone,
All the devstack based jobs are failing with placement URL issue in devstack. The root cause is not known yet.
You will see the below error in devstack/summary log-
+ ./stack.sh:main:1455 : /usr/local/bin/nova-status --config-file /etc/nova/nova.conf upgrade check
Placement logs: INFO placement.requestlog [None req-3ad7ab8b-7c85-477a-b156-f5ca652a38cc service placement] 10.208.226.48 "GET /placemen//" status: 404 len: 162 microversion: 1.0
I looked at this for awhile and found that placement is returning a 404 because it fails to match [1] any of the declared routes [2]. This issue reproduces on a local devstack.
I added some debug print statements to determine this [3]. (The "ENVIRON" and "NO MATCH FOUND" are the print statements I added in PlacementHandler.__call__() and dispatch() in placement/handler.py).
Example:
def dispatch(environ, start_response, mapper): """Find a matching route for the current request.
If no match is found, raise a 404 response. If there is a matching route, but no matching handler for the given method, raise a 405. """ result = mapper.match(environ=environ) if result is None: print('NO MATCH FOUND') raise webob.exc.HTTPNotFound( json_formatter=util.json_error_formatter)
In short, the WSGI environment dict is giving a bad/not matching url in the PATH_INFO variable if the call path is not simply '/'.
Example of a good PATH_INFO:
ENVIRON = {'proxy-sendchunked': '1', 'PATH_INFO': '/', [...] 'REQUEST_URI': '/placement', 'SCRIPT_NAME': '/placement', [...]} INFO placement.requestlog [None req-bde0b07f-e6a4-4b3d-bb78-51bee95524e1 None None] 127.0.0.1 "GET /placement/" status: 200 len: 136 mi croversion: 1.0
Example of a bad PATH_INFO and SCRIPT_NAME:
ENVIRON = {'proxy-sendchunked': '1', 'PATH_INFO': '//', [...] 'REQUEST_URI': '/placement/', 'SCRIPT_NAME': '/placemen', [...]} NO MATCH FOUND INFO placement.requestlog [None req-317708d5-8c6f-48f5-abce-225a242bb31a admin admin] 127.0.0.1 "GET /placemen//" status: 404 len: 198 microversion: 1.29
Another example of bad PATH_INFO and SCRIPT_NAME:
ENVIRON = {'proxy-sendchunked': '1', 'PATH_INFO': '//resource_providers', [...] 'REQUEST_URI': '/placement/resource_providers', 'SCRIPT_NAME': '/placemen', [...]}
Note that the REQUEST_URI in all cases looks correct. This supports what I observed that the clients are calling the correct urls. The same 404s happen using curl -X GET statements as well. So it's definitely something wrong in the server.
Yeah. To find out who is corrupting the PATH_INFO and SCRIPT_NAME, I tried to trace the environ (could not finish it) in webob layer and while building the Request in webob, environ variables are already corrupted. 'PATH_INFO': '//resource_providers' 'REQUEST_URI': '/placement/resource_providers?in_tree=baba07a8-3c17-474c-82e6-5d63308a1f9d', 'SCRIPT_NAME': '/placemen', Below log is from webob.des.py->__call__method - https://paste.opendev.org/show/809639/ There is no change in webob recently and as it is only failing in placement (nova, keystone request works fine) so webob or wsgi is less doubtful and it might be only in placement server-side in our Request/Response objects ? -gmann
I'm not sure how to debug further. I don't know a great deal about WSGI and where the contents of the environment dict come from. If anyone can help, it would be appreciated.
Cheers, -melanie
[1] https://github.com/openstack/placement/blob/bd5b19c00e1ab293fc157f4699bc4f47... [2] https://github.com/openstack/placement/blob/bd5b19c00e1ab293fc157f4699bc4f47... [3] https://paste.opendev.org/show/809636
On Mon Sep 27 2021 21:30:54 GMT-0700 (Pacific Daylight Time), Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
Yeah. To find out who is corrupting the PATH_INFO and SCRIPT_NAME, I tried to trace the environ (could not finish it) in webob layer and while building the Request in webob, environ variables are already corrupted.
'PATH_INFO': '//resource_providers' 'REQUEST_URI': '/placement/resource_providers?in_tree=baba07a8-3c17-474c-82e6-5d63308a1f9d', 'SCRIPT_NAME': '/placemen',
Below log is from webob.des.py->__call__method - https://paste.opendev.org/show/809639/
There is no change in webob recently and as it is only failing in placement (nova, keystone request works fine) so webob or wsgi is less doubtful and it might be only in placement server-side in our Request/Response objects ?
I looked at this some more and I think I found where the environ gets populated in wsgiref.simple_server [1][2][3][4] in the case of devstack. (pbr generates the WSGI run script which ends up at /usr/local/bin/placement-api in a devstack env and this script uses wsgiref.simple_server as the WSGI server). I can't tell whether there is anything that could now be wrong in [3] or [4] given the recent apache upgrade in focal that Ian found is connected to the change in behavior (thank you Ian!). Maybe someone else can spot something. -melanie [1] https://github.com/openstack/placement/blob/bd5b19c00e1ab293fc157f4699bc4f47... [2] https://github.com/openstack/pbr/blob/2297988a7f41b8cb9d03aaea7d5ad9ba4378e5... [3] https://github.com/openstack/pbr/blob/2297988a7f41b8cb9d03aaea7d5ad9ba4378e5... [4] https://github.com/python/cpython/blob/3.8/Lib/wsgiref/simple_server.py#L85
On Mon, Sep 27, 2021 at 06:24:18PM -0700, melanie witt wrote:
In short, the WSGI environment dict is giving a bad/not matching url in the PATH_INFO variable if the call path is not simply '/'.
This definitely seems to be due to a recent apache ugprade in focal. If you pick out the prior versions $ wget http://launchpadlibrarian.net/547306461/apache2-utils_2.4.41-4ubuntu3.4_amd6... $ wget http://launchpadlibrarian.net/547306448/apache2-data_2.4.41-4ubuntu3.4_all.d... $ wget http://launchpadlibrarian.net/547306458/apache2-bin_2.4.41-4ubuntu3.4_amd64.... $ wget http://launchpadlibrarian.net/547306455/apache2_2.4.41-4ubuntu3.4_amd64.deb $ dpkg -i apache2*.deb $ service apache2 restart it "just works". $ /usr/local/bin/nova-status --debug --config-file /etc/nova/nova.conf upgrade check +-------------------------------------------+ | Upgrade Check Results | +-------------------------------------------+ ... | Check: Placement API | | Result: Success | | Details: None | +-------------------------------------------+ The recent security update @ https://launchpad.net/ubuntu/focal/+source/apache2/+changelog does a lot of stuff. It is an open question if this is actually an Apache error, or the update has just triggered something in our uwsgi setup that is wrong but just happened to work previously... -i
On 9/28/21 7:27 AM, Ian Wienand wrote: ...
It is an open question if this is actually an Apache error, or the update has just triggered something in our uwsgi setup that is wrong but just happened to work previously...
Actually mostly the latter I would say. See my proposed fix/workaround [0], the bug report I opened [1] and the apache2 docs [2]. The patch that changed apache2 behaviour is [3]. [0] https://review.opendev.org/c/openstack/devstack/+/811303 [1] https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/1945274 [2] https://httpd.apache.org/docs/trunk/mod/mod_proxy.html#proxypass [3] https://git.launchpad.net/ubuntu/+source/apache2/tree/debian/patches/CVE-202...
On Tue, Sep 28, 2021 at 03:27:23PM +1000, Ian Wienand wrote:
The recent security update @
https://launchpad.net/ubuntu/focal/+source/apache2/+changelog
does a lot of stuff.
frickler was actually onto this before me; see [1]. In short, it looks like we have found an issue with our ProxyPass config and the way it uses trailing slashes. See [2] We can either do ProxyPass "/placement/" "unix:/var/run/uwsgi/placement-api.socket|uwsgi://uwsgi-uds-placement-api/" retry=0 (note "placement/" and then "uwsgi://uwsgi-uds-placement-api/") or ProxyPass "/placement" "unix:/var/run/uwsgi/placement-api.socket|uwsgi://uwsgi-uds-placement-api" retry=0 both appear to work. I'm not sure which is better? We could enforce either in the function. As of right now 811303 is still testing, but it's looking good. -i [1] https://meetings.opendev.org/irclogs/%23openstack-qa/%23openstack-qa.2021-09... [2] https://review.opendev.org/c/openstack/devstack/+/811303
On Tue, Sep 28 2021 at 05:19:29 PM +1000, Ian Wienand <iwienand@redhat.com> wrote:
On Tue, Sep 28, 2021 at 03:27:23PM +1000, Ian Wienand wrote:
The recent security update @
https://launchpad.net/ubuntu/focal/+source/apache2/+changelog
does a lot of stuff.
frickler was actually onto this before me; see [1].
In short, it looks like we have found an issue with our ProxyPass config and the way it uses trailing slashes. See [2]
We can either do
ProxyPass "/placement/" "unix:/var/run/uwsgi/placement-api.socket|uwsgi://uwsgi-uds-placement-api/" retry=0
(note "placement/" and then "uwsgi://uwsgi-uds-placement-api/") or
ProxyPass "/placement" "unix:/var/run/uwsgi/placement-api.socket|uwsgi://uwsgi-uds-placement-api" retry=0
both appear to work. I'm not sure which is better? We could enforce either in the function. As of right now 811303 is still testing, but it's looking good.
Thank you all for troubleshooting and finding the solution. I think we need to use the '/placement' variant to allow GET http://<ip>/placement to be routed to the placement app. As far as I see if we use '/placement/' variant then /placement is not routed and that would be a breaking API change. Cheers, gibi
-i
[1] https://meetings.opendev.org/irclogs/%23openstack-qa/%23openstack-qa.2021-09... [2] https://review.opendev.org/c/openstack/devstack/+/811303
Updates: placement API URL (uwsgi setup bug#1945359) issue fixes are merged now, please recheck your failing patches. Big thanks to frickler, ianw, clarkb, and melwitt. Nova two jobs still failing on different issues and fixes are under testing: 1. "nova-ceph-multistore job is broken on stable branches" - https://bugs.launchpad.net/devstack-plugin-ceph/+bug/1945358 - Fixes are under testing: https://review.opendev.org/q/I7061f8d1491ff957452c9c777e40186a4e9c324e 2. nova-grenade-multinode job failing on master, stable/xena due to neutron-truck service/extension - https://bugs.launchpad.net/grenade/+bug/1945346 - Fixes are under testing: https://review.opendev.org/q/topic:%22bug%252F1945346%22+(status:open%20OR%2...) -gmann ---- On Tue, 28 Sep 2021 02:28:26 -0500 Balazs Gibizer <balazs.gibizer@est.tech> wrote ----
On Tue, Sep 28 2021 at 05:19:29 PM +1000, Ian Wienand <iwienand@redhat.com> wrote:
On Tue, Sep 28, 2021 at 03:27:23PM +1000, Ian Wienand wrote:
The recent security update @
https://launchpad.net/ubuntu/focal/+source/apache2/+changelog
does a lot of stuff.
frickler was actually onto this before me; see [1].
In short, it looks like we have found an issue with our ProxyPass config and the way it uses trailing slashes. See [2]
We can either do
ProxyPass "/placement/" "unix:/var/run/uwsgi/placement-api.socket|uwsgi://uwsgi-uds-placement-api/" retry=0
(note "placement/" and then "uwsgi://uwsgi-uds-placement-api/") or
ProxyPass "/placement" "unix:/var/run/uwsgi/placement-api.socket|uwsgi://uwsgi-uds-placement-api" retry=0
both appear to work. I'm not sure which is better? We could enforce either in the function. As of right now 811303 is still testing, but it's looking good.
Thank you all for troubleshooting and finding the solution.
I think we need to use the '/placement' variant to allow GET http://<ip>/placement to be routed to the placement app. As far as I see if we use '/placement/' variant then /placement is not routed and that would be a breaking API change.
Cheers, gibi
-i
[1] https://meetings.opendev.org/irclogs/%23openstack-qa/%23openstack-qa.2021-09... [2] https://review.opendev.org/c/openstack/devstack/+/811303
participants (5)
-
Balazs Gibizer
-
Ghanshyam Mann
-
Ian Wienand
-
Jens Harbott
-
melanie witt