[nova] When can/should we remove old nova-status upgrade checks?

Matt Riedemann mriedemos at gmail.com
Mon Dec 3 16:38:01 UTC 2018


Questions came up in review [1] about dropping an old "nova-status 
upgrade check" which relies on using the in-tree placement database 
models for testing the check. The check in question, "Resource 
Providers", compares the number of compute node resource providers in 
the nova_api DB against the number of compute nodes in all cells. When 
the check was originally written in Ocata [2] it was meant to help ease 
the upgrade where nova-compute needed to be configured to report compute 
node resource provider inventory to placement so the scheduler could use 
placement. It looks for things like >0 compute nodes but 0 resource 
providers indicating the computes aren't reporting into placement like 
they should be. In Ocata, if that happened and there were older compute
nodes (from Newton), the scheduler would fall back to not using
placement. That fallback code has since been removed. Also in Ocata,
nova-compute would fail to start if nova.conf wasn't configured for
placement [3], but that has also been removed. Now if nova.conf isn't
configured for placement, I think we'll just log an exception traceback
but not actually fail the service startup. The node's resources
wouldn't be available to the scheduler, so you could get NoValidHost
failures and need to dig into why a given compute node isn't being
used.
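For reference, here is a rough sketch of the comparison the check makes, in Python since that's what nova is written in. This is a simplified, hypothetical model, not the code in nova/cmd/status.py: the enum and function names are made up, the counts are passed in rather than queried from the nova_api and cell databases, and it doesn't try to reproduce how the real check treats an empty deployment.

```python
from enum import Enum
from typing import Mapping


class CheckCode(Enum):
    # Hypothetical result levels, loosely modeled on the
    # success / warning / failure outcomes of an upgrade check.
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


def check_resource_providers(num_providers: int,
                             computes_per_cell: Mapping[str, int]) -> CheckCode:
    """Compare compute node resource providers against compute nodes.

    num_providers: compute node resource providers known to placement.
    computes_per_cell: hypothetical mapping of cell name -> number of
    compute nodes recorded in that cell's database.
    """
    # The check counts compute nodes across all cells, not just one.
    num_computes = sum(computes_per_cell.values())
    if num_computes > 0 and num_providers == 0:
        # Computes exist but none report into placement, so the
        # scheduler has nothing it can place instances on.
        return CheckCode.FAILURE
    if num_providers < num_computes:
        # Some computes aren't reporting (not upgraded or misconfigured),
        # so their resources are invisible to the scheduler.
        return CheckCode.WARNING
    return CheckCode.SUCCESS


if __name__ == "__main__":
    print(check_resource_providers(0, {"cell1": 3}))
    print(check_resource_providers(2, {"cell1": 2, "cell2": 1}))
    print(check_resource_providers(3, {"cell1": 2, "cell2": 1}))
```

The ">0 compute nodes but 0 resource providers" case is the one that mattered most for the Ocata upgrade; the partial-reporting case is what would still be useful for base install verification.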

The question is, given this was added in Ocata to ease the upgrade
to requiring placement, and we're long past that now, is the check still
useful? The check still has lots of newton/ocata/pike comments in it, so 
it's showing its age. However, one could argue it is still useful for 
base install verification, or for someone doing FFU. If we keep this 
check, the related tests will need to be rewritten to use the placement
REST API fixture, since the in-tree nova_api DB tables will eventually go
away once placement is extracted.

The bigger question is, what sort of criteria do we have for dropping 
old checks like this besides when the related code, for which the check 
was added, is removed? FFU kind of throws a wrench in everything, but at
the same time, I believe the prescribed FFU steps are to run online data
migrations (and upgrade checks) for each release you're fast-forward
upgrading through.

[1] https://review.openstack.org/#/c/617941/26/nova/tests/unit/cmd/test_status.py
[2] https://review.openstack.org/#/c/413250/
[3] https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L1139

-- 

Thanks,

Matt