[nova] When can/should we remove old nova-status upgrade checks?
Matt Riedemann
mriedemos at gmail.com
Mon Dec 3 16:38:01 UTC 2018
Questions came up in review [1] about dropping an old "nova-status
upgrade check" which relies on using the in-tree placement database
models for testing the check. The check in question, "Resource
Providers", compares the number of compute node resource providers in
the nova_api DB against the number of compute nodes in all cells. When
the check was originally written in Ocata [2] it was meant to help ease
the upgrade where nova-compute needed to be configured to report compute
node resource provider inventory to placement so the scheduler could use
placement. It looks for things like >0 compute nodes but 0 resource
providers indicating the computes aren't reporting into placement like
they should be. In Ocata, if that happened, and there were older compute
nodes (from Newton), then the scheduler would fallback to not use
placement. That fallback code has been removed. Also in Ocata,
nova-compute would fail to start if nova.conf wasn't configured for
placement [3] but that has also been removed. Now if nova.conf isn't
configured for placement, I think we'll just log an exception traceback
but not actually fail the service startup, and the node's resources
wouldn't be available to the scheduler, so you could get NoValidHost
failures during scheduling and need to dig into why a given compute node
isn't being used during scheduling.
The question is, given this was added in Ocata to ease with the upgrade
to require placement, and we're long past that now, is the check still
useful? The check still has lots of newton/ocata/pike comments in it, so
it's showing its age. However, one could argue it is still useful for
base install verification, or for someone doing FFU. If we keep this
check, the related tests will need to be re-written to use the placement
REST API fixture since the in-tree nova_api db tables will eventually go
away because of extracted placement.
The bigger question is, what sort of criteria do we have for dropping
old checks like this besides when the related code, for which the check
was added, is removed? FFU kind of throws a wrench in everything, but at
the same time, I believe the prescribed FFU steps are that online data
migrations (and upgrade checks) are meant to be run per-release you're
fast-forward upgrading through.
[1]
https://review.openstack.org/#/c/617941/26/nova/tests/unit/cmd/test_status.py
[2] https://review.openstack.org/#/c/413250/
[3]
https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L1139
--
Thanks,
Matt
More information about the openstack-discuss
mailing list