On 12/3/18 10:48, Jay Pipes wrote:
On 12/03/2018 11:38 AM, Matt Riedemann wrote:
Questions came up in review [1] about dropping an old "nova-status upgrade check" which relies on using the in-tree placement database models for testing the check. The check in question, "Resource Providers", compares the number of compute node resource providers in the nova_api DB against the number of compute nodes in all cells. When the check was originally written in Ocata [2] it was meant to help ease the upgrade where nova-compute needed to be configured to report compute node resource provider inventory to placement so the scheduler could use placement. It looks for things like >0 compute nodes but 0 resource providers indicating the computes aren't reporting into placement like they should be. In Ocata, if that happened, and there were older compute nodes (from Newton), then the scheduler would fallback to not use placement. That fallback code has been removed. Also in Ocata, nova-compute would fail to start if nova.conf wasn't configured for placement [3] but that has also been removed. Now if nova.conf isn't configured for placement, I think we'll just log an exception traceback but not actually fail the service startup, and the node's resources wouldn't be available to the scheduler, so you could get NoValidHost failures during scheduling and need to dig into why a given compute node isn't being used during scheduling.
I say remove the check. Its usefulness is negligible compared to the effort that would be required to maintain it. It certainly isn't worth writing a whole new placement feature to replace the db access. And using existing interfaces would be very heavy in large deployments. (Not that that's a show-stopper for running an upgrade check, but still.) -efried
The question is, given this was added in Ocata to ease with the upgrade to require placement, and we're long past that now, is the check still useful? The check still has lots of newton/ocata/pike comments in it, so it's showing its age. However, one could argue it is still useful for base install verification, or for someone doing FFU. If we keep this check, the related tests will need to be re-written to use the placement REST API fixture since the in-tree nova_api db tables will eventually go away because of extracted placement.
The bigger question is, what sort of criteria do we have for dropping old checks like this besides when the related code, for which the check was added, is removed?
I'm not sure there is any "standard" criteria other than evaluating each migration in the way you've done above and then removing the code that is past its useful life (due to the code touching now-irrelevant parts of code as you describe above for the placement-related checks).
Best, -jay