[nova] When can/should we remove old nova-status upgrade checks?

Balázs Gibizer balazs.gibizer at ericsson.com
Tue Dec 4 09:00:43 UTC 2018



On Mon, Dec 3, 2018 at 5:38 PM, Matt Riedemann <mriedemos at gmail.com> 
wrote:
> Questions came up in review [1] about dropping an old "nova-status 
> upgrade check" which relies on using the in-tree placement database 
> models for testing the check. The check in question, "Resource 
> Providers", compares the number of compute node resource providers in 
> the nova_api DB against the number of compute nodes in all cells. 
> When the check was originally written in Ocata [2] it was meant to 
> help ease the upgrade where nova-compute needed to be configured to 
> report compute node resource provider inventory to placement so the 
> scheduler could use placement. It looks for things like >0 compute 
> nodes but 0 resource providers indicating the computes aren't 
> reporting into placement like they should be. In Ocata, if that 
> happened, and there were older compute nodes (from Newton), then the 
> scheduler would fall back to not using placement. That fallback code has 
> been removed. Also in Ocata, nova-compute would fail to start if 
> nova.conf wasn't configured for placement [3] but that has also been 
> removed. Now if nova.conf isn't configured for placement, I think 
> we'll just log an exception traceback but not actually fail the 
> service startup, and the node's resources wouldn't be available to 
> the scheduler, so you could get NoValidHost failures during 
> scheduling and need to dig into why a given compute node isn't being 
> used during scheduling.
> 
> The question is, given this was added in Ocata to ease the 
> upgrade to requiring placement, and we're long past that now, is the 
> check still useful? The check still has lots of newton/ocata/pike 
> comments in it, so it's showing its age. However, one could argue it 
> is still useful for base install verification, or for someone doing 
> FFU. If we keep this check, the related tests will need to be 
> re-written to use the placement REST API fixture since the in-tree 
> nova_api db tables will eventually go away because of extracted 
> placement.

I'm OK with removing the check, since during FFU one can install the 
Rocky version of nova to run the check if needed. However, if there is 
a need to keep the check, then I think we can change the implementation 
to read the hostname of each compute from the HostMapping, query the 
placement API with that hostname as the RP name, and then check that 
there is at least VCPU inventory on that RP.
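
A minimal sketch of that idea, not the actual nova-status code. It 
assumes the hostnames have already been read from HostMapping, and that 
placement_get is a hypothetical helper which takes a placement API path 
and returns the decoded JSON body. The function name is made up for 
illustration; the two paths are the placement REST API's resource 
provider listing and per-provider inventories.

```python
from urllib.parse import quote


def hosts_missing_vcpu_inventory(hostnames, placement_get):
    """Return hostnames with no resource provider holding VCPU inventory.

    hostnames: iterable of compute host names (e.g. read from HostMapping).
    placement_get: hypothetical callable; takes a placement API path and
        returns the decoded JSON response body as a dict.
    """
    missing = []
    for host in hostnames:
        # Look up the resource provider whose name matches the hostname.
        body = placement_get('/resource_providers?name=%s' % quote(host))
        providers = body.get('resource_providers', [])
        if not providers:
            missing.append(host)
            continue
        rp_uuid = providers[0]['uuid']
        # Fetch the inventories of that provider and look for VCPU.
        inv = placement_get('/resource_providers/%s/inventories' % rp_uuid)
        vcpu = inv.get('inventories', {}).get('VCPU')
        if not vcpu or vcpu.get('total', 0) <= 0:
            missing.append(host)
    return missing
```

A non-empty result would be reported as a warning or failure by the 
check. Note that this assumes the RP name equals the HostMapping host, 
which may not hold for every virt driver.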

Cheers,
gibi

> 
> The bigger question is, what sort of criteria do we have for dropping 
> old checks like this besides when the related code, for which the 
> check was added, is removed? FFU kind of throws a wrench in 
> everything, but at the same time, I believe the prescribed FFU steps 
> are that online data migrations (and upgrade checks) are meant to be 
> run for each release you're fast-forward upgrading through.
> 
> [1] 
> https://review.openstack.org/#/c/617941/26/nova/tests/unit/cmd/test_status.py
> [2] https://review.openstack.org/#/c/413250/
> [3] 
> https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L1139
> 
> --
> 
> Thanks,
> 
> Matt
> 



