[nova] When can/should we remove old nova-status upgrade checks?
Balázs Gibizer
balazs.gibizer at ericsson.com
Tue Dec 4 09:00:43 UTC 2018
On Mon, Dec 3, 2018 at 5:38 PM, Matt Riedemann <mriedemos at gmail.com>
wrote:
> Questions came up in review [1] about dropping an old "nova-status
> upgrade check" which relies on using the in-tree placement database
> models for testing the check. The check in question, "Resource
> Providers", compares the number of compute node resource providers in
> the nova_api DB against the number of compute nodes in all cells.
> When the check was originally written in Ocata [2] it was meant to
> help ease the upgrade where nova-compute needed to be configured to
> report compute node resource provider inventory to placement so the
> scheduler could use placement. It looks for things like >0 compute
> nodes but 0 resource providers indicating the computes aren't
> reporting into placement like they should be. In Ocata, if that
> happened, and there were older compute nodes (from Newton), then the
> scheduler would fall back to not using placement. That fallback code has
> been removed. Also in Ocata, nova-compute would fail to start if
> nova.conf wasn't configured for placement [3] but that has also been
> removed. Now if nova.conf isn't configured for placement, I think
> we'll just log an exception traceback but not actually fail the
> service startup, and the node's resources wouldn't be available to
> the scheduler, so you could get NoValidHost failures during
> scheduling and need to dig into why a given compute node isn't being
> used during scheduling.
>
> The question is, given this was added in Ocata to ease the
> upgrade to requiring placement, and we're long past that now, is the
> check still useful? The check still has lots of newton/ocata/pike
> comments in it, so it's showing its age. However, one could argue it
> is still useful for base install verification, or for someone doing
> FFU. If we keep this check, the related tests will need to be
> re-written to use the placement REST API fixture since the in-tree
> nova_api db tables will eventually go away because of extracted
> placement.
I'm OK with removing the check, since during FFU one can install the Rocky
version of nova to run the check if needed. However, if there is a need to
keep the check, then I think we can change the implementation to read the
hostname of each compute from the HostMapping, query the placement API
with that hostname as the RP name, and then check that there is at least
VCPU inventory on that RP.
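
A rough sketch of that alternative, not the actual nova code: it assumes
the caller has already collected the hostnames from the host mappings and
passes in a hypothetical placement_get callable that issues a GET against
the placement endpoint and returns the decoded JSON body. It also assumes
the RP name equals the compute hostname. The function name
hosts_missing_vcpu is made up for illustration.

```python
from typing import Callable, Iterable, List
from urllib.parse import quote

# Hypothetical transport: takes a placement URL path (with query string)
# and returns the decoded JSON response body as a dict.
PlacementGet = Callable[[str], dict]


def hosts_missing_vcpu(hostnames: Iterable[str],
                       placement_get: PlacementGet) -> List[str]:
    """Return the hosts with no resource provider or no VCPU inventory."""
    missing = []
    for host in hostnames:
        # Look up the resource provider by name (assumed == hostname).
        body = placement_get("/resource_providers?name=%s" % quote(host))
        providers = body.get("resource_providers", [])
        if not providers:
            missing.append(host)
            continue

        # Check that the provider reports at least some VCPU inventory.
        rp_uuid = providers[0]["uuid"]
        body = placement_get("/resource_providers/%s/inventories" % rp_uuid)
        vcpu = body.get("inventories", {}).get("VCPU", {})
        if vcpu.get("total", 0) <= 0:
            missing.append(host)
    return missing
```

An empty result would map to a successful check; a non-empty one could be
reported as a warning or failure listing the affected hosts.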
Cheers,
gibi
>
> The bigger question is, what sort of criteria do we have for dropping
> old checks like this besides when the related code, for which the
> check was added, is removed? FFU kind of throws a wrench in
> everything, but at the same time, I believe the prescribed FFU steps
> are that online data migrations (and upgrade checks) are meant to be
> run for each release you're fast-forward upgrading through.
>
> [1]
> https://review.openstack.org/#/c/617941/26/nova/tests/unit/cmd/test_status.py
> [2] https://review.openstack.org/#/c/413250/
> [3]
> https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L1139
>
> --
>
> Thanks,
>
> Matt
>