[openstack-dev] [nova] Need some feedback on the proposed heal_allocations CLI
mriedemos at gmail.com
Thu May 24 22:19:49 UTC 2018
I've written a nova-manage placement heal_allocations CLI, which was
a TODO from the PTG in Dublin, as a step toward getting existing
CachingScheduler users to move off that scheduler (which is deprecated).
During the CERN cells v1 upgrade talk it was pointed out that CERN was
able to go from placement-per-cell to centralized placement in Ocata
because the nova-computes in each cell would automatically recreate the
allocations in Placement in a periodic task, but that code is gone once
you're upgraded to Pike or later.
In various other talks during the summit this week, we've talked about
things that can go wrong during upgrades. For instance, if placement is
down for some reason during an upgrade and a user deletes an instance,
the allocation doesn't get cleaned up from placement, so it's going to
continue counting against resource usage on that compute node even
though the server instance in nova is gone. So this CLI could be
expanded to help clean up situations like that, e.g. provide it a
specific server ID and the CLI can figure out if it needs to clean
things up in placement.
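As a rough illustration of that cleanup case (this is not code from the
patch; the function name and the data shapes are hypothetical, and it
works on plain dicts rather than the real nova or placement APIs), the
check boils down to finding consumers that still hold allocations but no
longer have a matching server:

```python
def find_orphaned_consumers(allocations_by_consumer, existing_server_ids):
    """Return consumer IDs that hold allocations but have no matching server.

    allocations_by_consumer: hypothetical snapshot mapping a consumer
        (server) ID to the resources allocated to it in placement.
    existing_server_ids: IDs of servers that still exist in nova.
    """
    existing = set(existing_server_ids)
    return sorted(c for c in allocations_by_consumer if c not in existing)


# Hypothetical data: "server-b" was deleted while placement was down,
# so its allocation was never removed.
allocations = {
    "server-a": {"VCPU": 2, "MEMORY_MB": 4096},
    "server-b": {"VCPU": 1, "MEMORY_MB": 2048},
}
servers = ["server-a"]

print(find_orphaned_consumers(allocations, servers))
```

A real implementation would have to query placement and nova for these
two inputs and then delete the stale allocations; that part is left out
here on purpose.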
So there are plenty of things we can build into this, but the patch is
already quite large. I expect we'll also be backporting this to stable
branches to help operators upgrade/fix allocation issues. The patch
already lists several follow-up items in an inline code comment as
things to build into this later.
My question is: is this good enough for a first iteration, or is there
something severely missing before we can merge this, like the automatic
marker tracking mentioned in the code (which will probably be a
non-trivial amount of code to add)? I could really use some operator
feedback on this. Please take a look at what it is already capable of,
and if it's not going to be useful in this iteration, let me know what's
missing and I can add that to the patch.