[openstack-dev] [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI
Sylvain Bauza
sbauza at redhat.com
Mon May 28 12:31:59 UTC 2018
On Fri, May 25, 2018 at 12:19 AM, Matt Riedemann <mriedemos at gmail.com>
wrote:
> I've written a nova-manage placement heal_allocations CLI [1] which was a
> TODO from the PTG in Dublin as a step toward getting existing
> CachingScheduler users to roll off that (which is deprecated).
>
> During the CERN cells v1 upgrade talk it was pointed out that CERN was
> able to go from placement-per-cell to centralized placement in Ocata
> because the nova-computes in each cell would automatically recreate the
> allocations in Placement in a periodic task, but that code is gone once
> you're upgraded to Pike or later.
>
> In various other talks during the summit this week, we've talked about
> things that can go wrong during upgrades. For instance, if placement is down
> for some reason during an upgrade and a user deletes an instance, the
> allocation doesn't get cleaned up from placement, so it continues counting
> against resource usage on that compute node even though the server instance
> in nova is gone. So this CLI could be expanded to help clean up situations
> like that, e.g. provide it a specific server ID and the CLI can figure out
> if it needs to clean things up in placement.
>
> So there are plenty of things we can build into this, but the patch is
> already quite large. I expect we'll also be backporting this to stable
> branches to help operators upgrade/fix allocation issues. It already has
> several things listed in a code comment inline about things to build into
> this later.
>
> My question is, is this good enough for a first iteration or is there
> something severely missing before we can merge this, like the automatic
> marker tracking mentioned in the code (that will probably be a non-trivial
> amount of code to add). I could really use some operator feedback on this
> to just take a look at what it already is capable of and if it's not going
> to be useful in this iteration, let me know what's missing and I can add
> that in to the patch.
>
> [1] https://review.openstack.org/#/c/565886/
>
>
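To make the two scenarios above a bit more concrete, here is a minimal sketch
(not the actual nova-manage code) of the placement calls involved: writing an
allocation for an instance that has none, and deleting a leftover allocation
for a server that is already gone from nova. The endpoint URL, token, and
resource values are placeholders; the GET/PUT/DELETE /allocations/{consumer_uuid}
calls do exist in the placement API, though payload details vary by microversion.

import requests

PLACEMENT = 'http://placement.example.com/placement'  # assumption: placement endpoint
HEADERS = {
    'X-Auth-Token': 'ADMIN_TOKEN',                     # assumption: a valid admin token
    'OpenStack-API-Version': 'placement 1.12',
}


def get_allocations(consumer_uuid):
    """Return the allocations placement currently has for a consumer (instance)."""
    resp = requests.get(
        '%s/allocations/%s' % (PLACEMENT, consumer_uuid), headers=HEADERS)
    resp.raise_for_status()
    return resp.json().get('allocations', {})


def heal_allocation(instance_uuid, compute_rp_uuid, resources, project_id, user_id):
    """Case 1: write an allocation for an instance that has none."""
    payload = {
        'allocations': {compute_rp_uuid: {'resources': resources}},
        'project_id': project_id,
        'user_id': user_id,
    }
    resp = requests.put(
        '%s/allocations/%s' % (PLACEMENT, instance_uuid),
        json=payload, headers=HEADERS)
    resp.raise_for_status()


def remove_orphan_allocation(instance_uuid):
    """Case 2: drop allocations for a server that no longer exists in nova."""
    resp = requests.delete(
        '%s/allocations/%s' % (PLACEMENT, instance_uuid), headers=HEADERS)
    resp.raise_for_status()


# Example with placeholder UUIDs: heal an instance with 1 VCPU / 512 MB / 1 GB disk.
# heal_allocation('INSTANCE_UUID', 'COMPUTE_NODE_RP_UUID',
#                 {'VCPU': 1, 'MEMORY_MB': 512, 'DISK_GB': 1},
#                 'PROJECT_ID', 'USER_ID')
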
This does sound to me like a good way to help operators.
That said, since I'm now working on using Nested Resource Providers for
VGPU inventories, I wonder about a possible upgrade problem with VGPU
allocations. Given that:
- in Queens, VGPU inventories are on the root RP (i.e. the compute node
RP), but
- in Rocky, VGPU inventories will be on child RPs (i.e. against a
specific VGPU type),
then if we have VGPU allocations made in Queens, we would presumably need to
recreate those allocations against the new child-RP inventory when upgrading
to Rocky?
I hope you can see the problem that creating nested RPs poses for upgrades.
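For illustration only, here is a rough sketch of that reshaping, reusing the
placeholder endpoint, headers, and get_allocations() helper from the sketch
above: read the consumer's existing allocations, move the VGPU piece from the
root (compute node) provider to the child provider for the specific VGPU type,
and write the result back. Real code would also have to handle consumer
generations and concurrency; this only shows the shape of the problem.

def move_vgpu_allocation(consumer_uuid, root_rp_uuid, child_rp_uuid,
                         project_id, user_id):
    """Illustrative only: shift a VGPU allocation from the root RP to a child RP."""
    allocations = get_allocations(consumer_uuid)  # helper from the sketch above

    # Pull the VGPU resources off the root (compute node) provider's allocation...
    root_resources = allocations.get(root_rp_uuid, {}).get('resources', {})
    vgpus = root_resources.pop('VGPU', None)
    if vgpus is None:
        return  # nothing to move for this consumer

    # ...and rebuild the allocation set with the VGPU part on the child provider
    # that represents the specific VGPU type.
    new_allocations = {
        rp_uuid: {'resources': alloc['resources']}
        for rp_uuid, alloc in allocations.items()
        if alloc.get('resources')
    }
    new_allocations.setdefault(child_rp_uuid, {'resources': {}})
    new_allocations[child_rp_uuid]['resources']['VGPU'] = vgpus

    payload = {
        'allocations': new_allocations,
        'project_id': project_id,
        'user_id': user_id,
    }
    resp = requests.put(
        '%s/allocations/%s' % (PLACEMENT, consumer_uuid),
        json=payload, headers=HEADERS)
    resp.raise_for_status()
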
> --
>
> Thanks,
>
> Matt
>