On Fri, 2019-06-07 at 19:04 +0100, Sean Mooney wrote:
On Fri, 2019-06-07 at 09:05 -0500, Eric Fried wrote:
(a) Leave all traits alone. If they need to be removed, it would have to be manually via a separate step.
(b) Support a new option so the caller can dictate whether the operation should remove the traits. (This is all-or-none.)
(c) Define a "namespace" - a trait substring - and remove only traits in that namespace.
I'm going to -1 (b). It's too big a hammer, at too big a cost (including API changes).
If I'm not wrong, for the last two approaches we would need to change the RESTful APIs.
No, (c) does not. By "define a namespace" I mean we would establish a naming convention for traits to be used with this feature. For example:
CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE
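(To make the convention concrete, here is a minimal sketch of what the (c) check could look like, assuming the _AGGREGATE_ISOLATION_ infix mentioned later in this thread — illustrative only, not nova code:

    # Illustrative only. Assumes traits managed by the aggregate feature
    # carry an _AGGREGATE_ISOLATION_ infix per the proposed convention.
    AGGREGATE_ISOLATION_INFIX = '_AGGREGATE_ISOLATION_'

    def is_aggregate_managed(trait):
        """True if the aggregate sync may add/remove this trait on host RPs."""
        return AGGREGATE_ISOLATION_INFIX in trait

    assert is_aggregate_managed('CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE')
    assert not is_aggregate_managed('HW_CPU_HYPERTHREADING')

)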
I personally dislike (c), as it means we cannot use any standard traits in host aggregates.
Actually, it means we *can*, whereas (b) and (d) mean we *can't*. That's the whole point. If we want to isolate our hyperthreading hosts, we put them in an aggregate with HW_CPU_HYPERTHREADING on it. The sync here should be a no-op because those hosts should already have HW_CPU_HYPERTHREADING on them. And then if we decide to remove such a host, or destroy the aggregate, or whatever, we *don't want* HW_CPU_HYPERTHREADING to be removed from the providers, because they can still do that.
In the cpu pinning spec we said HW_CPU_HYPERTHREADING was not going to be managed by the virt driver, so it won't be reported unless the admin manually adds it.
https://github.com/openstack/nova-specs/blob/master/specs/train/approved/cpu...
"The HW_CPU_HYPERTHREADING trait will need to be among the traits that the virt driver cannot always override, since the operator may want to indicate that a single NUMA node on a multi-NUMA-node host is meant for guests that tolerate hyperthread siblings as dedicated CPUs."
So I was suggesting this as a way to enable the operator to manage which hosts report that trait, although as the spec suggests we may want to report this differently per NUMA node, which would still require you to use osc-placement or some other way to set it manually.
(Unless you mean we can't make a standard trait that we can use for isolation that gets (conditionally) removed in these scenarios? There's nothing preventing us from creating a standard trait called COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE, which would work just the same.)
I'm suggesting it would be nice to be able to use host aggregates to manage standard or custom traits on hosts that are not managed by the driver, whether that is a COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE trait or something else. So I was hoping to make this feature more reusable for other use cases in the future. For example, it would be nice to be able to say this set of hosts has the CUSTOM_DPDK_NETWORKING trait by putting them in a host aggregate and then adding a forbidden trait to my non-hugepage-backed guests.
There is also an option (d): when you remove a trait from a host aggregate, then for each host in the aggregate, check whether that trait exists on another aggregate the host is a member of, and only remove it from the host if it is not found on any other aggregate.
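(A rough sketch of that check — the mappings here are assumed inputs (aggregate UUID -> trait set, host RP UUID -> aggregate UUIDs), not real nova structures:

    def traits_safe_to_remove(host, removed_agg, removed_traits,
                              aggregate_traits, host_aggregates):
        # Option (d): traits still carried by any *other* aggregate the
        # host belongs to must survive; the rest of removed_traits can go.
        still_needed = set()
        for agg in host_aggregates[host] - {removed_agg}:
            still_needed |= aggregate_traits.get(agg, set())
        return removed_traits - still_needed

)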
Sorry I wasn't clear, (c) also does this ^ but with the condition that it also checks for the _AGGREGATE_ISOLATION_ infix.
For (c) you still have to deal with the fact that a host can be in multiple host aggregates too, by the way, so just because a trait is namespaced and it is removed from an aggregate does not mean it's correct to remove it from the host.
Right - in reality, there should be one algorithm, idempotent, to sync host RP traits when anything happens to aggregates. It always goes out and does the appropriate {set} math to decide which traits should exist on which hosts and effects any necessary changes.
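(For concreteness, a toy version of that set math for a single host — all names hypothetical, and restricted to (c)'s namespace so operator-managed traits like HW_CPU_HYPERTHREADING are never touched:

    def sync_host_traits(host, current_traits, host_aggregates, aggregate_traits,
                         managed=lambda t: '_AGGREGATE_ISOLATION_' in t):
        # Desired = union of feature-managed traits over every aggregate
        # the host belongs to; compare against what the host RP has now.
        desired = set()
        for agg in host_aggregates[host]:
            desired |= {t for t in aggregate_traits.get(agg, set()) if managed(t)}
        managed_now = {t for t in current_traits if managed(t)}
        # Idempotent: running this twice yields empty add/remove sets.
        return desired - managed_now, managed_now - desired

)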
And yes, the performance will suck in a large deployment, because we have to get all the compute RPs in all the aggregates (even the ones with no trait metadata) to do that calculation. But aggregate operations are fairly rare, aren't they?
Perhaps this is where we provide a nova-manage tool to do (b)'s sync manually (which we'll surely have to do anyway as a "heal" or for upgrades). So if you're not using the feature, you don't suffer the penalty.
For example, we talked about using a hyperthreading trait in the cpu pinning spec, which will not be managed by the compute driver. Host aggregates would be a convenient way to manage that trait if this was a generic feature.
Oh, okay, yeah, I don't accept this as a use case for this feature. It will work, but we shouldn't recommend it precisely because it's asymmetrical (you can't remove the trait by removing it from the aggregate).
Why not? We do not expect the virt driver to report the hyperthreading trait, since we said it can be externally managed. Even if we allowed the virt driver to conditionally report it only when it first creates an RP, it is not allowed to re-add it if it is removed by someone else.
There are other ways to add a random trait to all hosts in an aggregate (for host in `get providers in aggregate`; do openstack resource provider trait set ...; done — keeping in mind that osc-placement's `trait set` replaces the provider's whole trait list, so existing traits have to be merged in).
But for the sake of discussion, what about:
(e) Fully manual. Aggregate operations never touch (add or remove) traits on host RPs. You always have to do that manually. As noted above, it's easy to do - and we could make it easier with a tiny wrapper that takes an aggregate, a list of traits, and an --add/--remove command. So initially, setting up aggregate isolation is a two-step process, and in the future we can consider making a new API/CLI affordance that combines the steps.

Ya, (e) could work too. melanie added similar functionality to osc-placement for managing the allocation ratios of specific resource classes per aggregate a few months ago: https://review.opendev.org/#/c/640898/
We could probably provide something similar for managing traits, but determining which RPs to add the trait to would be a little trickier. We would have to be able to filter to RPs with either a specific inventory, or with a specific trait, or in a specific subtree. You could have a --root flag or something to just say add or remove the trait from the root RPs in an aggregate. But yes, you could certainly automate this in a simple CLI extension. A sketch of such a wrapper follows below.
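(A sketch of the tiny wrapper from (e) — not a real osc-placement command. The helper functions and the --add/--remove flags are invented for illustration; the comments note the placement REST calls a real implementation would make:

    import argparse

    def get_root_providers_in_aggregate(aggregate_uuid):
        # Stub: placement GET /resource_providers?member_of=<aggregate_uuid>
        raise NotImplementedError

    def get_provider_traits(rp_uuid):
        # Stub: placement GET /resource_providers/{uuid}/traits
        raise NotImplementedError

    def set_provider_traits(rp_uuid, traits):
        # Stub: placement PUT /resource_providers/{uuid}/traits
        raise NotImplementedError

    def main():
        parser = argparse.ArgumentParser(
            description='Add/remove traits on all root RPs in an aggregate.')
        parser.add_argument('aggregate')
        parser.add_argument('traits', nargs='+')
        group = parser.add_mutually_exclusive_group(required=True)
        group.add_argument('--add', action='store_true')
        group.add_argument('--remove', action='store_true')
        args = parser.parse_args()

        for rp in get_root_providers_in_aggregate(args.aggregate):
            traits = set(get_provider_traits(rp))
            new = (traits | set(args.traits)) if args.add \
                else (traits - set(args.traits))
            if new != traits:
                # The PUT replaces the whole trait list, hence the merge above.
                set_provider_traits(rp, new)

    if __name__ == '__main__':
        main()

)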
efried