[openstack-dev] [nova][placement] update_provider_tree design updates

Eric Fried openstack at fried.cc
Thu Mar 15 18:30:19 UTC 2018

One of the takeaways from the Queens retrospective [1] was that we
should summarize discussions that happen in person/hangout/IRC/etc. on
the appropriate mailing list for the benefit of those who weren't
present (or paying attention :P ).  This is such a summary.

As originally conceived, ComputeDriver.update_provider_tree was intended
to be the sole source of truth for traits and aggregates on resource
providers under its purview.

Then came the idea of reflecting compute driver capabilities as traits
[2], which would be done outside of update_provider_tree, but still
within the bounds of nova compute.

Then Friday discussions at the PTG [3] brought to light the fact that we
need to honor traits set by outside agents (operators, other services
like neutron, etc.), effectively merging those with whatever the virt
driver sets.  Concerns were raised about how to reconcile overlaps, and
in particular how compute (via update_provider_tree or otherwise) can
know if a trait is safe to *remove*.  At the PTG, we agreed we need to
do this, but deferred the details.

...which we discussed earlier this week in IRC [4][5].  We concluded:

- Compute is the source of truth for any and all traits it could ever
assign, which will be a subset of what's in os-traits, plus whatever
CUSTOM_ traits it stakes a claim to.  If an outside agent sets a trait
that's in that list, compute can legitimately remove it.  If an outside
agent removes a trait that's in that list, compute can reassert it.
- Anything outside of that list of compute-owned traits is fair game for
outside agents to set/unset.  Compute won't mess with those, ever.
- Compute (and update_provider_tree) will therefore need to know what
that list comprises.  Furthermore, it must take care to use merging
logic such that it only sets/unsets traits it "owns".
- To facilitate this on the compute side, ProviderTree will get new
methods to add/remove provider traits.  (Technically, it could all be
done via update_traits [6], which replaces the entire set of traits on a
provider, but then every update_provider_tree implementation would have
to write the same kind of merging logic.)
- For operators, we'll need OSC affordance for setting/unsetting
provider traits.
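
To make the merging rule above concrete, here is a minimal sketch of it
using plain Python sets.  This is not the code in the patches; the
function name, its arguments, and the example trait names are all
made up for illustration, and it assumes the "owned" list is available
as a simple collection of strings.

```python
def merge_compute_traits(current, owned, desired):
    """Return the trait set to write back to a provider.

    current: traits on the provider right now (from any source).
    owned:   every trait compute could ever assign (hypothetical list).
    desired: the owned traits compute wants set at this moment.
    """
    current, owned, desired = set(current), set(owned), set(desired)
    stray = desired - owned
    if stray:
        # Compute must not set anything it has not staked a claim to.
        raise ValueError("compute does not own: %s" % sorted(stray))
    # Traits outside the owned list belong to outside agents: keep them.
    foreign = current - owned
    # Inside the owned list, compute's view wins in both directions:
    # owned traits it no longer wants are dropped, missing ones reasserted.
    return foreign | desired


# Illustrative names only.
owned = {"HW_CPU_X86_AVX2", "CUSTOM_EXAMPLE_VIRT_FEATURE"}
current = {"CUSTOM_EXAMPLE_VIRT_FEATURE", "CUSTOM_OPERATOR_GOLD"}
desired = {"HW_CPU_X86_AVX2"}
merged = merge_compute_traits(current, owned, desired)
```

Here the operator-set CUSTOM_OPERATOR_GOLD survives, the stale owned
trait is removed, and HW_CPU_X86_AVX2 is (re)asserted.  Doing this once
in shared add/remove helpers is what spares each update_provider_tree
implementation from repeating it.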

And finally:
- Everything above *also* applies to provider aggregates.  NB: Here
there be tygers.  Unlike traits, the comprehensive list of which can
conceivably be known a priori (even including CUSTOM_*s), aggregate
UUIDs are by their nature unique and likely generated dynamically.
Knowing that you "own" an aggregate UUID is relatively straightforward
when you need to set it; but to know you can/must unset it, you need to
have kept a record of having set it in the first place.  A record that
persists e.g. across compute service restarts.  Can/should virt drivers
write a file?  If so, we'd better make sure it works across upgrades.  And
so on.  Ugh.  For the time being, we're kinda punting on this issue
until it actually becomes a problem IRL.

And now for the moment you've all been awaiting with bated breath:
- Delta [7] to the update_provider_tree spec [8].
- Patch for ProviderTree methods to add/remove traits/aggregates [9].
- Patch modifying the update_provider_tree docstring, and adding devref
content for update_provider_tree [10].

Please feel free to email or reach out in #openstack-nova if you have
any questions.


[1] https://etherpad.openstack.org/p/nova-queens-retrospective (L122 as
of this writing)
[2] https://review.openstack.org/#/c/538498/
[3] https://etherpad.openstack.org/p/nova-ptg-rocky (L496-502 as of
this writing)
[7] https://review.openstack.org/552122
[9] https://review.openstack.org/553475
[10] https://review.openstack.org/553476
