[openstack-dev] [nova] [placement] placement update 18-24
Chris Dent
cdent+os at anticdent.org
Fri Jun 15 15:04:27 UTC 2018
HTML: https://anticdent.org/placement-update-18-24.html
This is placement update 18-24, a weekly update of ongoing
development related to the [OpenStack](https://www.openstack.org/)
[placement
service](https://developer.openstack.org/api-ref/placement/).
It's been quite a while since the last one, mostly because of
travel, but also because coming to grips with the placement universe
takes some time. Catching up will mean that this update is likely to
be a bit long. Bear with it. This is obviously an _expand_ style
update (where we add new stuff). Next week will be a _contract_.
One thing I'd like to highlight is that with the merge of change
[560459](https://review.openstack.org/#/c/560459/) we've hit a long
promised milestone with placement. Thanks to an initial hit by Eric
Fried and considerable followups by Bhagyashri Shewale, we now have
rudimentary support in nova for libvirt-using compute nodes that
use shared disk to accurately report and claim that disk. Using it
requires some currently manual set up for the resource provider
associated with the disk and creating the aggregate of that disk
with the compute nodes that use it. But: this is one of the earliest
promises provided by the placement concept, in the works for more
than two years by many different people, finally showing up. Open
the bubbly or something, a light celebration is in order.
The flip side of this is that it highlights that we have a growing
documentation debt with the many features provided by placement and
how to make best use of them in nova (and other services that might
like to use placement). Before the end of the cycle we will need to
be sure that we set aside a considerable chunk of time to address
this gap.
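
As a partial down payment on that debt, here is a minimal sketch of what the manual set up for a shared disk might look like, written as a plan of placement API requests rather than real HTTP calls. The paths and body shapes follow my understanding of the placement API reference (the aggregates body with a generation assumes microversion 1.19 or later). The function name, the input shape, and the generation counting are hypothetical, and none of it is taken from change 560459 itself.

```python
import uuid

# Trait name from os-traits; the assumption is that this is what marks a
# provider as sharing its inventory with the members of its aggregates.
SHARING_TRAIT = "MISC_SHARES_VIA_AGGREGATE"


def sharing_disk_plan(name, total_gb, compute_nodes):
    """Return a list of (method, path, body) tuples. Nothing is sent.

    compute_nodes (hypothetical input shape) maps each compute node
    provider uuid to the "generation" and "aggregates" last read from
    placement for that provider.
    """
    disk = str(uuid.uuid4())
    agg = str(uuid.uuid4())
    base = "/resource_providers/" + disk
    # Assumes a new provider starts at generation 0 and each successful
    # write bumps it by one. A real client should use the generation
    # returned in each response instead of counting.
    plan = [
        # Create the provider that represents the shared disk.
        ("POST", "/resource_providers", {"name": name, "uuid": disk}),
        # Give it DISK_GB inventory.
        ("PUT", base + "/inventories", {
            "resource_provider_generation": 0,
            "inventories": {"DISK_GB": {"total": total_gb}}}),
        # Mark it as a sharing provider.
        ("PUT", base + "/traits", {
            "resource_provider_generation": 1,
            "traits": [SHARING_TRAIT]}),
        # Put it in a new aggregate.
        ("PUT", base + "/aggregates", {
            "resource_provider_generation": 2,
            "aggregates": [agg]}),
    ]
    for cn_uuid, seen in compute_nodes.items():
        # PUT replaces the whole aggregate list, so keep what was there.
        plan.append(("PUT", "/resource_providers/%s/aggregates" % cn_uuid, {
            "resource_provider_generation": seen["generation"],
            "aggregates": sorted(set(seen["aggregates"]) | {agg})}))
    return plan
```

Each tuple could then be handed to whatever HTTP client is in use, with the usual placement microversion header set.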
# Most Important
Nested providers and consumer generations are still the key pieces
of work. See the links in the themes below.
A lot of complicated work is in progress or recently merged and we
are getting deeper into the cycle. There are going to be bugs. The
sooner we get stuff merged so it has time to interact and we have
time to experiment with it the better. And there's also that
documentation gap mentioned above.
Also a reminder: if a blueprint has code that is ready for
wide review, put it on the
[runway](https://etherpad.openstack.org/p/nova-runways-rocky).
# What's Changed
(This is rather long because of the gap since the last report, but
also because we've hit a point where lots of stuff can merge.)
Discussion revealed an issue with allocations and inventory that
exists on a top-level resource provider which we'd later like to
move to a nested provider. An example is VGPU inventory which, until
sometime very soon, is represented as inventory on the compute
node (I think). Fixing this should be an atomic operation, so a spec
is in progress for [Handling Reshaped Provider
Trees](https://review.openstack.org/#/c/572583/). This suggests a
new `/migrator` URI in the placement service, and for the sake of
fast-forward-upgrades, a way to reach that URI from a within-process
placement service (rather than over HTTP). The
[PlacementDirect](https://review.openstack.org/#/c/572576/) tool has
been created to allow this and has merged. Quite a lot of work will
need to be done to implement that spec, so I'm going to add it as a
theme (below).
Nova now requires the 1.25 placement microversion. It will go up
again soon.
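
For anyone checking whether a given placement deployment is new enough, a small sketch follows. It assumes the version document entries carry `min_version` and `max_version` strings and that requests select a microversion with the `OpenStack-API-Version` header. The function names are made up for illustration.

```python
REQUIRED = (1, 25)


def parse_microversion(text):
    """Turn "1.25" into (1, 25) so versions compare numerically."""
    major, minor = text.split(".")
    return int(major), int(minor)


def supports(version_entry, required=REQUIRED):
    """True if required falls within the entry's min/max range."""
    low = parse_microversion(version_entry["min_version"])
    high = parse_microversion(version_entry["max_version"])
    return low <= required <= high


def request_headers(required=REQUIRED):
    """Header that asks placement for a specific microversion."""
    return {"OpenStack-API-Version": "placement %d.%d" % required}
```

Parsing into tuples matters because comparing the raw strings would sort "1.9" after "1.25".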
The groundwork for consumer generations (including requiring
some form of project and user on all allocations) has merged. What
remains is exposing it all at the API layer.
The placement version discovery document was incomplete, causing
trouble for certain ways of using the openstacksdk. This has [been
fixed](https://review.openstack.org/#/c/575117/).
Placement now supports granular policy (policy per URI) in-code,
with customization possible via a policy file.
A potential 500 when listing usage information has been fixed.
There is now a [heal allocations
CLI](https://review.openstack.org/#/c/565886/) which is designed to
help people migrate away from the CachingScheduler (which doesn't
use placement).
Nova host aggregates are now magically mirrored as placement
aggregates and, amongst other things, this is used to honor the
[availability_zone hint via
placement](https://review.openstack.org/#/c/546282/).
# Bugs
* Placement related [bugs not yet in
progress](https://goo.gl/TgiPXb): 16, same as last time, but a
different set of bugs.
* [In progress placement bugs](https://goo.gl/vzGGDQ): 9, -1 on last
time.
# Specs
Total four weeks ago: 13. Now: 13.
Spec-freeze has passed, so presumably exceptions will be required
for these. There's already a notional exception for "Reshaped
Provider Trees".
* <https://review.openstack.org/#/c/549067/>
VMware: place instances on resource pool (using update_provider_tree)
* <https://review.openstack.org/#/c/552924/>
Proposes NUMA topology with RPs
* <https://review.openstack.org/#/c/544683/>
Account for host agg allocation ratio in placement
* <https://review.openstack.org/#/c/552105/>
Support default allocation ratios
* <https://review.openstack.org/#/c/438640/>
Spec on preemptible servers
* <https://review.openstack.org/#/c/555081/>
Standardize CPU resource tracking
* <https://review.openstack.org/#/c/509042/>
Propose counting quota usage from placement
* <https://review.openstack.org/#/c/560174/>
Add history behind nullable project_id and user_id
* <https://review.openstack.org/#/c/565730/>
Placement: any traits in allocation_candidate query
* <https://review.openstack.org/#/c/565741/>
Placement: support mixing required traits with any traits
* <https://review.openstack.org/#/c/559718/>
[WIP] Support Placement in Cinder
* <https://review.openstack.org/#/c/572583/>
Handling Reshaped Provider Trees
* <https://review.openstack.org/#/c/569011/>
Count quota based on resource class
# Main Themes
"Mirror nova host aggregates to placement" and "Granular" are done,
so are no longer listed as themes. "Reshaped Provider Trees" is added
because we're stuck if we don't do it.
## Nested providers in allocation candidates
Quite a bit of the work related to nested providers in allocation
candidates has merged. What remains is on this topic:
* <https://review.openstack.org/#/q/topic:placement-return-all-resources>
Eric noticed that in this process we've injected some changes in
behavior in Rocky in the response to `/allocation_candidates` without
guarding them by microversion changes. There's [some
discussion](http://eavesdrop.openstack.org/irclogs/%23openstack-placement/%23openstack-placement.2018-06-14.log.html#t2018-06-14T16:53:06)
about it in IRC. First with me and then later with Jay. The gist is
that it's unfortunate that happened, but it's not a disaster and the
best outcome is that the diff between Queens and Rocky demonstrates
the right behavior.
## Consumer Generations
This allows multiple agents to "safely" update allocations for a
single consumer. The code is in progress:
* <https://review.openstack.org/#/q/topic:bp/add-consumer-generation>
As noted above, much of this is merged. Most of what is left is
exposing the functionality at the API level.
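
To make the "safely" part concrete, here is a toy in-memory model of the idea: a write only succeeds if the writer presents the generation it last read. This is not the placement API (which is not yet exposed, as noted above). The class, method names, and the choice of `None` for a never-seen consumer are all assumptions made for illustration.

```python
class GenerationConflict(Exception):
    """Raised when a writer's view of a consumer is stale."""


class ToyAllocations:
    def __init__(self):
        # consumer id -> (generation, allocations)
        self._consumers = {}

    def get(self, consumer):
        # A consumer that has never been written has generation None.
        return self._consumers.get(consumer, (None, {}))

    def put(self, consumer, allocations, generation):
        current, _ = self.get(consumer)
        if generation != current:
            raise GenerationConflict(
                "expected %r, got %r" % (current, generation))
        new_generation = 0 if current is None else current + 1
        self._consumers[consumer] = (new_generation, dict(allocations))
        return new_generation
```

If two agents both read the consumer and then both try to write, the second write fails with a conflict and that agent has to re-read before trying again, rather than silently clobbering the first write.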
## Reshaped Provider Trees
This allows moving inventory and allocations that were on resource
provider A to resource provider B in an atomic fashion. Right now
this is a spec on the following topic:
* <https://review.openstack.org/#/q/topic:bp/reshape-provider-tree>
A glance at the spec will reveal that this is a multi-faceted and
multi-party effort. Nine people are listed in the Assignee section.
The PlacementDirect part merged today.
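
A toy illustration of what "atomic" means here: either the inventory and every allocation against it move from A to B together, or nothing changes. The data layout and the `reshape` function are hypothetical and bear no relation to what the spec proposes for the API or the database. The sketch gets its all-or-nothing behavior by working on a copy.

```python
import copy


def reshape(state, source, target, resource_class):
    """Return a new state with resource_class moved from source to target.

    state (hypothetical shape):
      {"inventories": {provider: {resource_class: total}},
       "allocations": {consumer: {provider: {resource_class: amount}}}}
    Any failure raises before the caller's state is touched.
    """
    draft = copy.deepcopy(state)
    # KeyError here if the source does not have that inventory.
    total = draft["inventories"][source].pop(resource_class)
    target_inv = draft["inventories"].setdefault(target, {})
    if resource_class in target_inv:
        raise ValueError("target already has %s" % resource_class)
    target_inv[resource_class] = total
    for allocs in draft["allocations"].values():
        if resource_class in allocs.get(source, {}):
            amount = allocs[source].pop(resource_class)
            allocs.setdefault(target, {})[resource_class] = amount
            if not allocs[source]:
                # Nothing left allocated against the source provider.
                del allocs[source]
    return draft
```

For the VGPU example, `source` would be the compute node provider and `target` a new child provider.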
# Extraction
The placement [db
connection](https://review.openstack.org/#/c/362766/) change was
previously +W but has since had a few merge conflicts. It will
presumably merge soon. This will allow installations to
optionally use a separate database for placement data. When that
merges, a [zuul](https://review.openstack.org/#/c/564067/) change will
adjust the nova-next job to use it. The changes required to
devstack are already in place.
A stack of changes to placement unit tests to make them not rely on
nova.test has merged. There are functional tests remaining which
still use that. If you are looking for extraction-related work,
finding ways in which nova code is imported but isn't really needed
is a good way to make progress.
A while back, Jay made a first pass at
[os-resource-classes](https://github.com/jaypipes/os-resource-classes/),
which needs some additional eyes on it. I personally thought it
might be heavier than required. If you have ideas, please share them.
The placement extraction [forum
session](https://etherpad.openstack.org/p/YVR-placement-extraction)
went well. There was pretty good consensus from the people in the
room and we got some useful feedback from some operators on how
things ought to work.
An area we will need to prepare for is dealing with the various
infra and co-gating issues that will come up once placement is
extracted.
# Other
19 entries four weeks ago. 23 now.
Some of the older items in this list are not getting much attention.
That's a shame. The list is ordered (oldest first) the way it is on
purpose.
* <https://review.openstack.org/#/c/546660/>
Purge comp_node and res_prvdr records during deletion of
cells/hosts
* <https://review.openstack.org/#/q/topic:bp/placement-osc-plugin-rocky>
A huge pile of improvements to osc-placement
* <https://review.openstack.org/#/c/527791/>
Get resource provider by uuid or name (osc-placement)
* <https://review.openstack.org/#/c/477478/>
placement: Make API history doc more consistent
* <https://review.openstack.org/#/c/556669/>
Handle agg generation conflict in report client
* <https://review.openstack.org/#/c/537614/>
Add unit test for non-placement resize
* <https://review.openstack.org/#/c/493865/>
cover migration cases with functional tests
* <https://review.openstack.org/#/q/topic:bug/1732731>
Bug fixes for sharing resource providers
* <https://review.openstack.org/#/c/535517/>
Move refresh time from report client to prov tree
* <https://review.openstack.org/#/c/561770/>
PCPU resource class
* <https://review.openstack.org/#/c/566166/>
rework how we pass candidate request information
* <https://review.openstack.org/#/c/564876/>
add root parent NULL online migration
* <https://review.openstack.org/#/q/topic:bp/bandwidth-resource-provider>
add resource_requests field to RequestSpec
* <https://review.openstack.org/#/c/575127/>
replace deprecated accept.best_match
* <https://review.openstack.org/#/c/575222/>
Don't heal allocations for deleted servers
* <https://review.openstack.org/#/c/575237/>
Ignore UserWarning for scope checks during test runs
* <https://review.openstack.org/#/c/568965/>
Enforce placement minimum in nova.cmd.status
* <https://review.openstack.org/#/c/560107/>
normalize_name helper (in os-traits)
* <https://review.openstack.org/#/c/573475/>
Fix nits in nested provider allocation candidates(2)
* <https://review.openstack.org/#/c/538498/>
Convert driver supported capabilities to compute node provider
traits
* <https://review.openstack.org/#/c/568639/>
Use placement.inventory.inuse in report client
* <https://review.openstack.org/#/c/517921/>
ironic: Report resources as reserved when needed
* <https://review.openstack.org/#/c/568713/>
Test for multiple limit/group_policy qparams
# End
Yow. That was long. Thanks for reading. Review some code please.
--
Chris Dent ٩◔̯◔۶ https://anticdent.org/
freenode: cdent tw: @anticdent