[openstack-dev] [placement] update 18-38
Chris Dent
cdent+os at anticdent.org
Fri Sep 21 18:03:19 UTC 2018
HTML: https://anticdent.org/placement-update-18-38.html
Here's a placement update. There wasn't one last week because of
the PTG. There will be some references to various PTG topics below,
but since we haven't fully resolved what the priorities will be, the
discussion here will be somewhat unfocused.
# Most Important
The two most important things to do:
As is typical (at least in my experience), last week we discussed
and planned more work than anyone could reasonably be expected to
accomplish in a few years, let alone a single cycle, so there will
be an inevitable winnowing and prioritizing of ideas and specs over
the next few days. There's some discussion of priorities on an
[etherpad](https://etherpad.openstack.org/p/nova-ptg-stein-priorities),
but the details of which to do and how to implement are not fully
resolved. Reviewing the specs (below) ought to help with that.
We're still working towards a complete set of integration and
upgrade tests for the new placement repo. The unit and functional
tests are happy and nicely fast, but they aren't covering important
things like upgrading from placement-in-nova to just-placement, nor
do they do any live testing with a full devstack. Work is in
progress on all of this, see the "extraction" section below.
# What's Changed
We had a meeting to come up with a plan for migrating placement to
an independent project. Mel wrote up a [summary
email](http://lists.openstack.org/pipermail/openstack-dev/2018-September/134541.html)
with the steps.
# Questions and Links
(I've added "links" to this section because there's a good one
this week, so why not?)
* There was a demo at the PTG for the minimum bandwidth work. That's
been written up in a [blog
post](https://rubasov.github.io/2018/09/21/openstack-qos-min-bw-demo.html).
* Yesterday, belmoreira showed up in [#openstack-placement](http://eavesdrop.openstack.org/irclogs/%23openstack-placement/%23openstack-placement.2018-09-20.log.html#t2018-09-20T14:11:59)
with an issue: expected resource providers were not showing up
in allocation candidates. This was traced back to `max_unit` for
`VCPU` being locked to the value of `total`: hardware on which SMT
had been turned off now reported fewer CPUs, and so could no longer
accept existing large flavors. Discussion ensued about ways to
make `max_unit` more manageable by operators. The existing
constraint is there for a reason (discussed in IRC), but there are
two issues with it: that reason is not universally agreed upon, and
we didn't resolve the disagreement. Also, managing the `max_unit` of
any inventory gets more complicated in a world of complex NUMA
topologies.
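To make the `max_unit` problem concrete, here is a minimal Python sketch of the per-inventory checks placement applies to a single request: the amount must lie between `min_unit` and `max_unit` and be a multiple of `step_size`, and current usage plus the request must not exceed `(total - reserved) * allocation_ratio`. The `Inventory` class, the `can_fit` and `vcpu_inventory` helpers, the 16.0 allocation ratio and the host sizes are hypothetical illustrations, not placement's actual code.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Simplified, hypothetical view of one resource class's inventory."""
    total: int
    max_unit: int
    reserved: int = 0
    min_unit: int = 1
    step_size: int = 1
    allocation_ratio: float = 1.0


def can_fit(inv: Inventory, used: int, requested: int) -> bool:
    """Return True if a single request for `requested` units fits `inv`."""
    # Per-request limits: max_unit caps one allocation no matter how much
    # overcommitted capacity remains.
    if not inv.min_unit <= requested <= inv.max_unit:
        return False
    if requested % inv.step_size != 0:
        return False
    # Aggregate limit: overcommitted capacity versus what is already used.
    capacity = (inv.total - inv.reserved) * inv.allocation_ratio
    return used + requested <= capacity


def vcpu_inventory(cpus: int, allocation_ratio: float = 16.0) -> Inventory:
    # Mirrors the behaviour described above: max_unit locked to total.
    return Inventory(total=cpus, max_unit=cpus, allocation_ratio=allocation_ratio)


smt_on = vcpu_inventory(64)   # hypothetical host with SMT enabled
smt_off = vcpu_inventory(32)  # the same host after SMT is turned off

print(can_fit(smt_on, used=0, requested=48))   # True
print(can_fit(smt_off, used=0, requested=48))  # False: 48 > max_unit of 32
```

With SMT off the host still has plenty of overcommitted capacity in this sketch (32 × 16 = 512), yet the 48-VCPU request is rejected by `max_unit` alone, which is why making `max_unit` operator-manageable came up.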
# Bugs
* Placement-related [bugs not yet in progress](https://goo.gl/TgiPXb): 17.
No change (in number) from last time.
* [In-progress placement bugs](https://goo.gl/vzGGDQ): 10. Same as
last time.
# Specs
New (or newly discovered) ones are at the end. Specs which have
merged have been removed. As stated above, we still haven't
solidified priorities, so some specs may merge as "low priority".
* <https://review.openstack.org/#/c/544683/>
Account for host agg allocation ratio in placement
(still in rocky/)
* <https://review.openstack.org/#/c/595236/>
Add subtree filter for GET /resource_providers
* <https://review.openstack.org/#/c/597601/>
Resource provider - request group mapping in allocation candidate
* <https://review.openstack.org/#/c/549067/>
VMware: place instances on resource pool
(still in rocky/)
* <https://review.openstack.org/#/c/555081/>
Standardize CPU resource tracking
* <https://review.openstack.org/#/c/599957/>
Allow overcommit of dedicated CPU
(Has an alternative which changes allocations to a float)
* <https://review.openstack.org/#/c/600016/>
List resource providers having inventory
* <https://review.openstack.org/#/c/593475/>
Bi-directional enforcement of traits
* <https://review.openstack.org/#/c/599598/>
allow transferring ownership of instance
* <https://review.openstack.org/#/c/591037/>
Modelling passthrough devices for report to placement
* <https://review.openstack.org/#/c/509042/>
Propose counting quota usage from placement and API database
(A bit out of date but may be worth resurrecting)
* <https://review.openstack.org/#/c/603585/>
Spec: allocation candidates in tree
* <https://review.openstack.org/#/c/603805/>
[WIP] generic device discovery policy
* <https://review.openstack.org/#/c/603955/>
Nova Cyborg interaction specification.
* <https://review.openstack.org/#/c/601596/>
supporting virtual NVDIMM devices
* <https://review.openstack.org/#/c/603352/>
Spec: Support filtering by forbidden aggregate
* <https://review.openstack.org/#/c/552924/>
Proposes NUMA topology with RPs
* <https://review.openstack.org/#/c/552105/>
Support initial allocation ratios
(There are at least two pending specs to clean up allocation ratio
handling; 544683 (above) is the other one. We did choose between
them at the PTG, but the etherpad is confusing, so it's not clear
from it which one was chosen.)
* <https://review.openstack.org/#/c/569011/>
Count quota based on resource class
# Main Themes
These are interim themes while we work out what priorities are.
## Making Nested Useful
An acknowledged outcome from the PTG was that we need to do the work
to make workloads that want to use nested resource providers
actually able to land on a host somewhere. This involves work across
many parts of nova and could easily lead to a mass of bug fixes in
placement. I'm probably missing a fair bit, but the following topics
are good starting points:
* <https://review.openstack.org/#/q/topic:bp/use-nested-allocation-candidates>
* <https://review.openstack.org/#/q/topic:use-nested-allocation-candidates>
* <https://review.openstack.org/#/q/topic:bug/1792503>
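As a rough illustration of why nested providers matter for landing workloads, the sketch below compares a flat, root-only check with a tree-wide one. It assumes a simplified model in which each resource class in a request must come from a single provider somewhere in the compute node's tree; real placement semantics (granular request groups, traits, aggregates) are richer. `Provider`, `root_only_fits`, `tree_fits` and the inventory numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Provider:
    """Hypothetical, simplified resource provider with free capacity per class."""
    name: str
    free: Dict[str, int] = field(default_factory=dict)
    children: List["Provider"] = field(default_factory=list)

    def walk(self) -> Iterator["Provider"]:
        """Yield this provider and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def root_only_fits(root: Provider, request: Dict[str, int]) -> bool:
    """Flat model: every resource must come from the root provider itself."""
    return all(root.free.get(rc, 0) >= amount for rc, amount in request.items())


def tree_fits(root: Provider, request: Dict[str, int]) -> bool:
    """Nested model: each resource class may come from any single provider in the tree."""
    return all(
        any(p.free.get(rc, 0) >= amount for p in root.walk())
        for rc, amount in request.items()
    )


compute = Provider("cn1", {"VCPU": 16, "MEMORY_MB": 32768}, children=[
    Provider("cn1_pf1", {"SRIOV_NET_VF": 8}),
    Provider("cn1_pf2", {"SRIOV_NET_VF": 8}),
])
request = {"VCPU": 4, "MEMORY_MB": 2048, "SRIOV_NET_VF": 2}
print(root_only_fits(compute, request))  # False: no VFs on the root
print(tree_fits(compute, request))       # True: a child PF supplies the VFs
```

Under this model a request for 10 VFs would still fail, because no single PF has 10 free, even though the tree has 16 in total.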
## Consumer Generations
gibi is still working hard to drive home support for consumer
generations on the nova side. Because of some dependency management,
that work is currently under the following topic:
* <https://review.openstack.org/#/q/topic:bp/use-nested-allocation-candidates>
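Consumer generations are a form of optimistic concurrency control: a writer sends back the generation it last saw for a consumer, and the write is rejected if someone else has changed that consumer in the meantime (placement reports this as an HTTP 409 conflict). The toy in-memory store below sketches that idea; `AllocationStore`, `ConsumerConflict` and the exact generation numbering are hypothetical and only loosely mirror placement's API.

```python
from typing import Dict, Optional, Tuple

Allocations = Dict[str, Dict[str, int]]  # provider -> {resource class: amount}


class ConsumerConflict(Exception):
    """Raised when a writer's view of a consumer is stale."""


class AllocationStore:
    """Toy in-memory store illustrating generation-checked allocation writes."""

    def __init__(self) -> None:
        self._allocs: Dict[str, Allocations] = {}
        self._generations: Dict[str, int] = {}

    def get(self, consumer: str) -> Tuple[Allocations, Optional[int]]:
        # A generation of None means "this consumer does not exist yet".
        return dict(self._allocs.get(consumer, {})), self._generations.get(consumer)

    def put(self, consumer: str, allocs: Allocations,
            consumer_generation: Optional[int]) -> None:
        current = self._generations.get(consumer)
        # Compare-and-swap: the caller must echo the generation it last read.
        if consumer_generation != current:
            raise ConsumerConflict(
                f"{consumer}: expected generation {current}, "
                f"got {consumer_generation}")
        self._allocs[consumer] = dict(allocs)
        self._generations[consumer] = 0 if current is None else current + 1


store = AllocationStore()
_, gen = store.get("instance-1")  # None: a brand new consumer
store.put("instance-1", {"cn1": {"VCPU": 2}}, gen)
try:
    # A second writer still holding the stale generation (None) loses.
    store.put("instance-1", {"cn1": {"VCPU": 4}}, None)
except ConsumerConflict as exc:
    print("conflict:", exc)
```

A caller that hits a conflict would re-read the consumer's current allocations and generation, then decide whether to retry its write.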
## Extraction
As mentioned above, getting the extracted placement happy is
proceeding apace. Besides many of the generic cleanups happening [to
the
repo](https://review.openstack.org/#/q/project:openstack/placement+status:open)
we need to focus some effort on upgrade and integration testing,
docs publishing, and doc correctness.
Dan has started a [database migration
script](https://review.openstack.org/#/c/603234/) which will be used
by deployers and grenade for upgrades. Matt is hoping to make some
progress on the grenade side of things. I have a [hacked up
devstack](https://review.openstack.org/#/c/600162/) for using the
extracted placement.
All of this is dependent on:
* database migrations being "collapsed"
* the existence of a `placement-manage` script to initialize the
database
I made a faked-up
[placement-manage](https://review.openstack.org/#/c/600161/) for the
devstack patch above, but it only creates tables, doesn't migrate,
and is not fit for purpose as a generic CLI.
I have started [some
experiments](https://review.openstack.org/#/c/601614/) on using
[gabbi-tempest](https://pypi.org/project/gabbi-tempest/) to drive
some integration tests for placement using only gabbi YAML files. I
initially did this with "legacy"-style zuul jobs and made it work,
but it was ugly; I've since switched to more modern zuul jobs but
haven't yet made that work.
# Other
As with last time, I'm not going to make a list of links to pending
changes that aren't already listed above. I'll start doing that again
eventually (once priorities are clearer), but for now it is
useful to look at [open placement
patches](https://review.openstack.org/#/q/project:openstack/placement+status:open)
and patches from everywhere which [mention placement in the commit
message](https://review.openstack.org/#/q/message:placement+status:open).
# End
In case anyone is wondering where I am, I'm out Monday to Wednesday next week.
--
Chris Dent ٩◔̯◔۶ https://anticdent.org/
freenode: cdent tw: @anticdent