[openstack-dev] [nova] placement/resource providers update 33
Chris Dent
cdent+os at anticdent.org
Fri Aug 25 12:54:56 UTC 2017
Here's update 33.
RC2 went to the presses. The result is that we've now got claims
happening earlier and using better information. This ought to mean
that there are fewer retries and failed builds. There's some
cruftiness in the code that manages allocations that will need to be
cleaned up, and bugs and buglets keep getting found in some edge cases
but overall much forward progress. Nice work everyone.
Once alternate destinations is done, the next things coming up are
getting shared providers working on the nova side, incorporating
traits in resource requests, and, eventually, nested resource providers.
Presumably at the PTG we'll decide the if/when/how of extracting
placement to its own repo.
This week I've added a section that references bugs that have not yet
seen much action.
# Most Important
Besides reviewing all the stuff in this document, another important
thing to do is to make additions and edits on the PTG etherpad (see
help wanted).
The ongoing work on allocation-related functional tests (many listed
below) and on getting alternate destinations working is also
important.
# Help Wanted
There's a swathe of placement related stuff on the PTG planning
etherpad. Please add to that or make some adjustments if you think
something is missing or incomplete:
https://etherpad.openstack.org/p/nova-ptg-queens
An important aspect of this is determining what kind of dependency
tree is involved with the work.
Also see the new section that follows.
# Bugs needing attention
(Bugs which are not yet in progress or beyond.)
## Current
* https://bugs.launchpad.net/nova/+bug/1712411
Allocations may not be removed from dest node during failed migrations
* https://bugs.launchpad.net/nova/+bug/1679750
Allocations are not cleaned up in placement for instance 'local delete' case
* https://bugs.launchpad.net/nova/+bug/1427772
Instance that uses force-host still needs to run some filters
(old bug, but newly relevant in a placement world)
## Old (need to be flushed or refreshed?)
* https://bugs.launchpad.net/nova/+bug/1683858
Allocation records do not contain overhead information
* https://bugs.launchpad.net/nova/+bug/1652099
placement requests from n-cpu logs not found in placement-api logs
* https://bugs.launchpad.net/nova/+bug/1674694
In placement api error responses choose poor default content-type
(This was partially fixed in the resource tracker, but not generally.
As described in the bug, this ought to be relatively straightforward
to fix.)
* https://bugs.launchpad.net/nova/+bug/1662867
update_available_resource_for_node racing instance deletion
(Is this one still relevant after all the recent changes to claim
handling?)
# Docs
There's a stack that documents (with visual aids!) the flow of
scheduler and placement. It is pretty much ready:
https://review.openstack.org/#/c/475810/
# Main Themes
## Alternate Destinations
There's a stack beginning at https://review.openstack.org/#/c/486215/
which proposes the bits necessary to return alternate destinations
besides the claimed destination. These will be used to do within-cell
(v2) retries in case a build can't be done on the claimed destination.
The spec revision for that work: https://review.openstack.org/#/c/471927/
Ed has some concerns about the complexity being created, so he wrote
up some issues at:
https://blog.leafe.com/handling-unstructured-data/
In his response to https://review.openstack.org/#/c/495854/3 Jay
suggests a named tuple:
    I'm struck that instead of a two-tuple, both elements of the tuple
    having lists of lists, would it not be clearer to have the return
    value from select_destinations() instead be a single list of
    namedtuple elements, where the namedtuple would have a
    chosen_host, alternate_hosts, and allocation_requests attribute
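
To make Jay's suggestion a bit more concrete, here is a rough sketch of
what a list of namedtuple elements could look like. This is not the code
under review: only the three field names come from the comment above.
The `Selection` type name, the `select_destinations_sketch` function,
its `num_alternates` parameter and the shape of its input are all
assumptions made up for illustration.

```python
from collections import namedtuple

# Hypothetical type name; the three field names are taken from Jay's comment.
Selection = namedtuple(
    "Selection", ["chosen_host", "alternate_hosts", "allocation_requests"])


def select_destinations_sketch(ranked_per_instance, num_alternates=2):
    """Return one Selection per requested instance.

    ranked_per_instance is assumed to be a list with one entry per
    instance, each entry a non-empty list of
    (host, allocation_request) pairs ordered best first.
    """
    selections = []
    for ranked in ranked_per_instance:
        hosts = [host for host, _ in ranked]
        requests = [request for _, request in ranked]
        selections.append(Selection(
            # The best-ranked host is the one that gets claimed.
            chosen_host=hosts[0],
            # The next few hosts are kept for within-cell retries.
            alternate_hosts=hosts[1:1 + num_alternates],
            # Keep the allocation requests for the chosen host and
            # its alternates, in the same order.
            allocation_requests=requests[:1 + num_alternates],
        ))
    return selections
```

A caller would then read `selection.chosen_host` and
`selection.alternate_hosts` by name rather than indexing into parallel
lists of lists, which seems to be the clarity Jay is after.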
## Traits
Work continues apace on getting filtering by traits working:
https://review.openstack.org/#/c/489206/
This has some overlap with shared provider handling (below).
## Shared Resource Providers
There's some support for shared resource providers on the placement
side of the scheduling equation, but the resource tracker is not yet
ready to support it. There is some work in progress, starting with
functional tests:
https://review.openstack.org/#/c/490733/
## Nested Resource Providers
This will start back up after we clean off the windscreen. The stack
begins at https://review.openstack.org/#/c/470575/5
# Other Code
* https://review.openstack.org/#/c/493865/
functional tests for live migrate
* https://review.openstack.org/#/c/494136/
Allow shuffling of best weighted hosts
* https://review.openstack.org/#/c/495159/
tests for resource allocation during soft delete
* https://review.openstack.org/#/c/485209/
gabbi tests for shared custom resource class
* https://review.openstack.org/#/c/495891/
WIP: test allocation handling during scheduler retry
* https://review.openstack.org/#/c/480379/
ensure RP maps to those RPs that share with it
This is a requirement for getting shared providers working
correctly.
* https://review.openstack.org/#/c/496853/
Spec for minimal cache-headers in placement
poc: https://review.openstack.org/#/c/495380/
* https://review.openstack.org/#/c/469048/
Update the placement deployment instructions
This has been around for nearly 4 months.
* https://review.openstack.org/#/c/489633/
Update RT aggregate map less frequently
* https://review.openstack.org/#/c/494206/
Remove the Pike migration code for flavor migration
* https://review.openstack.org/#/c/468797/
Spec for requesting traits in flavors
* https://review.openstack.org/#/c/496933/
Add uuid to migration table
(This is relevant to placement and scheduling because it ought to
make the "doubling" currently used for doing moves cleaner, by
having two different allocations, one of them identified by the
migration uuid. Aren't uuids awesome?)
* https://review.openstack.org/#/c/428481/
Request zero root disk for boot-from-volume instances
(Relevant for making sure that disk allocations are correct.)
* https://review.openstack.org/#/c/452006/
Add functional test for two-cell scheduler behaviors
* https://review.openstack.org/#/c/496202/
Add functional migrate force_complete test
* https://review.openstack.org/#/c/497399/
WIP: Test server movings with custom resources
* https://review.openstack.org/#/c/497733/
WIP spec Report CPU features to placement service by traits API
* https://review.openstack.org/#/c/496976/
Centralize allocation deletion in ComputeManager
* https://review.openstack.org/#/c/496803/
Add missing unit tests for FilterScheduler._get_all_host_states
* https://review.openstack.org/#/c/496847/
Add missing tests for _remove_deleted_instances_allocations
* https://review.openstack.org/#/c/492247/
Use ksa adapter for placement conf & requests
* https://review.openstack.org/#/c/492571/
Make compute log less verbose with allocs autocorrection
* https://review.openstack.org/#/c/496936/
De-duplicate two delete_allocation_for_* methods
--
Chris Dent (⊙_⊙') https://anticdent.org/
freenode: cdent tw: @anticdent