[openstack-dev] [nova] placement/resource providers update 33

Chris Dent cdent+os at anticdent.org
Fri Aug 25 12:54:56 UTC 2017


Here's update 33.

RC2 went to the presses. The result is that we've now got claims
happening earlier and using better information. This ought to mean
that there are fewer retries and failed builds. There's some
cruftiness in the code that manages allocations that will need to be
cleaned up, and bugs and buglets keep getting found in some edge cases
but overall much forward progress. Nice work everyone.

Once alternate destinations is done, the next things coming up are
getting shared providers working on the nova side, incorporating
traits in resource requests, and, eventually, nested resource providers.

Presumably at the PTG we'll decide the if/when/how of extracting
placement to its own repo.

This week I've added a section that references bugs that have not yet
seen much action.

# Most Important

Besides reviewing all the stuff in this document, another important
thing to do is to make additions and edits on the PTG etherpad (see
help wanted).

The ongoing work on allocation-related functional tests (many listed
below) and on getting alternate destinations working is also
important.

# Help Wanted

There's a swathe of placement related stuff on the PTG planning
etherpad. Please add to that or make some adjustments if you think
something is missing or incomplete:

      https://etherpad.openstack.org/p/nova-ptg-queens

An important aspect of this is determining what kind of dependency
tree is involved with the work.

Also see the new section that follows.

# Bugs needing attention

(Bugs which are not yet in progress or beyond.)

## Current

* https://bugs.launchpad.net/nova/+bug/1712411
   Allocations may not be removed from dest node during failed migrations

* https://bugs.launchpad.net/nova/+bug/1679750
   Allocations are not cleaned up in placement for instance 'local delete' case

* https://bugs.launchpad.net/nova/+bug/1427772
   Instance that uses force-host still needs to run some filters
   (old bug, but newly relevant in a placement world)

## Old (need to be flushed or refreshed:?)

* https://bugs.launchpad.net/nova/+bug/1683858
   Allocation records do not contain overhead information

* https://bugs.launchpad.net/nova/+bug/1652099
   placement requests from n-cpu logs not found in placement-api logs

* https://bugs.launchpad.net/nova/+bug/1674694
   In placement api error responses choose poor default content-type
   (This was partially fixed in the resource tracker, but not generally.
   As described in the bug, this ought to be relatively straightforward
   to make go.)

* https://bugs.launchpad.net/nova/+bug/1662867
   update_available_resource_for_node racing instance deletion
   (Is this one still relevant after all the recent changes to claim
   handling?)

# Docs

There's a stack that documents (with visual aids!) the flow of
scheduler and placement. It is pretty much ready:

     https://review.openstack.org/#/c/475810/

# Main Themes

## Alternate Destinations

There's a stack beginning at https://review.openstack.org/#/c/486215/
which proposes the bits necessary to return alternate destinations
besides the claimed destination. These will be used to do within-cell
(v2) retries in case a build can't be done on the claimed destination.

The spec revision for that work: https://review.openstack.org/#/c/471927/

Ed has some concerns about the complexity being created, so he wrote
up some issues at:

     https://blog.leafe.com/handling-unstructured-data/

In his response to https://review.openstack.org/#/c/495854/3 Jay
suggests a named tuple:

     I'm struck that instead of a two-tuple, both elements of the tuple
     having lists of lists, would it not be clearer to have the return
     value from select_destinations() instead be a single list of
     namedtuple elements, where the namedtuple would have a
     chosen_host, alternate_hosts, and allocation_requests attribute

## Traits

Work continues apace on getting filtering by traits working:

       https://review.openstack.org/#/c/489206/

This has some overlap with shared provider handling (below).

## Shared Resource Providers

There's some support for shared resource providers on the placement
side of the scheduling equation, but the resource tracker is not yet
ready to support it. There is some work in progress, starting with
functional tests:

      https://review.openstack.org/#/c/490733/

## Nested Resource Providers

This will start back up after we clean off the windscreen. The stack
begins at https://review.openstack.org/#/c/470575/5

# Other Code

* https://review.openstack.org/#/c/493865/
   functional tests for live migrate

* https://review.openstack.org/#/c/494136/
   Allow shuffling of best weighted hosts

* https://review.openstack.org/#/c/495159/
   tests for resource allocation during soft delete

* https://review.openstack.org/#/c/485209/
   gabbi tests for shared custom resource class

* https://review.openstack.org/#/c/495891/
   WIP: test allocation handling during scheduler retry

* https://review.openstack.org/#/c/480379/
   ensure RP maps to those RPs that share with it
   This is a requirement for getting shared providers working
   correctly.

* https://review.openstack.org/#/c/496853/
   Spec for minimal cache-headers in placement
   poc: https://review.openstack.org/#/c/495380/

* https://review.openstack.org/#/c/469048/
   Update the placement deployment instructions
   This has been around for nearly 4 months.

* https://review.openstack.org/#/c/489633/
   Update RT aggregate map less frequently

* https://review.openstack.org/#/c/494206/
   Remove the Pike migration code for flavor migration

* https://review.openstack.org/#/c/468797/
   Spec for requesting traits in flavors

* https://review.openstack.org/#/c/496933/
   Add uuid to migration table
   (This is relevant to placement and scheduling because it ought to
   make the "doubling" currently used for doing moves cleaner, by
   having two different allocations: one identified by the migration
   uuid. Aren't uuids awesome?)

* https://review.openstack.org/#/c/428481/
   Request zero root disk for boot-from-volume instances
   (Relevant for making sure that disk allocations are correct.)

* https://review.openstack.org/#/c/452006/
   Add functional test for two-cell scheduler behaviors

* https://review.openstack.org/#/c/496202/
   Add functional migrate force_complete test

* https://review.openstack.org/#/c/497399/
   WIP: Test server movings with custom resources

* https://review.openstack.org/#/c/497733/
   WIP spec Report CPU features to placement service by traits API

* https://review.openstack.org/#/c/496976/
   Centralize allocation deletion in ComputeManager

* https://review.openstack.org/#/c/496803/
   Add missing unit tests for FilterScheduler._get_all_host_states

* https://review.openstack.org/#/c/496847/
   Add missing tests for _remove_deleted_instances_allocations

* https://review.openstack.org/#/c/492247/
   Use ksa adapter for placement conf & requests

* https://review.openstack.org/#/c/492571/
   Make compute log less verbose with allocs autocorrection

* https://review.openstack.org/#/c/496936/
   De-duplicate two delete_allocation_for_* methods

-- 
Chris Dent                      (⊙_⊙')         https://anticdent.org/
freenode: cdent                                         tw: @anticdent

