[openstack-dev] [nova] placement/resource providers update 13

Chris Dent cdent+os at anticdent.org
Fri Mar 3 15:52:23 UTC 2017


This week's resource providers/placement update serves in part as a
summary of placement-related activity at last week's PTG. We had a
big etherpad of topics

     https://etherpad.openstack.org/p/nova-ptg-pike-placement

and an entire afternoon (plus some extra time elsewhen) to cover
them, but really only addressed three of them (shared resource
handling, custom resource classes, traits) in any significant
fashion, touching on nested resource providers a bit in the room and
on claims in the placement service on the etherpad. A summary of
that is below.

# What Matters Most

A major outcome from the discussion was that the can_host/shared
concept will not be used for dealing with shared resources (such as
shared disk). Instead, a resource provider that is a compute node
will be identified by the fact that it has a trait (the actual value
is still to be determined). When the compute node creates or updates
its own resource provider it will add that trait. When the
nova-scheduler asks placement for resources to filter, it will
include that trait in the request.
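
To make that concrete, here's a minimal sketch of what that flow
might look like against the placement HTTP API once the traits work
lands. Everything here (the endpoint paths, the payloads, the
COMPUTE_NODE trait name, and the 'required' query parameter) is an
assumption based on the in-review spec, not a landed API:

     import requests

     PLACEMENT = 'http://placement.example.com'  # hypothetical endpoint
     RP_UUID = 'the-compute-node-rp-uuid'  # placeholder

     # The compute node tags its own resource provider with the
     # compute-node trait (actual trait name still to be decided).
     requests.put(
         '%s/resource_providers/%s/traits' % (PLACEMENT, RP_UUID),
         json={'traits': ['COMPUTE_NODE']})

     # The scheduler includes that trait when asking for candidate
     # providers, so shared storage providers (which lack the trait)
     # are not returned as destinations.
     resp = requests.get(
         '%s/resource_providers' % PLACEMENT,
         params={'resources': 'VCPU:1,MEMORY_MB:512,DISK_GB:10',
                 'required': 'COMPUTE_NODE'})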

This means that the traits spec (and its POC) is now priority one in
the placement universe:

     https://review.openstack.org/#/c/345138/

# What's Changed

## Ironic Inventory

There was some debate about where, in the layering of code within
nova-compute, the creation of custom resource classes should be
handled. Having these is necessary for the effective management of
ironic nodes. The discussion resulted in this new version of ironic
inventory handling:

     https://review.openstack.org/#/c/437602/
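
For background, the expected model is that each ironic node's
resource provider exposes a single indivisible unit of a custom
resource class derived from the node, rather than VCPU/MEMORY_MB/
DISK_GB counts. Roughly like this (the class name is made up for
illustration; the field names follow the existing inventory schema):

     # A hypothetical ironic node classed as 'baremetal-gold' would
     # report one unit of a custom resource class:
     inventory = {
         'CUSTOM_BAREMETAL_GOLD': {
             'total': 1,
             'reserved': 0,
             'min_unit': 1,
             'max_unit': 1,
             'step_size': 1,
             'allocation_ratio': 1.0,
         },
     }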

## Nested Resource Providers

There was some discussion at a flip chart about the concept of
resource providers with multiple parents. We eventually decided
"let's not do that". There was also some vague discussion about
whether a hardware configuration that is currently planned to be
expressed as nested resource providers could instead be expressed as
a custom resource class, along the lines of how bare metal
configurations are described. This was left unresolved, in part
because hardware configuration is presumably dynamic in some or
many cases.

# Main Themes

## Traits

There's been a decision to normalize trait names so they look a bit
more like custom resource classes. That work is at

     https://review.openstack.org/#/q/status:open+project:openstack/os-traits+branch:master+topic:normalize
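
Roughly speaking, normalization means trait names take the same
shape as custom resource class names: upper case, with runs of
non-alphanumeric characters collapsed to underscores. A sketch of
the idea (the exact rules live in the os-traits changes above):

     import re

     def normalize(name):
         # 'hw:cpu:x86:avx2' -> 'HW_CPU_X86_AVX2', matching the
         # CUSTOM_* style used for custom resource classes.
         return re.sub(r'[^A-Z0-9]+', '_', name.upper())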

This is being done concurrently with the spec and code for traits
within placement/nova:

     https://review.openstack.org/#/q/status:open+topic:bp/resource-provider-traits
     https://review.openstack.org/#/q/status:open+topic:bp/resource-provider-tags

(That topic mismatch needs to be fixed.)

## Shared Resource Providers

As mentioned above, the plan on this work has changed, thus there is
currently no code in flight for it, but there is a blueprint:

     https://blueprints.launchpad.net/nova/+spec/shared-resources-pike

## Nested Resource Providers

https://review.openstack.org/#/q/status:open+topic:bp/nested-resource-providers

## Docs

https://review.openstack.org/#/q/topic:cd/placement-api-ref

The start of creating an API ref for the placement API. Not a lot
there yet as I haven't had much of an opportunity to move it along.
There is, however, enough there for additional content to be
started, if people have the opportunity to do so. Check with me to
divvy up the work if you'd like to contribute.

## Claims in the Scheduler

We intended to talk about this at the PTG but we didn't get to it.
There was some discussion on the etherpad (linked above), but the
consensus was that it is premature to plan how to do this while the
service is a) still evolving and b) only just starting to do
filtering: anything we try to plan now will likely be wrong, or at
least not aligned with eventual discoveries. We decided, instead,
that the right thing to do is to make what we've got immediately
planned work correctly and to get some real return on the promise of
the placement API (which in the immediate sense means getting shared
disk managed effectively).

## Performance

Another topic we didn't get to. We're aware that there are some
redundancies in the resource tracker that we'd like to clean up

     http://lists.openstack.org/pipermail/openstack-dev/2017-January/110953.html

but it's also the case that we've done no performance testing on the
placement service itself. For example, consider the case where a
CERN-sized cloud is turned on (at Ocata) for the first time. Once
all the nodes have registered themselves as resource providers, the
first request for candidate destinations from the filter scheduler
will get back all of those resource providers. That's probably
wasteful on several dimensions and will generate a fair bit of load.

We ought to model both these extreme cases and the common cases to
make sure there aren't unexpected performance drains.

## Microversion Handling on the Nova side

Matt identified that we'll need to be more conscious of microversions
in nova-status, the scheduler, and the resource tracker for Pike and
beyond:

     https://bugs.launchpad.net/nova/+bug/1669433

# Other Code/Specs

* https://bugs.launchpad.net/nova/+bug/1635182
   Fixing it so we don't have to add json_error_formatter everywhere.
   There's a collection of related fixes attached to that bug report.

   Pushkar, you might want to make all of those have the same topic,
   or put them in a stack of related changes.

* https://review.openstack.org/#/q/status:open+topic:valid_inventories
   Fixes that ensure that we only accept valid inventories when setting
   them.

* https://review.openstack.org/#/c/416751/
   Removing the Allocation.create() method which was only ever used in
   tests and not in the actual, uh, creation of allocations.

* https://review.openstack.org/#/c/427330/
   Avoid deprecation warnings from oslo_context.

* https://review.openstack.org/#/q/topic:bp/delete-inventories-placement-api
   We need to be able to delete all the inventory hosted by one
   resource provider in one request. Right now you need one DELETE for
   each class of resource (see the sketch after this list).

* https://review.openstack.org/#/c/418393/
   A spec for improving the level of detail and structure in placement
   error responses so that it is easier to distinguish between
   different types of, for example, 409 responses.

* https://review.openstack.org/#/c/423872/
   Spec for versioned-object based notification of events in the
   placement API.

* https://review.openstack.org/#/c/392891/
   CORS support in the placement API. We'll need this for browser-side
   clients.

* https://review.openstack.org/#/c/382613/
   A little demo script showing how a cronjob to update inventory on
   a shared resource provider might work. I created it because it
   seemed like having a sort of demo would be good, but it's been
   sitting around for a long time and may not be aligned with what we
   need. If so, I'd like to abandon it.

* https://review.openstack.org/#/c/435539/
   Update placement dev to indicate the new decorator for the
   json_error_formatter improvements mentioned above.

* https://review.openstack.org/#/c/436773/
   Removing SQL from an exception message.

* https://bugs.launchpad.net/nova/+bug/1661312
   Race condition for allocations during evacuation. Known bug, not
   sure of solution.

* https://bugs.launchpad.net/nova/+bug/1632852
   Cache headers not produced by placement API. This was assigned to
   several different people over time, but I'm not sure if there is
   any active code.

* https://etherpad.openstack.org/p/placement-newton-leftovers
   There's still some lingering stuff on here, some of which is
   mentioned elsewhere in this message, but not all.
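
Returning to the delete-inventories item above, here's a rough
sketch of the difference. The per-class DELETE follows the existing
inventories routes; the single-request DELETE is the in-review
proposal, so treat its shape as an assumption:

     import requests

     RP = 'http://placement.example.com/resource_providers/some-uuid'

     # Today: one DELETE per resource class.
     for rc in ('VCPU', 'MEMORY_MB', 'DISK_GB'):
         requests.delete('%s/inventories/%s' % (RP, rc))

     # Proposed: one DELETE for the whole inventory set.
     requests.delete('%s/inventories' % RP)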

I suspect I'm missing some items; please let me know.

# End Matter

We can think of at least the start of this cycle as a period of
consolidation for the placement service: making sure that everything
we've started is finished, working accurately, and returning
benefits before making the next great leaps forward. These leaps
include things like:

* claims in the service
* neutron and cinder doing things with placement
* using a different database with placement [1]
* extracting placement to its own repo

[1] The patch to use a separate database is being kept up to date:
     https://review.openstack.org/#/c/362766/

-- 
Chris Dent                 ¯\_(ツ)_/¯           https://anticdent.org/
freenode: cdent                                         tw: @anticdent

