Open Stack

Fri Feb 3 19:29:36 UTC 2017

It's been a busy week and it's now well into my Friday evening.
Rather than try to adequately summarize where things stand in the
universe of resource providers and the placement API, I'm going to
give a brief summary of what we managed to get into Ocata.

The effective summary is that we hit the main goal: the
nova-scheduler now limits the hosts it will filter by requesting a
set of resource providers from the placement API. This was not
without several difficulties, mostly around ensuring safe upgrades.
The end result is that the scheduler checks whether all compute
nodes are running Ocata. If they are, it limits hosts via the
placement API; otherwise, it uses the old method.
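The upgrade gate described above boils down to a small decision. What
follows is a minimal Python sketch of that decision, not nova's actual
code: `ComputeNode`, `RELEASES`, `select_candidate_hosts` and the
`query_placement` callable are all hypothetical, and the real
scheduler compares compute service versions rather than release
names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical release ordering; nova itself compares service version numbers.
RELEASES = ["mitaka", "newton", "ocata"]


@dataclass
class ComputeNode:
    hostname: str
    release: str  # release the nova-compute service on this host is running


def all_computes_at_least(nodes: Iterable[ComputeNode], minimum: str) -> bool:
    """Return True only if every compute node runs `minimum` or later."""
    floor = RELEASES.index(minimum)
    return all(RELEASES.index(node.release) >= floor for node in nodes)


def select_candidate_hosts(
    nodes: List[ComputeNode],
    query_placement: Callable[[], Iterable[str]],
) -> List[str]:
    """Limit the hosts to filter, but only once the whole deployment is upgraded.

    `query_placement` is a hypothetical callable returning the hostnames of the
    resource providers that placement reports as able to satisfy the request.
    """
    if all_computes_at_least(nodes, "ocata"):
        allowed = set(query_placement())
        return [node.hostname for node in nodes if node.hostname in allowed]
    # Mixed-version deployment: fall back to the old method and filter every host.
    return [node.hostname for node in nodes]
```

On the placement side, my understanding is that the request is a
`GET /resource_providers?resources=VCPU:1,MEMORY_MB:512,DISK_GB:10`
style query (placement microversion 1.4); the sketch hides that
behind `query_placement` so the gating logic stands on its own.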

The above may sound small, but an enormous amount of work happened,
especially in the last two weeks, to catch lots of problems (big and
small), clear up confusion, and get things working with CI and a
variety of deployment scenarios. Thanks to everyone who helped out.

The concept of custom resource classes now exists in the placement
API, but the work to have the resource tracker use them to track
Ironic inventory in a sensible fashion at

     https://review.openstack.org/#/c/404472/

is not likely to be included in Ocata. There's hope that some
mid-cycle jiggery-pokery can be done to make it meaningfully
available in Pike.
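As I understand the placement schema, a custom resource class name
must start with `CUSTOM_` followed by upper-case letters, digits and
underscores. Here is a minimal sketch of turning an Ironic node's
free-form resource class into such a name. The function name
`to_custom_resource_class` is hypothetical, and the normalization
rule (upper-case, replace anything else with an underscore) is an
assumption; the rule the resource tracker work settles on may differ.

```python
import re

# Pattern placement accepts for custom resource class names.
CUSTOM_RC_PATTERN = re.compile(r"^CUSTOM_[A-Z0-9_]+$")


def to_custom_resource_class(ironic_resource_class: str) -> str:
    """Map an Ironic resource_class (e.g. "baremetal-gold") to a custom
    placement resource class name (e.g. "CUSTOM_BAREMETAL_GOLD").

    Assumed rule: upper-case the name, replace every character outside
    [A-Z0-9_] with an underscore, and add the CUSTOM_ prefix if missing.
    """
    normalized = re.sub(r"[^A-Z0-9_]", "_", ironic_resource_class.upper())
    name = normalized if normalized.startswith("CUSTOM_") else "CUSTOM_" + normalized
    if not CUSTOM_RC_PATTERN.match(name):
        raise ValueError(
            "cannot build a valid custom resource class from %r" % ironic_resource_class
        )
    return name
```

Validating against the pattern at the end means an empty Ironic
resource class is rejected rather than silently producing the
invalid name `CUSTOM_`.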

Testing of all this stuff exposed a fairly significant bug when
compute nodes are deleted. This is fixed by

     https://review.openstack.org/#/c/428375/

I mention it here not because it was the only bug found (not in the
slightest) but because the discussion surrounding it (and trying to
understand the custom resource class changes) suggested there is a
lot we could and should be doing to clear up not just the code in
the compute manager and its resource tracker friend but also our
human understanding of that code[1]. We also know from

     http://lists.openstack.org/pipermail/openstack-dev/2017-January/110953.html

that the resource tracker sometimes duplicates work more than we
might like, resulting in excessive requests to the placement API.
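One common way to cut down that kind of chatter is for the client to
remember the last inventory it successfully sent for each provider
and skip the request when nothing has changed. A minimal sketch under
that assumption follows; `InventoryReporter` and its `send` callable
are hypothetical and are not the report client's actual interface.

```python
from typing import Callable, Dict

# resource class -> inventory fields, e.g. {"VCPU": {"total": 8, "reserved": 0}}
Inventory = Dict[str, Dict[str, int]]


class InventoryReporter:
    """Caches the last inventory sent per resource provider to avoid redundant updates."""

    def __init__(self, send: Callable[[str, Inventory], None]) -> None:
        # `send` stands in for the HTTP call that updates inventory in placement.
        self._send = send
        self._last_sent: Dict[str, Inventory] = {}

    def update(self, provider_uuid: str, inventory: Inventory) -> bool:
        """Send inventory only if it differs from what was last sent.

        Returns True if a request was made, False if it was skipped.
        """
        if self._last_sent.get(provider_uuid) == inventory:
            return False
        # If send raises, the cache is left untouched and the next call retries.
        self._send(provider_uuid, inventory)
        # Store a copy so later mutation by the caller cannot corrupt the cache.
        self._last_sent[provider_uuid] = {rc: dict(f) for rc, f in inventory.items()}
        return True
```

A real cache would also need invalidating whenever placement rejects
an update or the provider is changed by someone else; the sketch
leaves that out.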

A main outcome of the discussions surrounding that confusion was
that we need much more complete functional testing of the compute
manager and resource tracker, including tests that exercise Ironic
(probably integration testing too, but the emphasis is on functional
tests for the sake of short, developer-oriented feedback loops).
Which leads to...

What's Next
===========

Besides fixing the bugs that are going to come rolling in, we've got:

* Evaluating and, as necessary, tidying up the many rushed fixes
   that we made at the end of the cycle.
* Adding the more complete functional testing.
* Building the scheduling side of the Ironic custom resource classes
   (flavor extra_spec?) and fun things like
   "hash ring + placement-based scheduling + host aggregates / AZs".
* Making use of the aggregate map that the resource tracker now has.
* Figuring out how to make more effective use of the
   tracked_instances dict in the resource tracker.
* Making progress on the plan for making resource requests and claims
   simultaneously via the placement API.
* Nested resource providers.
* Resource traits.
* All the extra bits that people are working on or thinking about
   that usually show up in these reports.
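On making resource requests and claims simultaneously: roughly
speaking, placement accepts an allocation only if current usage plus
the request fits within capacity, where capacity is
`(total - reserved) * allocation_ratio`, and it uses the resource
provider's generation to detect concurrent changes. Below is a
minimal in-memory sketch of a combined "check and claim" under those
assumptions. `Provider`, `claim`, `ConcurrentUpdate` and
`InsufficientCapacity` are hypothetical. The sketch ignores min_unit,
max_unit and step_size, and is not the design the plan will settle on.

```python
from dataclasses import dataclass, field
from typing import Dict


class ConcurrentUpdate(Exception):
    """Raised when the provider changed since the caller last read it."""


class InsufficientCapacity(Exception):
    """Raised when a requested amount does not fit on the provider."""


@dataclass
class Provider:
    total: Dict[str, int]
    reserved: Dict[str, int]
    allocation_ratio: Dict[str, float]
    used: Dict[str, int] = field(default_factory=dict)
    generation: int = 0

    def capacity(self, rc: str) -> float:
        # Usable amount of a resource class after reservation and overcommit.
        return (self.total[rc] - self.reserved.get(rc, 0)) * self.allocation_ratio.get(rc, 1.0)


def claim(provider: Provider, request: Dict[str, int], expected_generation: int) -> int:
    """Check capacity and record usage in one step; return the new generation."""
    if provider.generation != expected_generation:
        raise ConcurrentUpdate("provider generation changed; re-read and retry")
    # Validate every resource class before changing anything, so a claim is all-or-nothing.
    for rc, amount in request.items():
        if rc not in provider.total:
            raise InsufficientCapacity("no inventory of %s" % rc)
        if provider.used.get(rc, 0) + amount > provider.capacity(rc):
            raise InsufficientCapacity("not enough %s" % rc)
    for rc, amount in request.items():
        provider.used[rc] = provider.used.get(rc, 0) + amount
    provider.generation += 1
    return provider.generation
```

Because the capacity check and the usage update happen together, and
the generation check rejects stale callers, two schedulers cannot
both claim the last slot on a host. The loser has to re-read and
retry.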

Next week I will endeavor to gather enough information to make these
messages something actionable again. In the meantime thanks and
congrats to everyone for pushing things forward.

[1] This is probably a universal truth of any code, so perhaps
redundant to mention.

-- 
Chris Dent                 ¯\_(ツ)_/¯           https://anticdent.org/
freenode: cdent                                         tw: @anticdent

[openstack-dev] [nova] placement/resource providers update 10
