[openstack-dev] [nova] placement/resource providers update
mriedem at linux.vnet.ibm.com
Fri Nov 11 16:24:24 UTC 2016
On 11/11/2016 6:59 AM, Chris Dent wrote:
> I thought I would share some updates on the state of things related
> to placement and resource providers. There are almost certainly things
> missing from this, so if you know something important that you think
> should be mentioned please make a response including it. The point
> of this message is simply so people can have some idea of what's in
> play at the moment.
> Since spec freeze is the 17th, the stuff that presumably matters
> most in the list of reviews below are the specs. There are several.
> There's quite a lot of pending code changes too. The sooner that
> stuff merges the less conflicts it will cause later.
> # Leftovers from Newton
> These are the things which either should have been done in Newton
> and weren't or cleanups of stuff that was done but need some
> revision. There is an etherpad tracking these things where if you
> have interest and time you can pick up some things to do. There's
> been some excellent contribution from outside the usual suspects.
> Things that are ready to review:
> * Improved 404 responses:
> * Increased gabbi test coverage (stack of 3):
> * Proper handling of max_unit in resource tracker and placement API
> (stack of 2):
> * Aggregates support in placement api (stack of 3):
> * Demo inventory update script:
> This one might be considered a WIP because the way it chooses to do
> things (rather simply and dumbly) may not be in line with expectations.
> * CORS support in placement API:
> * Cleaning up the newton resource providers spec to reflect reality:
> Except for the demo script none of that should be particularly
> controversial.
> There are still quite a few things to pick up on the etherpad.
>  https://etherpad.openstack.org/p/placement-newton-leftovers
> # Filtering compute nodes with the placement API
> Now that the placement API is tracking compute node inventory and
> allocations against that inventory it becomes possible to use the
> api to shrink the number of compute nodes that nova-scheduler
> filters. Sylvain has started the work on that.
> * Spec for modifying the scheduler to query the api:
> * Spec for modifying the placement api to return resource providers
> that match requirements:
> * Code that satisfies those two specs (stack of 2, POC):
> The main area of disagreement on this stuff has been how to form the
> API for requesting and returning a list of resource providers.
> That's in the second patch of the stack above.
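For a concrete feel of the kind of request being debated, here's a minimal
sketch of how a caller might build such a query. The `resources` parameter
name and the `CLASS:amount` encoding are assumptions for illustration only,
since the exact shape of the API is precisely what's under discussion:

```python
from urllib.parse import urlencode

def build_rp_query(base_url, resources):
    """Build a GET URL asking placement for resource providers that can
    satisfy the given amounts, e.g. {"VCPU": 4, "MEMORY_MB": 2048}.
    The 'resources' query parameter shape is an assumption, not the
    settled API."""
    qs = urlencode({
        "resources": ",".join(
            "%s:%d" % (rc, amount) for rc, amount in sorted(resources.items())
        )
    })
    return "%s/resource_providers?%s" % (base_url, qs)
```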
> # Custom Resource Classes
> Custom resource classes provide a mechanism for tracking volumes of
> inventory which are unique to a deployment. There's both spec and
> code in flight for this:
> * The spec
> * Code to make them work in the api (stack of 4):
> There's a lot of already merged code that establishes the
> ResourceClass and ResourceClassList objects.
> Custom resource classes are important for, amongst other things,
> being able to effectively manage different bare metal resources.
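As a rough illustration of how a deployer-specific name (say, an Ironic
node class) might map onto a custom resource class identifier, assuming the
CUSTOM_ prefix and uppercase-with-underscores naming convention from the
spec:

```python
import re

def to_custom_resource_class(name):
    """Normalize a deployment-specific name like 'baremetal.gold' into a
    custom resource class identifier. The CUSTOM_ prefix and character
    restrictions follow the convention in the custom resource classes
    spec; this helper itself is just a sketch."""
    norm = re.sub(r"[^A-Z0-9_]", "_", name.upper())
    return "CUSTOM_" + norm
```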
> # Nested Resource Providers
> Nested resource providers allow hierarchical or containing
> relationships in resource providers so it is possible to say things
> like "this portion of this device on this compute node".
> * The spec
> * Code to implement the object and HTTP API changes (stack of 4):
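To make the hierarchical relationship concrete, here is a toy sketch of
walking parent links in a provider tree; the uuid-to-parent mapping is an
illustration, not the actual object schema in the patches:

```python
def root_provider(providers, uuid):
    """Walk parent links to find the root provider of a nested tree.
    'providers' maps a provider uuid to its parent uuid (None for a
    root), so a device on a compute node resolves back to that node."""
    while providers[uuid] is not None:
        uuid = providers[uuid]
    return uuid
```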
> # Allocations for generic PCI devices
> Changes to the resource tracker to allow simple PCI devices to be
> tracked as allocations in the placement API.
> * Code (stack of 3):
> # Important stuff not in other categories
> This section is for loose ends that don't fit in elsewhere. Stuff
> we're going to need to figure out at some point.
> ## Placement DB
> The long term plan is for there to be a separate placement database.
> One way to ease the migration to that is to allow people to go ahead
> and have a placement db from the outset (instead of using the API
> db). There's an etherpad that discusses this and some code
> that implements it but it's not clear that all the bases are covered
> or even need to be. It may be that more discussion is required or it
> may simply be that someone needs to untangle the mess and state the
> decision clearly.
>  https://etherpad.openstack.org/p/placement-optional-db-spec
>  https://review.openstack.org/#/c/362766/ (this is -2d pending
> resolution of the stuff on the etherpad)
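The fallback behavior being discussed amounts to something like the
following sketch; the option names here are assumptions drawn from the
etherpad discussion, not settled configuration:

```python
def placement_db_connection(conf):
    """Pick the database connection for the placement service: use a
    dedicated [placement_database]/connection if the operator set one,
    otherwise fall back to the API database. Option names are an
    assumption for illustration."""
    placement = conf.get("placement_database", {}).get("connection")
    return placement or conf["api_database"]["connection"]
```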
> ## Placement Docs
> They need to happen. We know. Just need to find the time and
> resources. Listing it here so it is clear it is a known thing.
This might be easier to swallow in chunks if we have some topics that
need discussing, such as:
- deploying it, what is needed for this? (maybe link to the devstack
patch that added it as a reference)
- when is it needed? answer: during newton, before ocata
- docs on the actual REST API
- docs on the microversions / history and how to make a request with a
specific microversion
- maybe a little high-level background on how nova is using this in the
resource tracker, and then links to reference specs
Once we have a list of topics built up, we can start working the changes in
a series so they're easier to digest.
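On the microversion point: making a request against a specific placement
microversion is just a matter of headers, roughly like this (the exact
header name follows the API working group convention placement adopted, but
treat the sketch as illustrative):

```python
def placement_headers(token, version="1.0"):
    """Headers for a placement REST API request pinned to a
    microversion. Placement uses the 'OpenStack-API-Version' header
    with a 'placement' service type; omitting it gets the minimum
    supported version."""
    return {
        "X-Auth-Token": token,
        "Accept": "application/json",
        "OpenStack-API-Version": "placement %s" % version,
    }
```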
> ## Placement Upgrade/Installation issues
> When newton went out, the running of the placement service was set as
> optional. If the placement service was present in the service catalog
> then the resource tracker would send inventory and allocation data to it
> (in addition to the pre-existing tracking). If not, only the "old way"
> would happen.
> The intention is that the placement service will be required in an
> Ocata deployment which effectively means that a Newton deployment
> will need to turn it on before upgrading so that inventory and
> allocation tracking is up to date before switching over to Ocata.
So to be clear, it's optional when upgrading from mitaka to newton. But
you need to deploy it during newton before upgrading to ocata.
So a few things we need to do, and are related:
1. We need to get the placement-api enabled by default in the ocata CI
jobs, and figure out what that means for grenade.
2. If we're going to put something in place to break nova on upgrade to
ocata if you don't have the placement service running, then we need to
sort out what that looks like as it will impact that first item above
about grenade. With cellsv2 we're doing this with a database migration,
but that doesn't map the same way for the placement service. I mean, we
could check the API DB for resource provider information, but that seems
like cheating if you have a separate placement DB - but if we know that
information then we might be able to do that check.
3. It's important for new REST API changes in the placement service in
ocata to use microversions because today the upgrade flow is:
* conductor > API > computes
So a newton compute can be making requests to an ocata placement API and
the placement API therefore can't make backward incompatible changes, it
has to use microversions.
With cells v2 the upgrade goal is this:
* within a cell, upgrade conductor and then computes, do that cell by
cell, and when all cells are upgraded then upgrade the API. So in that
scenario ocata computes can be making requests to newton placement API.
Because of this, the RT client code in ocata making requests is going to
need to handle version discovery.
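The version discovery the RT client will need boils down to negotiating a
microversion against the server's advertised min/max range. A minimal
sketch of that negotiation (the function and its inputs are illustrative,
not the eventual client code):

```python
def negotiate_version(server_min, server_max, client_supported):
    """Pick the microversion to use: the highest version the client
    supports that also falls inside the server's advertised
    [min, max] range, or None if the ranges don't overlap."""
    def parse(v):
        major, minor = v.split(".")
        return (int(major), int(minor))

    lo, hi = parse(server_min), parse(server_max)
    candidates = [parse(v) for v in client_supported if lo <= parse(v) <= hi]
    if not candidates:
        return None
    return "%d.%d" % max(candidates)
```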
> dansmith has suggested there ought to be a tool which allows people
> to check their install to see if is ready for ocata, and if not,
> tells them what's missing. This will be useful not just for
> placement but for other things as well. Lots of changes in progress.
It might be interesting to get some feedback from operators on how much
they'd like to see a tool like this created and then we could POC
something after we figure out how we're going to block the ocata upgrade
if the placement service isn't deployed.
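In the spirit of the readiness tool dansmith suggested, a toy preflight
check might look like this; the inputs and checks are purely illustrative,
not the eventual tool's interface:

```python
def check_placement_ready(catalog, resource_provider_count):
    """A toy upgrade-readiness check: verify the placement service is
    registered in the service catalog and that compute nodes have
    started reporting inventory. Returns a list of problems, empty if
    the deployment looks ready for ocata."""
    problems = []
    if "placement" not in catalog:
        problems.append("placement service missing from service catalog")
    if resource_provider_count == 0:
        problems.append("no resource providers recorded; are computes "
                        "reporting to placement?")
    return problems
```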
> This section is here for visibility and information. We need to work
> out the details.
> # End
> Thanks for reading this far. If you have anything to add or have
> questions please post a response.