[openstack-dev] [nova] placement / resource providers ocata summit session recaps
Matt Riedemann
mriedem at linux.vnet.ibm.com
Wed Nov 2 17:54:19 UTC 2016
We had three design summit sessions related to the new placement service
and resource providers work. Since they are all more or less related,
I'm going to recap them in a single email.
----
The first session was a retrospective on the placement service work that
happened in the Newton release. The full etherpad is here:
https://etherpad.openstack.org/p/ocata-nova-summit-placement-retrospective
We first talked about what went well, of which there were many things:
- There is a better shared understanding of the design and goals among
more people in Nova.
- Computes in Newton are reporting their RAM/DISK/CPU inventory and usage.
- We have CI jobs.
- Jay did a nice job of using consistent terminology when discussing
resource providers and the end goal for Newton so we could stay focused.
- Hangouts helped the team get unstuck at times when we were grinding
toward feature freeze.
- The placement API has a clean WSGI design and REST interface that
others are able to build onto easily.
We then talked about what didn't go so well, which included:
- Confusion around division of labor and when different chunks could
be worked on in parallel, and by whom.
- There was too much time spent on making the specs perfect when we
needed to just start writing and reviewing code. This was especially
evident when the client-side (resource tracker) pieces that consume
the placement REST API started getting written and required changes to
the API.
- At times there were key discussions/decisions that were not properly
documented/communicated back to the wider team.
- There was a breakdown in communication at or after the midcycle about
the separate placement DB which led to a revert late in the cycle.
- General burnout and frustration.
- Traps of working on long patch series, with little review feedback
early in the series or high latency on reviews, leading to wasted
time.
From those discussions, we listed what we should keep doing or do
differently:
- Write specs with less low-level detail, but where that level of
detail does exist, make sure to amend the spec later if things change
during implementation.
- Use Hangouts when we get stuck.
- Document/communicate decisions/agreements/changes in direction in the
mailing list.
- Encourage people to pair up for redundancy.
- Encourage early PoCs before building a long and potentially
off-the-mark patch series.
There was also some general discussion about not moving specs to
'implemented' until the spec is updated after the code is all approved.
I was personally not sold on what was proposed for this, since I
consider amending specs to be like writing documentation and CI tests -
if you don't -2 the last change in the series to complete the
blueprint, people have little incentive to actually do it, and once
their code is merged it's very hard to get them to do the ancillary
tasks. I'm open to discussing this idea further, though, in case I
missed the point.
----
The next session was about the quantitative side of resource providers.
The full etherpad is here:
https://etherpad.openstack.org/p/ocata-nova-summit-resource-providers-quantitative
There were quite a few things in the etherpad and we didn't get to all
of them, so this is a recap of what we did talk about.
- Custom resource classes
The code for this is moving along and being reviewed. Custom resource
classes will be namespaced to keep them distinct from the standard
resource classes that nova provides. The resource tracker will create
inventory/allocation records for the Ironic nodes, and the Ironic
inventory records will use the node.resource_class value as the custom
resource class.
We still need to figure out what to do about mapping a single flavor
to multiple node classes, but it might just be done with extra_specs
(a sketch of what that could look like is below). There will be
upgrade impacts, however, if flavors are not properly mapped by the
time the scheduler starts using the placement service.
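As a rough illustration only, since the extra_specs convention is
still TBD, the mapping might look something like this; the
normalization rule and the "resources:" key are assumptions, not
settled design:

    # Hypothetical sketch: derive a custom resource class from an
    # Ironic node's resource_class and point a flavor at it through
    # extra_specs.
    node_resource_class = "baremetal.gold"  # ironic node.resource_class
    # Assumed normalization: uppercase, non-alphanumerics to "_",
    # CUSTOM_ prefix.
    custom_rc = "CUSTOM_" + node_resource_class.upper().replace(".", "_")
    # -> "CUSTOM_BAREMETAL_GOLD"
    flavor_extra_specs = {
        # Hypothetical extra_specs key; the real convention is TBD.
        "resources:%s" % custom_rc: "1",
    }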
- Microversions
Chris Dent has a patch up to add microversion support to the placement
API and it's being reviewed.
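For anyone curious what that will look like to a client, here is a
minimal sketch; it assumes the placement API follows the same
microversion header convention as the compute API, and the endpoint,
token and version number are placeholders:

    import requests

    resp = requests.get(
        "http://placement.example.com/resource_providers",
        headers={
            "X-Auth-Token": "ADMIN_TOKEN",
            # Opt in to a specific microversion (assumed header format).
            "OpenStack-API-Version": "placement 1.1",
        },
    )
    print(resp.json())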
- Nested resource providers
Jay has been working on code for this and has a design in mind. Jay
and Ed did some whiteboarding in the hall, sorted out their
differences on the design, and agreed on the way forward (Jay's
nesting/tree model).
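A minimal sketch of what the nesting/tree model implies, with made-up
names (this is not the real data model): each provider can have a
parent, so inventory can hang off child providers, e.g. SR-IOV PFs
under a compute node.

    # Illustrative only; class and attribute names are made up.
    class ResourceProvider(object):
        def __init__(self, uuid, name, parent=None):
            self.uuid = uuid
            self.name = name
            self.parent = parent    # None means this is a root provider
            self.children = []
            if parent is not None:
                parent.children.append(self)

    root = ResourceProvider("rp-uuid-1", "compute-node-1")
    pf0 = ResourceProvider("rp-uuid-2", "sriov-pf-0", parent=root)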
- Documenting the placement REST API
We didn't get into this at the summit, but in side discussions it's a
TODO and right now we'll most likely handle this like we do for the
compute api-ref.
- Top priorities for Ocata
1. The scheduler calling the placement API to get a list of resource
providers (see the first sketch after this list). There are some specs
and WIP code up that Sylvain is working on. Note that this is not
going to involve the caching scheduler for now; we'll worry about that
later.
2. Start handling shared storage. We need the resource tracker and/or
an external script to create the resource provider / aggregate mapping
and inventory/allocation records against shared DISK_GB inventories
(see the second sketch after this list). The aggregates mapping
modeling work in the placement API is underway.
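For the first priority, here is a minimal sketch of the scheduler side
of that call, assuming the existing GET /resource_providers endpoint;
the endpoint URL and token are placeholders, and any filtering syntax
is still TBD:

    import requests

    # Placeholder endpoint and token for illustration only.
    resp = requests.get(
        "http://placement.example.com/resource_providers",
        headers={"X-Auth-Token": "SCHEDULER_TOKEN"},
    )
    # The response body contains a "resource_providers" list.
    for rp in resp.json()["resource_providers"]:
        print("%s (%s)" % (rp["uuid"], rp["name"]))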
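For the second priority, a rough sketch of the REST flow such an
external script might use. The aggregates modeling is still in flight,
so the URLs and payload shapes here are assumptions (for example, the
inventory payload will likely also need the provider generation):

    import requests

    BASE = "http://placement.example.com"      # placeholder endpoint
    HEADERS = {"X-Auth-Token": "ADMIN_TOKEN"}  # placeholder token
    AGG_UUID = "agg-uuid"                      # shared-storage aggregate

    # Map the compute node provider and the shared storage provider
    # into the same aggregate (payload shape is an assumption).
    for rp_uuid in ("compute-rp-uuid", "shared-storage-rp-uuid"):
        requests.put(
            "%s/resource_providers/%s/aggregates" % (BASE, rp_uuid),
            json=[AGG_UUID], headers=HEADERS)

    # Record the shared DISK_GB inventory on the storage provider.
    requests.put(
        "%s/resource_providers/%s/inventories/DISK_GB"
        % (BASE, "shared-storage-rp-uuid"),
        json={"total": 10000, "reserved": 100, "min_unit": 1,
              "max_unit": 10000, "step_size": 1,
              "allocation_ratio": 1.0},
        headers=HEADERS)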
- What's required when upgrading to Ocata
1. The placement service is required in order to upgrade to Ocata.
You'll break in Ocata if you don't have it, because the scheduler will
be using the placement service for scheduling decisions. The idea is
to stand up the placement service while still on Newton, get the
resource provider (compute node) data populated, and then upgrade (a
configuration sketch follows this list).
TODO: We need to be more clear about this in the release notes and
upgrade docs.
2. The aforementioned mapping of Ironic flavors to multiple node
resource classes. This is still TBD, though.
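On the first point, as an illustration of the ordering, here is a
minimal sketch of the kind of nova.conf section the computes would
need while still on Newton so they can report into placement. The
option names here are from memory and worth double-checking against
the released configuration reference:

    [placement]
    # Keystone auth for the nova services to reach the placement API.
    auth_type = password
    auth_url = http://controller/identity
    project_name = service
    project_domain_name = Default
    username = placement
    user_domain_name = Default
    password = PLACEMENT_PASS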
----
The final resource providers session focused on qualitative aspects,
which are the traits on a given resource provider. The full session
etherpad is here:
https://etherpad.openstack.org/p/ocata-nova-summit-resource-providers-qualitative
The majority of the session was spent talking about the proposed
traits REST API and different use cases, along with some clarification
on the rules around traits (a small sketch using the os-traits library
follows this list):
- They can't be negative.
- Preferred/required traits will be part of the request spec, not tagged
on a trait itself. How this is worked into the request spec is TBD.
- Image metadata / flavor extra specs will need to be handled at some
point but it's not a top priority right now.
- There will be no ACLs on traits.
- The traits APIs will be admin-only for now.
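To make the standard-vs-custom distinction concrete, a small sketch.
It assumes the os-traits library exposes the standard trait strings as
constants, and that custom traits will get a CUSTOM_ prefix mirroring
custom resource classes - the latter is exactly what the namespace
spec mentioned below would nail down:

    import os_traits

    # A standard trait is just a well-known string constant, e.g.:
    print(os_traits.HW_CPU_X86_AVX2)   # "HW_CPU_X86_AVX2"

    # Assumed namespacing for a custom trait, mirroring custom
    # resource classes; the exact convention is what the spec will
    # decide.
    my_trait = "CUSTOM_STORAGE_SSD"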
The direction for Ocata is to:
- Spend less time on the spec and start working on some proof of concept
code, especially on the client side to help shape the needs of the REST API.
- Create a spec for namespaces on custom traits which will mirror how we
handle namespaces for custom resource classes.
- Move the os-traits library under the Compute program wrt governance.
--
Thanks,
Matt Riedemann