[openstack-dev] [nova] placement / resource providers ocata summit session recaps
Matt Riedemann
mriedem at linux.vnet.ibm.com
Wed Nov 2 17:54:19 UTC 2016
We had three design summit sessions related to the new placement service
and resource providers work. Since they are all more or less related,
I'm going to recap them in a single email.
----
The first session was a retrospective on the placement service work that
happened in the Newton release. The full etherpad is here:
https://etherpad.openstack.org/p/ocata-nova-summit-placement-retrospective
We first talked about what went well, of which there were many things:
- There is a better shared understanding of the design and goals among
more people in Nova.
- Computes in Newton are reporting their RAM/DISK/CPU inventory and usage.
- We have CI jobs.
- Jay did a nice job of using consistent terminology when discussing
resource providers and the end goal for Newton so we could stay focused.
- Hangouts helped the team get unstuck at times when we were grinding
toward feature freeze.
- The placement API has a clean WSGI design and REST interface that
others are able to build onto easily.
We then talked about what didn't go so well, which included:
- Confusion around division of labor and when different chunks could
be worked on in parallel, and by whom.
- There was too much time spent on making the specs perfect when we
needed to just start writing and reviewing code. This was especially
evident when the client-side (resource tracker) pieces that consume
the placement REST API started getting written and required changes to
the API.
- At times there were key discussions/decisions that were not properly
documented/communicated back to the wider team.
- There was a breakdown in communication at or after the midcycle about
the separate placement DB which led to a revert late in the cycle.
- General burnout and frustration.
- Traps of working on long patch series, with little review feedback
early in the series or high latency on reviews, leading to wasted
time.
From those discussions, we listed what we should keep doing or do
differently:
- Write specs with less low-level detail, but where that level of
detail does exist, make sure to amend the spec later if things change
during implementation.
- Use Hangouts when we get stuck.
- Document/communicate decisions/agreements/changes in direction in the
mailing list.
- Encourage people to pair up for redundancy.
- Encourage early PoCs before building a long and potentially
off-the-mark patch series.
There was also some general discussion about not moving specs to
'implemented' until the spec is updated after the code is all approved.
I was personally not sold on what was proposed for this, since I
consider amending specs to be like writing documentation and CI tests -
if you don't -2 the last change in the series to complete the
blueprint, people have little incentive to actually do it, and once
their code is merged it's very hard to get them to do the ancillary
tasks. I'm open to discussing this idea further, though, in case I
missed the point.
----
The next session was about the quantitative side of resource providers.
The full etherpad is here:
https://etherpad.openstack.org/p/ocata-nova-summit-resource-providers-quantitative
There were quite a few things in the etherpad and we didn't get to all
of them, so this is a recap of what we did talk about.
- Custom resource classes
The code for this is moving along and being reviewed. Custom resource
classes will be namespaced to keep them distinct from the standard
resource classes that nova provides. The resource tracker will create
inventory/allocation records for the Ironic nodes, and the Ironic
inventory records will use the node.resource_class value as the custom
resource class.
We still need to figure out what to do about mapping a single flavor
to multiple node classes, but it might just be done with extra_specs
(a sketch of what that could look like is below). There will be
upgrade impacts, however, if flavors are not properly mapped by the
time the scheduler starts using the placement service.
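As a rough illustration only, since the extra_specs convention is
still TBD, the mapping might look something like this; the
normalization rule and the "resources:" key are assumptions, not
settled design:

    # Hypothetical sketch: derive a custom resource class from an
    # Ironic node's resource_class and point a flavor at it through
    # extra_specs.
    node_resource_class = "baremetal.gold"  # ironic node.resource_class
    # Assumed normalization: uppercase, non-alphanumerics to "_",
    # CUSTOM_ prefix.
    custom_rc = "CUSTOM_" + node_resource_class.upper().replace(".", "_")
    # -> "CUSTOM_BAREMETAL_GOLD"
    flavor_extra_specs = {
        # Hypothetical extra_specs key; the real convention is TBD.
        "resources:%s" % custom_rc: "1",
    }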
- Microversions
Chris Dent has a patch up to add microversion support to the placement
API and it's being reviewed.
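For anyone curious what that will look like to a client, here is a
minimal sketch; it assumes the placement API follows the same
microversion header convention as the compute API, and the endpoint,
token and version number are placeholders:

    import requests

    resp = requests.get(
        "http://placement.example.com/resource_providers",
        headers={
            "X-Auth-Token": "ADMIN_TOKEN",
            # Opt in to a specific microversion (assumed header format).
            "OpenStack-API-Version": "placement 1.1",
        },
    )
    print(resp.json())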
- Nested resource providers
Jay has been working on code for this and has a design in mind. Jay
and Ed did some whiteboarding in the hall, sorted out their
differences on the design, and agreed on the way forward (Jay's
nesting/tree model).
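A minimal sketch of what the nesting/tree model implies, with made-up
names (this is not the real data model): each provider can have a
parent, so inventory can hang off child providers, e.g. SR-IOV PFs
under a compute node.

    # Illustrative only; class and attribute names are made up.
    class ResourceProvider(object):
        def __init__(self, uuid, name, parent=None):
            self.uuid = uuid
            self.name = name
            self.parent = parent    # None means this is a root provider
            self.children = []
            if parent is not None:
                parent.children.append(self)

    root = ResourceProvider("rp-uuid-1", "compute-node-1")
    pf0 = ResourceProvider("rp-uuid-2", "sriov-pf-0", parent=root)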
- Documenting the placement REST API
We didn't get into this at the summit, but in side discussions it's a
TODO and right now we'll most likely handle this like we do for the
compute api-ref.
- Top priorities for Ocata
1. The scheduler calling the placement API to get a list of resource
providers (see the first sketch after this list). There are some specs
and WIP code up that Sylvain is working on. Note that this is not
going to involve the caching scheduler for now; we'll worry about that
later.
2. Start handling shared storage. We need the resource tracker and/or
an external script to create the resource provider / aggregate mapping
and inventory/allocation records against shared DISK_GB inventories
(see the second sketch after this list). The aggregates mapping
modeling work in the placement API is underway.
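For the first priority, here is a minimal sketch of the scheduler side
of that call, assuming the existing GET /resource_providers endpoint;
the endpoint URL and token are placeholders, and any filtering syntax
is still TBD:

    import requests

    # Placeholder endpoint and token for illustration only.
    resp = requests.get(
        "http://placement.example.com/resource_providers",
        headers={"X-Auth-Token": "SCHEDULER_TOKEN"},
    )
    # The response body contains a "resource_providers" list.
    for rp in resp.json()["resource_providers"]:
        print("%s (%s)" % (rp["uuid"], rp["name"]))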
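For the second priority, a rough sketch of the REST flow such an
external script might use. The aggregates modeling is still in flight,
so the URLs and payload shapes here are assumptions (for example, the
inventory payload will likely also need the provider generation):

    import requests

    BASE = "http://placement.example.com"      # placeholder endpoint
    HEADERS = {"X-Auth-Token": "ADMIN_TOKEN"}  # placeholder token
    AGG_UUID = "agg-uuid"                      # shared-storage aggregate

    # Map the compute node provider and the shared storage provider
    # into the same aggregate (payload shape is an assumption).
    for rp_uuid in ("compute-rp-uuid", "shared-storage-rp-uuid"):
        requests.put(
            "%s/resource_providers/%s/aggregates" % (BASE, rp_uuid),
            json=[AGG_UUID], headers=HEADERS)

    # Record the shared DISK_GB inventory on the storage provider.
    requests.put(
        "%s/resource_providers/%s/inventories/DISK_GB"
        % (BASE, "shared-storage-rp-uuid"),
        json={"total": 10000, "reserved": 100, "min_unit": 1,
              "max_unit": 10000, "step_size": 1,
              "allocation_ratio": 1.0},
        headers=HEADERS)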
- What's required when upgrading to Ocata
1. The placement service is required in order to upgrade to Ocata.
You'll break in Ocata if you don't have it, because the scheduler will
be using the placement service for scheduling decisions. The idea is
to stand up the placement service while still on Newton, get the
resource provider (compute node) data populated, and then upgrade (a
configuration sketch follows this list).
TODO: We need to be more clear about this in the release notes and
upgrade docs.
2. The aforementioned mapping of Ironic flavors to multiple node
resource classes. This is still TBD, though.
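On the first point, as an illustration of the ordering, here is a
minimal sketch of the kind of nova.conf section the computes would
need while still on Newton so they can report into placement. The
option names here are from memory and worth double-checking against
the released configuration reference:

    [placement]
    # Keystone auth for the nova services to reach the placement API.
    auth_type = password
    auth_url = http://controller/identity
    project_name = service
    project_domain_name = Default
    username = placement
    user_domain_name = Default
    password = PLACEMENT_PASS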
----
The final resource providers session focused on qualitative aspects,
which are the traits on a given resource provider. The full session
etherpad is here:
https://etherpad.openstack.org/p/ocata-nova-summit-resource-providers-qualitative
The majority of the session was spent talking about the proposed
traits REST API and different use cases, along with some clarification
on the rules around traits (a small sketch using the os-traits library
follows this list):
- They can't be negative.
- Preferred/required traits will be part of the request spec, not tagged
on a trait itself. How this is worked into the request spec is TBD.
- Image metadata / flavor extra specs will need to be handled at some
point but it's not a top priority right now.
- There will be no ACLs on traits.
- The traits APIs will be admin-only for now.
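To make the standard-vs-custom distinction concrete, a small sketch.
It assumes the os-traits library exposes the standard trait strings as
constants, and that custom traits will get a CUSTOM_ prefix mirroring
custom resource classes - the latter is exactly what the namespace
spec mentioned below would nail down:

    import os_traits

    # A standard trait is just a well-known string constant, e.g.:
    print(os_traits.HW_CPU_X86_AVX2)   # "HW_CPU_X86_AVX2"

    # Assumed namespacing for a custom trait, mirroring custom
    # resource classes; the exact convention is what the spec will
    # decide.
    my_trait = "CUSTOM_STORAGE_SSD"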
The direction for Ocata is to:
- Spend less time on the spec and start working on some proof of concept
code, especially on the client side to help shape the needs of the REST API.
- Create a spec for namespaces on custom traits which will mirror how we
handle namespaces for custom resource classes.
- Move the os-traits library under the Compute program wrt governance.
--
Thanks,
Matt Riedemann