Open Stack

Mon May 2 03:07:09 UTC 2016

On 5/1/2016 6:46 PM, Matt Riedemann wrote:
> On Wednesday morning Jay Pipes led a double session on the work going on
> in the Nova scheduler. The session etherpad is here [1].
>
> Jay started off by taking us through a high-level overview of what was
> completed for the quantitative changes:
>
> 1. Resource classes
>
> 2. Resource providers
>
> 3. Online data migration of compute node RAM/CPU/disk inventory.
>
> Then Jay talked through the in-progress quantitative changes:
>
> 1. Online data migration of allocation fields for instances.
>
> 2. Modeling of generic resource pools for things like shared storage and
> IP subnet allocation pools (for Neutron routed networks).
>
> 3. Creating a separate placement REST API endpoint for generic resource
> pools.
>
> 4. Cleanup/refactor around PCI device handling.
>
> For the quantitative changes, we still need to migrate the PCI and NUMA
> fields.
>
> In the second scheduler session we got more into the capabilities
> (qualitative) part of the request. For example, this could be exposing
> hypervisor capabilities from the compute node through the API or for
> scheduling.
>
> Jay pointed out there have been several blueprints posted over time
> related to this.
>
> We discussed just what a capability is, i.e. is a hypervisor version a
> capability since certain features are only exposed with certain
> versions? For libvirt it isn't really. For example, you can only set
> the admin password in a guest if you have libvirt >=1.2.16. But for
> Hyper-V there are features which are supported by both older and newer
> versions of the hypervisor but are generally considered better or more
> robust in the newer version. But for some things, like supporting
> hyperv-gen2 instances, we consider that 'supports-hyperv-gen2' is the
> capability. We can further break that down by value enums if we need
> more granular information about the capability.
>
> While the session was informative, we left with some open next steps:
>
> 1. Where to put the inventories/allocations tables. We have three
> choices: API DB, Nova (cell) DB, or a new placement DB. Leaving them in
> the cell DB would mean aggregating data which would be a mess. Putting
> them in a new placement DB would mean a 2nd new database in a short
> number of releases for deployers to manage. So I believe we agreed to
> put them in the API DB (but if others think otherwise please speak up).
>
> 2. We talked for quite a while about what a capability is and isn't, but
> I didn't come away with a definitive answer. This might get teased out
> in Claudiu's spec [2]. Note, however, that on Friday we agreed that as
> far as microversions are concerned, a new capability exposed in the REST
> API requires a microversion. But new enumerations for a capability, e.g.
> CPU features, do not require a microversion bump; there are just too
> many of them.
>
> 3. I think we're in agreement on the blueprints/roadmap that Jay has put
> forth, but it's unclear if we have an owner for each blueprint. Jay and
> Chris Dent own several of these and some others are helping out (Dan
> Smith has been doing a lot of the online data migration patches), but we
> don't have owners for everything.
>
> 4. We need to close out any obsolete blueprints from the list that Jay
> had in the etherpad. As already noted, several of these are older and
> probably superseded by current work, so the team just needs to flush
> these out.
>
> 5. Getting approval on the generic-resource-pools,
> resource-providers-allocations, and standardizing capabilities (extra
> specs) specs. The first two are pretty clear at this point; the specs
> just need to be rebased and reviewed in the next week. A lot of the code
> is already up for review. We're less clear on standardizing capabilities.
>
> --
>
> So the focus for the immediate future is going to have to be on
> completing the resource providers, inventory and allocation data
> migration code and generic resource pools. That's all the quantitative
> work, and it would be great if we can push to get a lot of it done
> before the midcycle meetup. Then we can see where we sit and discuss
> capabilities further.
>
> Jay - please correct or add to anything above.
>
> [1] https://etherpad.openstack.org/p/newton-nova-scheduler
> [2] https://review.openstack.org/#/c/286520/
>
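To make the version-versus-capability distinction from the quoted recap a bit more concrete, here is a minimal Python sketch. It is not Nova code. The class, the function names, and the capability names 'supports-set-admin-password' and 'cpu-features' are hypothetical and exist only for illustration; the libvirt 1.2.16 threshold and 'supports-hyperv-gen2' come from the recap. The idea is that a driver turns its version into named capabilities, and that a capability can optionally carry enumerated values.

```python
from dataclasses import dataclass, field

# Version threshold quoted in the recap for setting the guest admin password.
MIN_LIBVIRT_SET_ADMIN_PASSWORD = (1, 2, 16)


def parse_version(text):
    """Turn '1.2.16' into (1, 2, 16) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class HostCapabilities:
    """Hypothetical container: capability name -> set of enum values.

    An empty set means the capability is a plain yes/no flag.
    """
    caps: dict = field(default_factory=dict)

    def add(self, name, values=()):
        self.caps.setdefault(name, set()).update(values)

    def supports(self, name, value=None):
        if name not in self.caps:
            return False
        # With no value requested, having the capability at all is enough.
        return value is None or value in self.caps[name]


def libvirt_capabilities(version_text, cpu_features=()):
    """Derive named capabilities from a libvirt version (illustrative only)."""
    caps = HostCapabilities()
    if parse_version(version_text) >= MIN_LIBVIRT_SET_ADMIN_PASSWORD:
        caps.add("supports-set-admin-password")  # hypothetical name
    if cpu_features:
        caps.add("cpu-features", cpu_features)  # hypothetical name
    return caps


if __name__ == "__main__":
    host = libvirt_capabilities("1.2.16", cpu_features=["avx", "sse4.2"])
    print(host.supports("supports-set-admin-password"))
    print(host.supports("cpu-features", "avx"))
    print(host.supports("supports-hyperv-gen2"))
```

In this sketch a request would ask for a capability by name rather than for a hypervisor version, and adding a new value under 'cpu-features' changes only the data, not the set of capability names. That is roughly the split described in the microversion note in the recap.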

I forgot to mention that toward the end of the second scheduler session, 
Yingxin Cheng from Intel gave a short presentation [1] on some 
performance improvements he's seen with the 'eventually consistent' host 
shared state scheduler prototype.

He had a particularly interesting slide (7) with a performance 
comparison of the default filter scheduler configuration vs the caching 
scheduler vs the eventually consistent prototype scheduler. The latter 
two outperformed the default configuration in his testing.

A TODO from the presentation was for Yingxin to pre-load some of the 
computes used in the test and see how the prototype handles those 
pre-loaded computes.

[1] https://docs.google.com/presentation/d/1UG1HkEWyxPVMXseLwJ44ZDm-ek_MPc4M65H8EiwZnWs/edit?ts=571fcdd5#slide=id.g12d2cf15cd_2_90

-- 

Thanks,

Matt Riedemann

[openstack-dev] [nova] Austin summit scheduler session recap