[openstack-dev] [nova] A prototype implementation towards the "shared state scheduler"

Chris Dent cdent+os at anticdent.org
Sun Feb 21 21:43:28 UTC 2016

On Sun, 21 Feb 2016, Jay Pipes wrote:

> I don't see how the shared-state scheduler is getting the most accurate 
> resource view. It is only in extreme circumstances that the resource-provider 
> scheduler's view of the resources in a system (all of which is stored without 
> caching in the database) would differ from the "actual" inventory on a 
> compute node.

I'm pretty sure this paragraph is central to the whole discussion. It's
a question of where the final truth lies and what that positioning
allows and forbids. In resource-providers the truth, or at least the
truth that is acted upon, is in the database. In shared-state, the
scheduler mirrors the resources held on the compute nodes. People have
biases about that sort of thing.

Generalizing quite a bit:

All that mirroring costs quite a bit in communication terms and can go
funky if the communication goes awry. But it does mean that the compute
nodes are authoritative about themselves and have the possibility of
using/claiming/placing resources that are not under control of the
scheduler (or even nova in general).
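To make the mirroring idea concrete, here's a minimal sketch (hypothetical,
not Nova code; the class and method names are my own invention): each
compute node owns its inventory and streams updates to the scheduler,
which holds only a best-effort mirror it uses for decisions.

```python
class ComputeNode:
    """Authoritative owner of its own resources."""

    def __init__(self, name, vcpus):
        self.name = name
        self.free_vcpus = vcpus

    def claim(self, vcpus):
        # The node itself decides; consumers outside nova's control
        # could also reduce free_vcpus here, and the scheduler would
        # only learn of it from the next update message.
        if self.free_vcpus < vcpus:
            return False
        self.free_vcpus -= vcpus
        return True


class SharedStateScheduler:
    """Holds a mirror of node state that can lag behind reality."""

    def __init__(self):
        self.mirror = {}

    def on_update(self, node_name, free_vcpus):
        # Called whenever a compute node reports its state.
        self.mirror[node_name] = free_vcpus

    def pick(self, vcpus):
        # Decide from the (possibly stale) mirror; the real claim still
        # happens on the node and may fail, forcing a retry.
        for name, free in self.mirror.items():
            if free >= vcpus:
                return name
        return None
```

The point of the sketch is just the split of authority: `pick` can be
wrong, `claim` cannot, and the messaging between them is the cost.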

Centralizing things in the DB cuts way back on messaging and appears to
provide both a computationally and conceptually efficient way of
calculating placement, but it does so at the cost of the compute nodes
having less flexibility about managing their own resources, unless we
want the failure mode you describe elsewhere to be more common.
I heard somewhere, though this may be wrong or out of date, that one of
the constraints on compute nodes is that it should be possible to spawn
VMs on them that are not managed by nova. If, in the full-blown version
of the resource-provider-based scheduler, we send resource usage updates
to the scheduler db only on failure rather than on every compute-node
state change, then the retry rate goes up in a heterogeneous
environment. That could well be fine, a price you pay, but I wonder if
it is a concern?
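For the centralized side, a minimal sketch of how a DB-resident claim
with retries might look (again hypothetical, my own names throughout):
each inventory row carries a generation counter, a claim only commits
if the generation is unchanged since it was read, and any consumption
the scheduler didn't know about surfaces as a failed claim and a retry.

```python
class Inventory:
    """One row of centrally stored inventory for a compute node."""

    def __init__(self, free_vcpus):
        self.free_vcpus = free_vcpus
        self.generation = 0  # bumped on every successful claim


# Stand-in for the scheduler database.
db = {"node1": Inventory(8)}


def claim(node, vcpus, seen_generation):
    # Compare-and-swap: commit only if nobody changed the row since we
    # read it and there is still enough room.
    inv = db[node]
    if inv.generation != seen_generation or inv.free_vcpus < vcpus:
        return False  # stale view or no capacity: caller retries
    inv.free_vcpus -= vcpus
    inv.generation += 1
    return True


def schedule(vcpus, max_retries=3):
    # Re-read state on each attempt; in a heterogeneous environment,
    # unmanaged VMs changing reality between read and claim are what
    # drive the retry rate up.
    for _ in range(max_retries):
        for node, inv in db.items():
            if claim(node, vcpus, inv.generation):
                return node
    return None
```

In a single-writer run the first claim always succeeds; the retry loop
only earns its keep when something else mutates the row concurrently,
which is exactly the heterogeneous case being discussed.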

I could get into some noodling here about the artifact world versus
the real world, but that's probably belaboring the point. I'm not
trying to diss or support either approach, just flesh out some of
the gaps in at least my understanding.

> b) Simplicity
> Goes to the above point about debuggability, but I've always tried to follow 
> the mantra that the best software design is not when you've added the last 
> piece to it, but rather when you've removed the last piece from it and still 
> have a functioning and performant system. Having a scheduler that can tackle 
> the process of tracking resources, deciding on placement, and claiming those 
> resources instead of playing an intricate dance of keeping state caches valid 
> will, IMHO, lead to a better scheduler.

I think it is moving in the right direction. Removing the dance of
keeping state caches valid will be a big improvement.

Better still would be removing the duplication and persistence of
information that already exists on the compute nodes. That would be
really cool, but doesn't yet seem possible with the way we do
messaging, nor with the way we track shared resources (resource-pools
ought to help with that).
Chris Dent               (╯°□°)╯︵┻━┻            http://anticdent.org/
freenode: cdent                                         tw: @anticdent
