Open Stack

Mon Oct 8 06:49:03 UTC 2012

Hi,

I'm reviewing the first of the "general bare-metal provisioning"
patches:

  https://review.openstack.org/13920

and I'm really very concerned about how invasive this is to the core
compute scheduling infrastructure.

Basically, we're adding infrastructure so that a virt driver in a single
compute service can cause the resources of multiple "nodes" to be
advertised to the scheduler.

Making already confusing core infrastructure much more confusing for the
sake of a single virt driver seems like a bad idea.

What we're doing is allowing the scheduler to choose a compute node
based on the details of the individual bare-metal nodes available via
the compute node. However, the compute node is still responsible for
choosing which bare-metal node to provision.

How about we take a simpler approach?

 - Assume that each compute node has a homogeneous set of bare-metal 
   nodes

 - Have the bare-metal virt driver advertise the resources of a single
   bare-metal node as the resources of the compute node

 - Allow the number of un-provisioned bare-metal nodes to be advertised
   to the scheduler

 - Add a scheduler filter which will ignore compute nodes with no 
   un-provisioned bare-metal nodes

 - Perhaps add a scheduler filter which will ignore compute nodes whose 
   resources are not an exact match for the instance type

 - Add a weighting function based on the number of un-provisioned 
   bare-metal nodes that will allow choosing between a fill-first and 
   spread-first strategy
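To make the proposal concrete, here is a rough, self-contained sketch of the two filters and the weighting function listed above. This is not the Nova scheduler API. HostState, InstanceType, has_free_slot, exact_match, slot_weight, schedule and the free_slots field are all hypothetical names chosen for illustration, and each host is assumed to advertise the resources of one bare-metal node plus its count of un-provisioned nodes.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical view of a compute node as seen by the scheduler."""
    host: str
    memory_mb: int   # resources of ONE bare-metal node behind this host
    vcpus: int
    local_gb: int
    free_slots: int  # number of un-provisioned bare-metal nodes


@dataclass
class InstanceType:
    """Hypothetical instance type (flavor) being requested."""
    name: str
    memory_mb: int
    vcpus: int
    root_gb: int


def has_free_slot(host: HostState, itype: InstanceType) -> bool:
    """Filter: ignore compute nodes with no un-provisioned bare-metal nodes."""
    return host.free_slots > 0


def exact_match(host: HostState, itype: InstanceType) -> bool:
    """Optional filter: ignore nodes whose resources don't exactly match."""
    return (host.memory_mb == itype.memory_mb
            and host.vcpus == itype.vcpus
            and host.local_gb == itype.root_gb)


def slot_weight(host: HostState, fill_first: bool) -> int:
    """Weight (higher wins): spread-first prefers many free slots,
    fill-first prefers the fewest free slots that are still non-zero."""
    return -host.free_slots if fill_first else host.free_slots


def schedule(hosts, itype, fill_first=False,
             filters=(has_free_slot, exact_match)):
    """Apply every filter, then pick the highest-weighted surviving host."""
    candidates = [h for h in hosts if all(f(h, itype) for f in filters)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: slot_weight(h, fill_first))


# Example data: three identical "medium" hosts and one larger host.
hosts = [
    HostState("bm-a", 8192, 4, 100, free_slots=3),
    HostState("bm-b", 8192, 4, 100, free_slots=1),
    HostState("bm-c", 8192, 4, 100, free_slots=0),
    HostState("bm-d", 16384, 8, 200, free_slots=5),
]
medium = InstanceType("bm.medium", 8192, 4, 100)
```

With this data, schedule(hosts, medium) picks bm-a (spread-first) and schedule(hosts, medium, fill_first=True) picks bm-b. bm-c is dropped for having no free nodes and bm-d for not being an exact match. Switching between the two strategies only flips the sign of the weight.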

Since the only extra information we need to get to the scheduler is the
number of un-provisioned bare-metal nodes, I think this would be much
simpler.
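On the compute side, the virt driver's job then reduces to something like the sketch below: check that its nodes are homogeneous, report one node's resources, and add a single count. BareMetalNode, advertised_resources and the "free_slots" key are hypothetical names for illustration only, not the real bare-metal driver interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BareMetalNode:
    """Hypothetical record for one bare-metal machine behind a compute node."""
    memory_mb: int
    vcpus: int
    local_gb: int
    provisioned: bool = False


def advertised_resources(nodes: List[BareMetalNode]) -> Dict[str, int]:
    """Report a single node's resources plus the number of free nodes.

    Assumes, as proposed, that every node behind one compute node is identical.
    """
    if not nodes:
        raise ValueError("no bare-metal nodes configured")
    specs = {(n.memory_mb, n.vcpus, n.local_gb) for n in nodes}
    if len(specs) != 1:
        raise ValueError("bare-metal nodes behind one compute node "
                         "must be homogeneous")
    memory_mb, vcpus, local_gb = specs.pop()
    return {
        "memory_mb": memory_mb,
        "vcpus": vcpus,
        "local_gb": local_gb,
        # The one extra piece of information the scheduler needs.
        "free_slots": sum(1 for n in nodes if not n.provisioned),
    }
```

Rejecting heterogeneous nodes outright keeps the homogeneity assumption explicit rather than letting the scheduler see misleading numbers.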

As for terminology, rather than the scheduler considering "nodes", I
think "slots" would be less confusing.

You could imagine extending this scheme to other virt drivers to give
providers the option of a much simpler and more predictable scheduling
strategy. You could configure a compute node to have, e.g., 10 medium-size
"slots" and the scheduler would only ever schedule 10 medium-size
instances to that node. This could potentially be a way for providers to
simplify their capacity planning.
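A minimal sketch of that "10 medium slots" idea as per-node slot accounting. SlotConfig and its methods are hypothetical names invented here, and "m1.medium" is only an example instance type name.

```python
from dataclasses import dataclass


@dataclass
class SlotConfig:
    """Hypothetical per-compute-node setting: N slots of one instance type."""
    instance_type: str
    total_slots: int
    used_slots: int = 0

    def can_host(self, instance_type: str) -> bool:
        """Only the configured type fits, and only while a slot is free."""
        return (instance_type == self.instance_type
                and self.used_slots < self.total_slots)

    def claim(self, instance_type: str) -> None:
        """Take one slot for a newly scheduled instance."""
        if not self.can_host(instance_type):
            raise RuntimeError(f"no free {instance_type} slot on this node")
        self.used_slots += 1

    def release(self) -> None:
        """Free a slot when an instance is deleted."""
        if self.used_slots == 0:
            raise RuntimeError("no slots in use")
        self.used_slots -= 1


node = SlotConfig("m1.medium", total_slots=10)
for _ in range(10):
    node.claim("m1.medium")
```

After ten claims, node.can_host("m1.medium") is False, and other instance types are never accepted. For capacity planning, a provider's remaining capacity for a type is just the sum of total_slots - used_slots over the nodes configured for it.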

For reference, bare-metal node scheduling was discussed previously on
this thread:

  http://lists.openstack.org/pipermail/openstack-dev/2012-August/000627.html

Cheers,
Mark.

[openstack-dev] Bare-metal node scheduling