[openstack-dev] Bare-metal node scheduling
Mark McLoughlin
markmc at redhat.com
Mon Oct 8 06:49:03 UTC 2012
Hi,
I'm reviewing the first of the "general bare-metal provisioning"
patches:
https://review.openstack.org/13920
and I'm very concerned about how invasive this is to the core
compute scheduling infrastructure.
Basically, we're adding infrastructure so that a virt driver in a single
compute service can cause the resources of multiple "nodes" to be
advertised to the scheduler.
Making already confusing core infrastructure much more confusing for the
sake of a single virt driver seems like a bad idea.
What we're doing is allowing the scheduler to choose a compute node
based on the details of the individual bare-metal nodes available via
the compute node. However, the compute node is still responsible for
choosing which bare-metal node to provision.
How about we take a simpler approach?
  - Assume that each compute node has a homogeneous set of bare-metal
    nodes
  - Have the bare-metal virt driver advertise the resources of a single
    bare-metal node as the resources of the compute node
  - Allow the number of un-provisioned bare-metal nodes to be advertised
    to the scheduler
  - Add a scheduler filter which will ignore compute nodes with no
    un-provisioned bare-metal nodes
  - Perhaps add a scheduler filter which will ignore compute nodes whose
    resources are not an exact match for the instance type
  - Add a weighting function based on the number of un-provisioned
    bare-metal nodes that will allow choosing between a fill-first and
    spread-first strategy
Since the only extra information we need to get to the scheduler is the
number of un-provisioned bare-metal nodes, I think this would be much
simpler.
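
To make the proposal above a bit more concrete, here is a rough sketch in
plain Python of the two filters and the weighting function. It does not
use the real nova scheduler classes; HostState, InstanceType, pick_host
and the field names are all made up for illustration, and the only
per-host state beyond the usual resources is a free_slots counter.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical view of what one compute node advertises."""
    name: str
    memory_mb: int    # resources of ONE bare-metal node behind this host
    vcpus: int
    free_slots: int   # number of un-provisioned bare-metal nodes


@dataclass
class InstanceType:
    """Hypothetical stand-in for an instance type."""
    memory_mb: int
    vcpus: int


def has_free_slot(host, instance_type):
    # Filter: ignore compute nodes with no un-provisioned bare-metal nodes.
    return host.free_slots > 0


def exact_match(host, instance_type):
    # Filter: ignore compute nodes whose advertised resources are not an
    # exact match for the instance type.
    return (host.memory_mb == instance_type.memory_mb
            and host.vcpus == instance_type.vcpus)


def pick_host(hosts, instance_type, spread_first=True):
    """Apply both filters, then weigh the survivors by free_slots."""
    candidates = [h for h in hosts
                  if has_free_slot(h, instance_type)
                  and exact_match(h, instance_type)]
    if not candidates:
        return None
    if spread_first:
        # Spread-first: prefer the host with the most free slots.
        return max(candidates, key=lambda h: h.free_slots)
    # Fill-first: prefer the host with the fewest free slots.
    return min(candidates, key=lambda h: h.free_slots)
```

Note that the compute node still picks which bare-metal node to
provision; the scheduler only ever sees the counter.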
As for terminology, rather than the scheduler considering "nodes" I
think "slots" would be less confusing.
You could imagine extending this scheme to other virt drivers to give
providers the option of a much simpler and more predictable scheduling
strategy. You could configure a compute node to have e.g. 10 medium-sized
"slots" and the scheduler would only ever schedule 10 medium-sized
instances to that node. This could potentially be a way for providers to
simplify their capacity planning.
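
As a sketch of what slot accounting on such a node might look like (again
plain Python, with a hypothetical SlotHost class and a made-up "medium"
type name, not anything that exists in nova):

```python
class SlotHost:
    """Hypothetical compute node configured with a fixed number of slots."""

    def __init__(self, name, slot_type, total_slots):
        self.name = name
        self.slot_type = slot_type      # the one instance type this host accepts
        self.total_slots = total_slots
        self.used_slots = 0

    @property
    def free_slots(self):
        return self.total_slots - self.used_slots

    def claim(self, instance_type):
        """Take one slot; return False if the type differs or the host is full."""
        if instance_type != self.slot_type or self.free_slots == 0:
            return False
        self.used_slots += 1
        return True


# A node configured with 10 medium slots accepts exactly 10 medium instances.
host = SlotHost("compute1", "medium", 10)
results = [host.claim("medium") for _ in range(11)]
```

Capacity for such a node is then just total_slots, independent of how the
underlying memory and CPU would otherwise be divided up.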
For reference, bare-metal node scheduling was discussed previously on
this thread:
http://lists.openstack.org/pipermail/openstack-dev/2012-August/000627.html
Cheers,
Mark.